Performance
There are several characteristics that can be used to describe the quality and usefulness of a test. They are:
- Prevalence
- Sensitivity & Specificity
- Accuracy
- Positive Predictive Value (PPV) & Negative Predictive Value (NPV)
- Positive Likelihood Ratio & Negative Likelihood Ratio
- Area under receiver operator characteristic curve (ROC)
An important distinction is that some of these characteristics vary with disease prevalence, while others do not depend on prevalence directly but can still vary across different populations.
To evaluate the different types of test characteristic, a simple 2 x 2 contingency table is used (shown below). True positives and true negatives are usually determined by testing individuals with a reference test termed the gold standard. It is important to realize that the gold standard is not necessarily a perfect representation of disease status; it is only assumed to reflect accurately the true disease status of an individual. It is therefore essential to choose your reference standard with caution.
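Table 1. 2 x 2 contingency table of the test under evaluation against disease status (gold standard)

| Test under evaluation | Disease status: positive | Disease status: negative | Total |
|---|---|---|---|
| Positive | a | b | a + b |
| Negative | c | d | c + d |
| Total | a + c | b + d | N |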
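The four cells of the table can be counted directly from paired results. The following is a minimal Python sketch, assuming each individual has a boolean result from the test under evaluation and a boolean result from the gold standard; the function name `tabulate` and the six sample results are hypothetical and serve only as an illustration.

```python
from collections import Counter


def tabulate(test_results, reference_results):
    """Count cells a, b, c, d of the 2 x 2 table from paired booleans."""
    if len(test_results) != len(reference_results):
        raise ValueError("each test result needs a matching reference result")
    counts = Counter(zip(test_results, reference_results))
    a = counts[(True, True)]    # test positive, gold standard positive
    b = counts[(True, False)]   # test positive, gold standard negative
    c = counts[(False, True)]   # test negative, gold standard positive
    d = counts[(False, False)]  # test negative, gold standard negative
    return a, b, c, d


# Hypothetical results for six individuals (illustration only).
test = [True, True, False, False, True, False]
reference = [True, False, True, False, True, False]
a, b, c, d = tabulate(test, reference)
print(a, b, c, d, "N =", a + b + c + d)
```

The row and column totals follow from the four cells, so only a, b, c and d need to be stored.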
Prevalence
Prevalence is the proportion of a population that has a condition at a given point in time. It is important to take into consideration the prevalence of a disease in a population when choosing a diagnostic test because the test's performance is affected by prevalence. For a more detailed explanation of how prevalence affects test performance, see the Nature Reviews Microbiology link below. The prevalence of a condition can be determined by calculating:
$$\text{Prevalence} = \frac{a + c}{N}$$
Sensitivity
The sensitivity of a test is the probability that it will produce a true positive result when used on an infected population (as compared to a reference or "gold standard"). After inserting the test results into a table set up like Table 1, the sensitivity of a test can be determined by calculating:
$$\text{Sensitivity} = \frac{a}{a + c}$$
Specificity
The specificity of a test is the probability that a test will produce a true negative result when used on a non-infected population (as determined by a reference or "gold standard"). After inserting the test results into a table set up like Table 1, the specificity of a test can be determined by calculating:
$$\text{Specificity} = \frac{d}{b + d}$$
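As a worked illustration of the prevalence, sensitivity and specificity formulas above, here is a short Python sketch. The cell counts (a = 90, b = 30, c = 10, d = 870) are hypothetical values chosen for the example, not data from any real evaluation.

```python
def prevalence(a, b, c, d):
    """Proportion of the study population that is gold-standard positive."""
    return (a + c) / (a + b + c + d)


def sensitivity(a, c):
    """Proportion of gold-standard positives that the test calls positive."""
    return a / (a + c)


def specificity(b, d):
    """Proportion of gold-standard negatives that the test calls negative."""
    return d / (b + d)


# Hypothetical counts laid out as in Table 1 (illustration only).
a, b, c, d = 90, 30, 10, 870

print(f"prevalence  = {prevalence(a, b, c, d):.3f}")
print(f"sensitivity = {sensitivity(a, c):.3f}")
print(f"specificity = {specificity(b, d):.3f}")
```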
Accuracy
Overall accuracy is the probability that a person will be correctly assigned to having or not having a particular disease by the test. One should be careful in using accuracy to measure a test's validity because of its dependence on the prevalence of disease within a population. Accuracy can be calculated by the following:
$$\text{Accuracy} = \frac{a + d}{N}$$
Or it can also be expressed as:
$$\text{Accuracy} = (\text{Prevalence})(\text{Sensitivity}) + (1 - \text{Prevalence})(\text{Specificity})$$
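The two expressions for accuracy give the same number, which can be checked with a small Python sketch. The counts are hypothetical and the variable names are chosen only for this illustration.

```python
import math

# Hypothetical counts laid out as in Table 1 (illustration only).
a, b, c, d = 90, 30, 10, 870
n = a + b + c + d

prev = (a + c) / n
sens = a / (a + c)
spec = d / (b + d)

accuracy_from_counts = (a + d) / n
accuracy_from_rates = prev * sens + (1 - prev) * spec

print(f"accuracy from counts: {accuracy_from_counts:.3f}")
print(f"accuracy from rates:  {accuracy_from_rates:.3f}")
print("equal:", math.isclose(accuracy_from_counts, accuracy_from_rates))
```

The second form makes the dependence on prevalence explicit: accuracy is a prevalence-weighted average of sensitivity and specificity.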
Positive predictive value
The positive predictive value of a test is the probability that a person is infected when a positive test result is observed. In practice, predictive values should only be calculated from cohort studies or studies that legitimately reflect the number of people in that population who are infected with the disease of interest at that time. This is because predictive values are inherently dependent upon the prevalence of infection. After inserting results into a table set up like Table 1, the positive predictive value of a test can be determined by calculating:
$$\text{PPV} = \frac{a}{a + b}$$
Negative predictive value
The negative predictive value of a test is the probability that a person is not infected when a negative test result is observed. This measure of accuracy should only be used if prevalence is available from the data. (See note in positive predictive value definition.) After inserting test results into a table set up like Table 1, the negative predictive value of a test can be determined by calculating:
$$\text{NPV} = \frac{d}{c + d}$$
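To see why predictive values depend on prevalence, the cell proportions of Table 1 can be rewritten in terms of sensitivity, specificity and prevalence. The Python sketch below does this for a hypothetical test; the function name `predictive_values` and the figures (sensitivity 0.90, specificity 0.95) are assumptions made for illustration only.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    # Expected proportion of the population falling in each cell of Table 1.
    true_pos = sensitivity * prevalence               # cell a / N
    false_pos = (1 - specificity) * (1 - prevalence)  # cell b / N
    false_neg = (1 - sensitivity) * prevalence        # cell c / N
    true_neg = specificity * (1 - prevalence)         # cell d / N
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: sensitivity 0.90, specificity 0.95 (illustration only).
for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.95, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, the positive predictive value rises as prevalence rises, while the negative predictive value falls.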
Positive diagnostic likelihood ratios
Diagnostic likelihood ratios (DLR) are not yet commonly reported in peer-reviewed literature or in marketing information provided by test manufacturers, but they can be a valuable tool for comparing the accuracy of several tests to the gold standard, and they are not dependent upon the prevalence of disease.
The positive DLR is the ratio of the probability that a positive test result will be observed in an infected population to the probability that the same result will be observed in a non-infected population. After inserting test results into a table set up like Table 1, the positive DLR of a test can be determined by calculating:
$$\text{DLR}^{+} = \frac{a / (a + c)}{1 - d / (b + d)}$$
Or it can also be expressed as:
$$\text{DLR}^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}}$$
Useful tests will, therefore, have larger positive DLRs and less useful tests will have smaller positive DLRs. For example, a positive diagnostic likelihood ratio of 5.0 means that for every 1% of non-infected subjects who test positive, 5% of infected subjects will test positive.
Negative diagnostic likelihood ratios
The negative DLR is the ratio of the probability that a negative test result will be observed in an infected population to the probability that the same result will be observed in a non-infected population. After inserting the test results into a table set up like Table 1, the negative DLR for a test can be determined by calculating:
$$\text{DLR}^{-} = \frac{1 - a / (a + c)}{d / (b + d)}$$
Or
$$\text{DLR}^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}}$$
Useful tests will, therefore, have negative DLRs close to 0, and less useful tests will have higher negative DLRs. For example, a negative diagnostic likelihood ratio of 0.4 means that for every 1% of infected subjects who test negative (false negatives), 2.5% of non-infected subjects test negative (true negatives).
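Both likelihood ratios can be computed from the cells of Table 1 through the sensitivity and specificity definitions given earlier. The Python sketch below uses hypothetical counts, and the function name `likelihood_ratios` is an assumption made for illustration.

```python
def likelihood_ratios(a, b, c, d):
    """Return (positive DLR, negative DLR) from the cells of Table 1.

    The positive DLR is undefined when specificity is exactly 1, and the
    negative DLR is undefined when specificity is exactly 0; this sketch
    does not handle those edge cases.
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    positive_dlr = sensitivity / (1 - specificity)
    negative_dlr = (1 - sensitivity) / specificity
    return positive_dlr, negative_dlr


# Hypothetical counts laid out as in Table 1 (illustration only).
a, b, c, d = 90, 30, 10, 870
positive_dlr, negative_dlr = likelihood_ratios(a, b, c, d)
print(f"positive DLR = {positive_dlr:.1f}, negative DLR = {negative_dlr:.3f}")
```

For these counts the positive DLR is about 27 and the negative DLR is about 0.10, the pattern expected of a useful test: a large positive DLR and a negative DLR close to 0.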
Area under the receiver operator characteristic (ROC) curve
The ROC curve describes the relationship between sensitivity and specificity for diagnostic tests with a variable cutoff point. It is a plot of the true positive rate (sensitivity) against the false positive rate (1 - specificity) for different cutoff values of the test. The performance of a diagnostic test can be judged by the position of its ROC curve. A perfect test has an area under the curve of 1.0 (the curve hugs the left and top borders of the graph), whereas an area of 0.5 is undesirable because the test discriminates no better than chance (the curve lies close to the rising diagonal of the graph). The likelihood ratio of the test for a particular value can be determined from the slope of the tangent line at that cutoff point.
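The construction of the curve can be sketched in a few lines of Python: sweep the cutoff over the observed test values, record the false positive rate and true positive rate at each cutoff, and estimate the area with the trapezoidal rule. The function names, the convention that a value at or above the cutoff counts as positive, and the six sample values are all assumptions made for this illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    labels are 1 for gold-standard positive and 0 for gold-standard negative.
    A result is called positive when its score is >= the cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal estimate of the area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical quantitative test values and disease status (illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
print(f"AUC = {area_under_curve(points):.3f}")
```

Each point on the curve corresponds to one 2 x 2 table like Table 1, built at a single cutoff.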
Confidence intervals
Confidence intervals can be calculated to reflect the precision of each measure. The narrower the interval, the more precise the measurement. Normally a 95% confidence interval is used. The 95% confidence interval for sensitivity or specificity is calculated as:
$$p \pm 1.96 \sqrt{\frac{p(1 - p)}{n}}$$
p = sensitivity or specificity (as a proportion, not a percentage)
n = number of tests performed on infected people (for sensitivity) or on uninfected people (for specificity)
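The interval formula above translates directly into code. The following Python sketch uses a hypothetical sensitivity of 0.90 measured on 100 infected people; the function name `confidence_interval_95` is an assumption made for illustration.

```python
import math


def confidence_interval_95(p, n, z=1.96):
    """Normal-approximation interval: p plus or minus z * sqrt(p(1 - p) / n)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical example: sensitivity 0.90 estimated from 100 infected people.
low, high = confidence_interval_95(0.90, 100)
print(f"95% CI: {low:.4f} to {high:.4f}")
```

Because this is a normal approximation, it becomes unreliable when n is small or when p is close to 0 or 1, where the computed limits can even fall outside the range 0 to 1.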
Links to additional resources
These links provide information about measures of accuracy and the role of diagnostic tests from a general epidemiology perspective.
- Diagnostic effectiveness
This site is part of the Simple Interactive Statistical Analysis website. It includes an interactive table to calculate simple statistics and discusses indicators of diagnostic test effectiveness such as accuracy, sensitivity, specificity, positive likelihood, negative likelihood, diagnostic odds ratio, error odds ratio, prevalence, and predictive accuracy.
- Designing Studies to Ensure that Estimates of Test Accuracy are Transferable
Performance of a diagnostic test may vary from setting to setting. This paper explores the reasons for this variability and the implications for diagnostic test evaluation.
- A guide for diagnostic evaluations
A simple, user-friendly operational guide on how to design and conduct evaluations of diagnostic tests for infectious diseases that are of public health importance in the developing world.
- Evaluation of diagnostic tests for infectious diseases: general principles
A detailed review on general principles for evaluation of diagnostic tests for infectious diseases. See Box 1 for an explanation of how prevalence affects certain performance characteristics of diagnostic tests.
- Evaluation of diagnostic tests for infectious diseases: general principles
A document aimed at facilitating the setting of appropriate standards for test evaluation; to provide best-practice guidelines for assessing the performance and operational characteristics of diagnostic tests for infectious diseases in populations in which the tests are intended to be used; to help those designing evaluations at all levels, from test manufacturers to end-users; and to facilitate critical review of published and unpublished evaluations, with a view to selecting or approving tests that have been appropriately evaluated and shown to meet defined performance targets.
- Likelihood ratios: getting diagnostic testing into perspective
This article reviews the performance of diagnostic tests by their likelihood ratio, and compares them to the power of clinical assessment.
- Communicating Accuracy of Tests to General Practitioners: a Controlled Study
This paper discusses common mistakes made by clinicians in the use of diagnostic test statistics. The importance of including diagnostic test sensitivity and specificity as well as positive likelihood ratio in simple language is reinforced.
- Standards for Reporting of Diagnostic Accuracy (STARD)
With the mission to improve the reporting of diagnostic accuracy and to educate readers about the potential for bias in diagnostic evaluation studies, these two articles provide a mission statement, checklist, and flowchart to improve the reporting of diagnostic accuracy studies.
General epidemiology sites
- Supercourse: Epidemiology, the Internet, and Global Health
This is a general epidemiology web site put together at the University of Pittsburgh as a "Supercourse" for medical and health students around the world. The topic sites consist of PowerPoint presentations and include some disease-specific lectures.
- British Medical Journal
This is the British Medical Journal's "Epi for the Uninitiated" web site and is also a general epidemiology web site.