Performance

Several characteristics can be used to describe the quality and usefulness of a diagnostic test. Each is defined in the sections that follow.

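An important feature of these characteristics is that some, such as overall accuracy and the predictive values, vary with disease prevalence, while others, such as sensitivity and specificity, do not depend on prevalence directly but can still vary across different populations.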

To evaluate these characteristics, a simple 2 x 2 contingency table is used (shown below). The numbers of true positives and true negatives are usually determined by testing individuals with a known reference test, termed the gold standard. It is important to realize that the gold standard is not necessarily a true representation of disease status; it is only assumed to reflect the true disease status of an individual accurately. It is therefore essential to choose the reference standard with caution.

Table 1. 2 x 2 contingency table

                                   Disease status (gold standard)
                                   Positive    Negative    Total
Test under evaluation   Positive   a           b           a + b
                        Negative   c           d           c + d
                        Total      a + c       b + d       N
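The cell counts a, b, c, and d can be tallied directly from paired results. The following is a minimal sketch, assuming each subject has one boolean result from the test under evaluation and one from the gold standard; the function name `tally_2x2` and the six-subject data set are hypothetical and used only for illustration.

```python
from collections import Counter


def tally_2x2(test_results, reference_results):
    """Count cells a, b, c, d of the 2 x 2 table from paired boolean results.

    test_results[i] is True when the test under evaluation is positive;
    reference_results[i] is True when the gold standard is positive.
    """
    if len(test_results) != len(reference_results):
        raise ValueError("each subject needs one test and one reference result")
    counts = Counter(zip(test_results, reference_results))
    a = counts[(True, True)]    # test positive, disease present
    b = counts[(True, False)]   # test positive, disease absent
    c = counts[(False, True)]   # test negative, disease present
    d = counts[(False, False)]  # test negative, disease absent
    return a, b, c, d


# Hypothetical results for six subjects.
test = [True, True, False, False, True, False]
reference = [True, False, True, False, True, False]
a, b, c, d = tally_2x2(test, reference)
print(a, b, c, d)
```

The row totals (a + b, c + d), column totals (a + c, b + d), and N follow by addition.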

Prevalence

Prevalence is the proportion of a population that has a condition at a given point in time. It is important to consider the prevalence of a disease in a population when choosing a diagnostic test, because some measures of the test's performance are affected by prevalence. For a more detailed explanation of how prevalence affects test performance, see the Nature Reviews Microbiology link below. The prevalence of a condition can be determined by calculating:

a + c
————
N

Sensitivity

The sensitivity of a test is the probability that it will produce a true positive result when used on an infected population (as compared to a reference or "gold standard"). After inserting the test results into a table set up like Table 1, the sensitivity of a test can be determined by calculating:


a
————
a + c

Specificity

The specificity of a test is the probability that a test will produce a true negative result when used on a non-infected population (as determined by a reference or "gold standard"). After inserting the test results into a table set up like Table 1, the specificity of a test can be determined by calculating:

d
————
b + d
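As a minimal sketch of the two formulas above, the functions below compute sensitivity and specificity from the cell counts of Table 1. The counts used in the example are hypothetical, not data from any real evaluation.

```python
def sensitivity(a, c):
    """a / (a + c): proportion of gold-standard positives that test positive."""
    if a + c == 0:
        raise ValueError("no gold-standard positives in the sample")
    return a / (a + c)


def specificity(b, d):
    """d / (b + d): proportion of gold-standard negatives that test negative."""
    if b + d == 0:
        raise ValueError("no gold-standard negatives in the sample")
    return d / (b + d)


# Hypothetical counts: 100 infected and 900 non-infected subjects.
a, b, c, d = 90, 30, 10, 870
print(f"sensitivity = {sensitivity(a, c):.3f}")
print(f"specificity = {specificity(b, d):.3f}")
```

Each measure uses only one column of the table, which is why neither changes when the ratio of infected to non-infected subjects changes.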

Accuracy

Overall accuracy is the probability that a person will be correctly assigned by the test to having or not having a particular disease. One should be careful in using accuracy to measure a test's validity because of its dependence on the prevalence of disease within a population. Accuracy can be calculated by the following:

a + d
————
N

Or it can also be expressed as:

(Prevalence x Sensitivity) + ((1 - Prevalence) x Specificity)
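The two expressions for accuracy can be checked against each other numerically. This is a sketch using hypothetical counts; the helper `accuracy_at` is an illustrative name, and it assumes sensitivity and specificity stay fixed while prevalence changes.

```python
import math

# Hypothetical counts laid out as in Table 1.
a, b, c, d = 90, 30, 10, 870
n = a + b + c + d

accuracy_direct = (a + d) / n

prevalence = (a + c) / n
sens = a / (a + c)
spec = d / (b + d)
accuracy_weighted = prevalence * sens + (1 - prevalence) * spec

# Both routes give the same value, up to floating-point rounding.
assert math.isclose(accuracy_direct, accuracy_weighted)


def accuracy_at(prevalence, sens, spec):
    """Accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return prevalence * sens + (1 - prevalence) * spec


# The same test gives a different accuracy when prevalence differs.
for prev in (0.01, 0.10, 0.50):
    print(f"prevalence {prev:.2f}: accuracy {accuracy_at(prev, sens, spec):.3f}")
```

Because accuracy is a weighted average, it moves toward specificity when the disease is rare and toward sensitivity when it is common.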

Positive predictive value

The positive predictive value of a test is the probability that a person is infected when a positive test result is observed. In practice, predictive values should only be calculated from cohort studies or studies that legitimately reflect the number of people in that population who are infected with the disease of interest at that time. This is because predictive values are inherently dependent upon the prevalence of infection. After inserting results into a table set up like Table 1, the positive predictive value of a test can be determined by calculating:

a
————
a + b

Negative predictive value

The negative predictive value of a test is the probability that a person is not infected when a negative test result is observed. This measure of accuracy should only be used if prevalence is available from the data. (See note in positive predictive value definition.) After inserting test results into a table set up like Table 1, the negative predictive value of a test can be determined by calculating:

d
————
c + d
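To illustrate why predictive values depend on prevalence, the sketch below rewrites both of them in terms of sensitivity, specificity, and prevalence using Bayes' theorem. This rearrangement is not given in the text above; it is algebraically equivalent to a / (a + b) and d / (c + d) when the table reflects the population's true prevalence. The function names and input values are hypothetical.

```python
def positive_predictive_value(sens, spec, prevalence):
    """P(infected | positive test), by Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("the test never returns a positive result")
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sens, spec, prevalence):
    """P(not infected | negative test), by Bayes' theorem."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    if true_neg + false_neg == 0:
        raise ValueError("the test never returns a negative result")
    return true_neg / (true_neg + false_neg)


# Hypothetical test with 90% sensitivity and 95% specificity.
for prev in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(0.90, 0.95, prev)
    npv = negative_predictive_value(0.90, 0.95, prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With these assumed values, the positive predictive value is roughly 0.15 at 1% prevalence and roughly 0.67 at 10% prevalence, even though the test itself has not changed.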

Positive diagnostic likelihood ratios

Diagnostic likelihood ratios (DLR) are not yet commonly reported in peer-reviewed literature or in marketing information provided by test manufacturers, but they can be a valuable tool for comparing the accuracy of several tests to the gold standard, and they are not dependent upon the prevalence of disease.
The positive DLR is the ratio of the probability that a positive test result will be observed in an infected population to the probability that the same result will be observed in a non-infected population. After inserting test results into a table set up like Table 1, the positive DLR of a test can be determined by calculating:

a / (a + c)
—————————————————
1 - d / (b + d)

Or it can also be expressed in terms of sensitivity and specificity:

sensitivity
—————
1-specificity

Useful tests will, therefore, have larger positive DLRs and less useful tests will have smaller positive DLRs. As an example, a positive diagnostic likelihood ratio of 5.0 means that for every 1% of non-infected subjects that test positive, 5% of infected subjects test positive.

Negative diagnostic likelihood ratios

The negative DLR is the ratio of the probability that a negative test result will be observed in an infected population to the probability that the same result will be observed in a non-infected population. After inserting the test results into a table set up like Table 1, the negative DLR for a test can be determined by calculating:

1 - a / (a + c)
—————————————————
d / (b + d)

Or

1-Sensitivity
———————
Specificity

Useful tests will, therefore, have negative DLRs close to 0, and less useful tests will have higher negative DLRs. As an example, a negative diagnostic likelihood ratio of 0.4 means that for every 1% of non-infected subjects that test negative (true negatives), 0.4% of infected subjects test negative (false negatives); that is, the true negative rate is 2.5 times the false negative rate.
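Both likelihood ratios follow directly from sensitivity and specificity, as in this minimal sketch. The counts are hypothetical, and returning infinity when the denominator is zero is a choice made for this illustration rather than a convention taken from the text.

```python
import math


def positive_dlr(sens, spec):
    """sensitivity / (1 - specificity)."""
    if spec == 1:
        return math.inf  # no false positives at all
    return sens / (1 - spec)


def negative_dlr(sens, spec):
    """(1 - sensitivity) / specificity."""
    if spec == 0:
        return math.inf  # no true negatives at all
    return (1 - sens) / spec


# Hypothetical counts laid out as in Table 1.
a, b, c, d = 90, 30, 10, 870
sens = a / (a + c)
spec = d / (b + d)
print(f"positive DLR = {positive_dlr(sens, spec):.2f}")
print(f"negative DLR = {negative_dlr(sens, spec):.3f}")
```

Neither ratio uses the column totals a + c and b + d relative to N, which is why they do not depend on prevalence.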

Area under the receiver operator characteristic (ROC) curve

The ROC curve describes the relationship between sensitivity and specificity for diagnostic tests with a variable cutoff point. It is a plot of the true positive rate (sensitivity) against the false positive rate (1-specificity) for different cutoff values of the test. The performance of a diagnostic test can be judged by the position of its ROC curve. A perfect test has an area under the curve of 1.0 (the curve hugs the left and top borders of the graph), whereas an area of 0.5 is undesirable (the curve lies close to the rising diagonal of the graph) because the test then discriminates no better than chance. The likelihood ratio of the test for a particular value can be determined from the slope of the tangent line at that cutoff point.
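The construction of the curve and its area can be sketched in a few lines. The example assumes a test that returns a continuous score, that a subject is called positive when the score is at or above the cutoff, and that the area is approximated with the trapezoidal rule; the function names and the eight-subject data set are hypothetical.

```python
def roc_points(scores, diseased):
    """Return (false positive rate, true positive rate) pairs, one per cutoff.

    A subject is called positive when its score is >= the cutoff.
    """
    n_pos = sum(diseased)
    n_neg = len(diseased) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both diseased and non-diseased subjects")
    points = [(0.0, 0.0)]  # cutoff above every score: nobody tests positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, diseased) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, diseased) if s >= cutoff and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores for four diseased and four non-diseased subjects.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
diseased = [True, True, False, True, True, False, False, False]
auc = area_under_curve(roc_points(scores, diseased))
print(f"AUC = {auc:.3f}")
```

Lowering the cutoff moves along the curve toward the top right: sensitivity rises while specificity falls.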

Confidence intervals

Confidence intervals can be calculated to reflect the precision of each estimate. The narrower the interval, the more precise the measurement. Normally a 95% confidence interval is used. A 95% confidence interval for sensitivity or specificity is calculated as:

p ± 1.96 x √( p(1 - p) / n )

p = sensitivity or specificity (as a proportion, not a percentage)
n = number of tests performed on infected people (for sensitivity) or on uninfected people (for specificity)
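The interval above can be computed as in the following sketch. The function name `wald_interval_95` and the example counts are hypothetical, and clipping the limits to the range 0 to 1 is an addition made here, since the raw formula can step outside that range. The normal approximation behind this formula tends to be unreliable when n is small or p is close to 0 or 1.

```python
import math


def wald_interval_95(successes, n):
    """95% interval p +/- 1.96 * sqrt(p(1 - p) / n) for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    half_width = 1.96 * math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1]; this step is not part of the formula in the text.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical example: 90 of 100 infected subjects tested positive,
# so p is the sensitivity and n is the number of infected subjects.
low, high = wald_interval_95(90, 100)
print(f"sensitivity 0.900, 95% CI {low:.3f} to {high:.3f}")
```

For specificity, pass the number of true negatives (d) and the number of uninfected subjects (b + d) instead.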

Links to additional resources

These links provide information about measures of accuracy and the role of diagnostic tests from a general epidemiology perspective.

  • Diagnostic effectiveness
    This site is part of the Simple Interactive Statistical Analysis website. It includes an interactive table to calculate simple statistics and discusses indicators of diagnostic test effectiveness such as accuracy, sensitivity, specificity, positive likelihood, negative likelihood, diagnostic odds ratio, error odds ratio, prevalence, and predictive accuracy.
  • Designing Studies to Ensure that Estimates of Test Accuracy are Transferable
    Performance of a diagnostic test may vary from setting to setting. This paper explores the reasons for this variability and the implications for diagnostic test evaluation.
  • A guide for diagnostic evaluations
    A simple, user-friendly operational guide on how to design and conduct evaluations of diagnostic tests for infectious diseases that are of public health importance in the developing world.
  • Evaluation of diagnostic tests for infectious diseases: general principles
    A detailed review on general principles for evaluation of diagnostic tests for infectious diseases. See Box 1 for an explanation of how prevalence affects certain performance characteristics of diagnostic tests.
  • Evaluation of diagnostic tests for infectious diseases: general principles
    A document aimed at facilitating the setting of appropriate standards for test evaluation; to provide best-practice guidelines for assessing the performance and operational characteristics of diagnostic tests for infectious diseases in populations in which the tests are intended to be used; to help those designing evaluations at all levels, from test manufacturers to end-users; and to facilitate critical review of published and unpublished evaluations, with a view to selecting or approving tests that have been appropriately evaluated and shown to meet defined performance targets.

General epidemiology sites

  • Supercourse: Epidemiology, the Internet, and Global Health
    This is a general epidemiology web site put together at the University of Pittsburgh as a "Supercourse" for medical and health students around the world. The topic sites consist of PowerPoint presentations and include some disease-specific lectures.
  • British Medical Journal
    This is the British Medical Journal's "Epi for the Uninitiated" web site and is also a general epidemiology web site.