(Clinical Chemistry. 1998;44:108-115.)
© 1998 American Association for Clinical Chemistry, Inc.
Test Utilization and Outcomes
Quantifying the bias associated with use of discrepant analysis
Harvey B. Lipman^a and J. Rex Astles
a Address correspondence to this author at: Centers for Disease Control and Prevention, 4770 Buford Highway NE, Mailstop G25, Atlanta, GA 30341-3714. Fax (770) 488-7667; e-mail hxl0{at}cdc.gov.
Abstract
Discrepant analysis is a widely used technique for estimating the
performance parameters of a laboratory test. In discrepant analysis,
each specimen is initially tested with the candidate test and a
comparison method, and when the results of the two tests disagree, a
confirmatory test is used to resolve the discrepancy. Discrepant
analysis usually produces biased estimates. This report quantifies this
bias and shows that it is usually positive, leading to overestimation
of the performance parameters of a laboratory test. The direction and
magnitude of the bias are predictably influenced by the analytical
sensitivity and specificity of the candidate test, comparison method,
and confirmatory test. The proportion of abnormal specimens tested also
affects the magnitude of the bias, particularly the estimates of
analytical sensitivity and positive predictive value when this
proportion is low. Alternative approaches are suggested.
Introduction
Discrepant analysis (DA),^1 sometimes called discordant analysis, is a widely used
technique for estimating the analytical sensitivity, specificity,
positive predictive value (PPV), and negative predictive value (NPV) of
a candidate laboratory test in the absence of a comparison method with
100% analytical sensitivity and specificity (a perfect comparison
method). In the simplest form of DA, each specimen is initially tested
with the candidate test and a comparison method. If both test results
agree, then the specimen is classified as normal or abnormal
accordingly. A specimen for which results from the two tests differ (a
discrepant result) is then retested with a third (confirmatory) test,
and the specimen is classified as normal or abnormal based on the
result of the confirmatory test. Usually, three tests are done on
specimens with discrepant results: a candidate test, comparison method,
and confirmatory test. Specimens with discrepant results may, however,
also be retested with more than one confirmatory test. Sequential
testing may continue with two or more confirmatory tests until the
discrepancy is resolved; a single agreement between the candidate test
result and any confirmatory test result suffices to declare the
discrepancy resolved (1)(2)(3)(4). When sequential testing is done, no
additional tests are run after concordance is reached.
A specimen subjected to DA can be classified correctly in two ways and
classified incorrectly in two ways. A specimen can be correctly
classified if both the candidate test and the comparison method
correctly classify the specimen, or there is initial disagreement
between the candidate test and comparison method that is correctly
resolved by the confirmatory test. Similarly, a specimen can be
misclassified if both the candidate test and the comparison method
misclassify the specimen, or there is initial disagreement between the
candidate test and comparison method that is incorrectly resolved by
the confirmatory test. In all cases, misclassification occurs whenever
two testing errors occur.
Since the mid-1980s, DA has increasingly been used to estimate test
parameters, particularly in infectious disease testing for organisms
such as Chlamydia trachomatis, Helicobacter pylori, and
Mycobacterium tuberculosis. Increased use of DA has
paralleled the evolution of molecular diagnostic techniques, which are
often used as confirmatory adjuncts to the traditionally used, but less
sensitive, comparison method testing by organism culture.
The fundamental flaw of DA is its circular reasoning. The
purpose of DA is to estimate the ability of a test to classify a
specimen as normal or abnormal, yet DA uses that same test's results
to classify the specimen as normal or abnormal. This circular
reasoning leads to misclassification bias and, hence, yields biased
estimates (5).
A simple example illustrates the potential magnitude of the bias in
estimates computed with use of DA. Suppose that the candidate test and
the comparison method are independent fair coin flips, with the result
being called abnormal if heads occurs and normal if tails occurs, and
let the confirmatory test be an independent perfect test. Assume that
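one-half of the specimens are actually abnormal. The results of this
simple example and their associated probabilities are listed in Table 1.
Obviously, the true analytical sensitivity of this coin flip
test is 50%, but the expected DA estimate of the sensitivity is:
$$\frac{1/8 + 1/8 + 1/8}{1/8 + 1/8 + 1/8 + 1/8} = \frac{3}{4}$$
(each of the eight combinations of true status and test results has
probability 1/8; four of them are classified as abnormal, and in three
of those four the candidate test is positive).
Thus, the expected DA estimate for sensitivity is 75%, well above
the true value of 50%.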
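The coin-flip calculation can be checked by enumerating every combination of true status and test results. The following is a minimal sketch rather than the authors' own computation; the function name is hypothetical, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def coin_flip_da_sensitivity():
    """Expected DA sensitivity estimate when the candidate (X) and comparison (Y)
    tests are independent fair coin flips and the confirmatory test is perfect."""
    prob = Fraction(1, 8)  # prevalence 1/2, each coin 1/2: every combination has probability 1/8
    candidate_pos_and_abnormal = Fraction(0)  # X positive, DA-classified abnormal
    classified_abnormal = Fraction(0)         # everything DA-classified abnormal
    for truly_abnormal, x_pos, y_pos in product([True, False], repeat=3):
        if x_pos == y_pos:
            da_abnormal = x_pos           # concordant results are accepted as they stand
        else:
            da_abnormal = truly_abnormal  # a perfect confirmatory test returns the truth
        if da_abnormal:
            classified_abnormal += prob
            if x_pos:
                candidate_pos_and_abnormal += prob
    return candidate_pos_and_abnormal / classified_abnormal


print(coin_flip_da_sensitivity())  # 3/4
```

The concordant false positives (both coins heads on a normal specimen) are what push the estimate above 50%: they are never sent to the confirmatory test.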
Table 1. All possible combinations of true status, test results,
classification when DA is used, and their probabilities of occurrence,
when the candidate and comparison tests are independent fair-coin
flips, with discrepancies resolved using a perfect
test.
A hypothetical example serves as a useful introduction to DA. Consider
100 abnormal and 200 normal specimens tested with a candidate test and
a comparison method. Assume that the candidate test has 80% analytical
sensitivity and 70% analytical specificity, whereas the comparison
method has 95% analytical sensitivity and 80% analytical specificity.
The true status of the specimens and expected test results are shown in
Table 2, but because the true status of the samples would not be known,
the 2 x 2 table of expected test results would distribute as
shown in Table 3.
Table 2. The true status of the specimens and expected test results
when 100 abnormal and 200 normal specimens are tested with a candidate
test having 80% analytical sensitivity and 70% analytical specificity
and a comparison method having 95% analytical sensitivity and 80%
analytical specificity.
Table 3. Expected test results when 100 abnormal and 200 normal
specimens are tested with a candidate test having 80% analytical
sensitivity and 70% analytical specificity and a comparison test
having 95% analytical sensitivity and 80% analytical
specificity.
If the discrepancies in results for the 300 samples are resolved by
using an independent perfect test, then the expected DA estimates for
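analytical sensitivity and specificity are, respectively:
$$\frac{88 + 4}{88 + 4 + 19} = \frac{92}{111} = 82.9\%$$
and
$$\frac{113 + 28}{113 + 28 + 48} = \frac{141}{189} = 74.6\%$$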
Also, the true PPV of the candidate test is 57.1%, and the true
NPV is 87.5%, but DA yields an expected PPV estimate of 65.7% and an
expected NPV estimate of 88.1%. Thus, all of the estimates are
positively biased.
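The figures in this example can be reproduced from the stated sensitivities and specificities alone. The sketch below assumes, as the text does, that the candidate and comparison results are independent given true status and that the confirmatory test is perfect; the helper name `expected_counts` is hypothetical.

```python
def expected_counts(n, p_x_pos, p_y_pos):
    """Expected counts (X+Y+, X+Y-, X-Y+, X-Y-) for n specimens of one true status,
    assuming candidate (X) and comparison (Y) results are independent."""
    return (
        n * p_x_pos * p_y_pos,
        n * p_x_pos * (1 - p_y_pos),
        n * (1 - p_x_pos) * p_y_pos,
        n * (1 - p_x_pos) * (1 - p_y_pos),
    )


# Abnormal specimens test positive with probability Se; normal ones with 1 - Sp.
abn = expected_counts(100, 0.80, 0.95)          # about 76, 4, 19, 1
nor = expected_counts(200, 1 - 0.70, 1 - 0.80)  # about 12, 48, 28, 112

# A perfect confirmatory test gives every discrepant specimen its true status.
tp = abn[0] + nor[0] + abn[1]  # candidate positive, classified abnormal
fn = abn[2]                    # candidate negative, classified abnormal
tn = abn[3] + nor[3] + nor[2]  # candidate negative, classified normal
fp = nor[1]                    # candidate positive, classified normal

da_estimates = {
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "ppv": tp / (tp + fp),
    "npv": tn / (tn + fn),
}
true_ppv = (abn[0] + abn[1]) / (abn[0] + abn[1] + nor[0] + nor[1])
true_npv = (nor[2] + nor[3]) / (nor[2] + nor[3] + abn[2] + abn[3])

for name, value in da_estimates.items():
    print(name, round(value, 3))
print("true ppv", round(true_ppv, 3), "true npv", round(true_npv, 3))
```

Note that the 12 normal specimens on which both tests are falsely positive are counted as true positives, and the single abnormal specimen on which both are falsely negative is counted as a true negative; neither error is ever examined by the confirmatory test.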
The true sensitivity, specificity, PPV, and NPV of a laboratory test
are probabilities and thus are parameters. These parametric values
should not be confused with the estimates for these parameters, which
are statistical values. When DA is used to estimate sensitivity, for
example, a statistic is computed. This statistic is not an unbiased
estimate for the true sensitivity. Nonetheless, it is still an unbiased
estimate for its expected value. The difference between the expected
value of the DA estimate and the true sensitivity is the expected bias
of the DA estimate. Similarly, the difference between the expected
value of a DA estimate for any performance parameter and the true value
of that parameter is the expected bias of the DA estimate for that
parameter. Like all parametric values, this expected bias does not vary
from estimate to estimate.
In this report, we undertake to examine more fully the direction and
magnitude of the expected biases in the estimates of analytical
sensitivity, specificity, PPV, and NPV computed with DA by
investigating how these biases are affected by the true sensitivities
and specificities of all of the tests used in the DA process, along
with the true proportion of abnormal specimens tested.
Materials and Methods
Let
R = the proportion of abnormal specimens among the specimens tested (the prevalence rate)
X = candidate test results
Y = comparison method results
Z = confirmatory test results
Se_i = analytical sensitivity of test i (i = X, Y, Z)
Sp_i = analytical specificity of test i (i = X, Y, Z)
Table 4 lists all possible combinations of true specimen status,
candidate test result, comparison method result, confirmatory test
result, estimated specimen status, and the probability of occurrence
for each combination when DA is used and all test results are
independent. The outcomes that lead to misclassification of a specimen
have been emphasized.
Table 4. All possible combinations of true status, test results,
classification when DA is used, and their probabilities of occurrence,
assuming all test results are independent.
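The expected values of the DA estimates for the performance parameters
follow from summing the Table 4 probabilities by candidate test result
and DA classification. Let P(+,A) and P(+,N) denote the probabilities
that a specimen is candidate-test positive and classified abnormal or
normal, respectively, and P(-,A) and P(-,N) the corresponding
probabilities for candidate-test negative specimens:
$$P(+,A) = R\,Se_X\,[Se_Y + (1-Se_Y)\,Se_Z] + (1-R)(1-Sp_X)\,[(1-Sp_Y) + Sp_Y(1-Sp_Z)]$$
$$P(+,N) = R\,Se_X(1-Se_Y)(1-Se_Z) + (1-R)(1-Sp_X)\,Sp_Y\,Sp_Z$$
$$P(-,A) = R(1-Se_X)\,Se_Y\,Se_Z + (1-R)\,Sp_X(1-Sp_Y)(1-Sp_Z)$$
$$P(-,N) = R(1-Se_X)\,[(1-Se_Y) + Se_Y(1-Se_Z)] + (1-R)\,Sp_X\,[Sp_Y + (1-Sp_Y)\,Sp_Z]$$
Analytical sensitivity:
$$\frac{P(+,A)}{P(+,A) + P(-,A)}$$
Analytical specificity:
$$\frac{P(-,N)}{P(-,N) + P(+,N)}$$
PPV:
$$\frac{P(+,A)}{P(+,A) + P(+,N)}$$
NPV:
$$\frac{P(-,N)}{P(-,N) + P(-,A)}$$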
Table 5 lists the test result frequencies. Notice that DA divides cell
frequencies b and c in the traditional 2 x 2 contingency table of
results (corresponding to disagreement between the candidate test and
comparison method) into component parts that depend on the results of
confirmatory testing. When discrepancies are resolved, the b1 component
is shifted into the true positive (a) cell and c2 is shifted into the
true negative (d) cell, leading to the estimates given below:
$$\text{Sensitivity} = \frac{a + b_1}{a + b_1 + (c - c_2)}, \qquad \text{Specificity} = \frac{d + c_2}{d + c_2 + (b - b_1)}$$
$$\text{PPV} = \frac{a + b_1}{a + b}, \qquad \text{NPV} = \frac{d + c_2}{c + d}$$
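The expected DA estimates under the independence model can be sketched as a single function of the prevalence rate and the three tests' sensitivities and specificities. This is an illustrative implementation derived from the DA classification rule described above, not code from the authors; the function and variable names are hypothetical.

```python
def da_expected_estimates(r, se_x, sp_x, se_y, sp_y, se_z, sp_z):
    """Expected DA estimates for candidate test X, comparison method Y and
    confirmatory test Z at prevalence r, assuming all results are independent
    given true status."""
    # Candidate positive, classified abnormal: Y agrees, or Y disagrees and Z is positive.
    pos_abn = (r * se_x * (se_y + (1 - se_y) * se_z)
               + (1 - r) * (1 - sp_x) * ((1 - sp_y) + sp_y * (1 - sp_z)))
    # Candidate positive, classified normal: Y negative and Z negative.
    pos_nor = (r * se_x * (1 - se_y) * (1 - se_z)
               + (1 - r) * (1 - sp_x) * sp_y * sp_z)
    # Candidate negative, classified abnormal: Y positive and Z positive.
    neg_abn = (r * (1 - se_x) * se_y * se_z
               + (1 - r) * sp_x * (1 - sp_y) * (1 - sp_z))
    # Candidate negative, classified normal: Y agrees, or Y disagrees and Z is negative.
    neg_nor = (r * (1 - se_x) * ((1 - se_y) + se_y * (1 - se_z))
               + (1 - r) * sp_x * (sp_y + (1 - sp_y) * sp_z))
    return {
        "sensitivity": pos_abn / (pos_abn + neg_abn),
        "specificity": neg_nor / (neg_nor + pos_nor),
        "ppv": pos_abn / (pos_abn + pos_nor),
        "npv": neg_nor / (neg_nor + neg_abn),
    }


# Prevalence 25%, candidate 70%/80%, comparison 90%/90%, confirmatory 95%/95%.
example = da_expected_estimates(0.25, 0.70, 0.80, 0.90, 0.90, 0.95, 0.95)
for name, value in example.items():
    print(name, round(value, 3))
```

Strictly, these are ratios of expected cell probabilities, which approximate the expected value of each estimate when the number of specimens is large.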
Results
The magnitude and direction of the bias that DA imparts on the
estimates for analytical sensitivity and specificity can best be shown
graphically by fixing the prevalence rate and the performance
parameters of the comparison method and confirmatory test while
allowing the true analytical sensitivity and analytical specificity of
the candidate test to vary.
Figure 1, left, displays the biases of these estimates when the
prevalence rate is 25%, the comparison method has analytical
sensitivity and specificity of 90%, and the confirmatory test has
analytical sensitivity and specificity of 95%. Thus, if DA were used
in this situation with a candidate test that actually has 70%
analytical sensitivity and 80% analytical specificity, Fig. 1, left,
shows that the expected biases are ~5% for analytical sensitivity
and 2% for analytical specificity, yielding expected estimates of
~75% for analytical sensitivity and 82% for analytical specificity.
For this scenario, the percent bias in the DA estimates for PPV and NPV
is given by Eq. 1:
$$\text{Bias (\%)} = 14.5 - 0.15 \times \text{true value (\%)} \qquad (1)$$
Thus, the candidate test described here that has a true PPV of
54% and a true NPV of 89% would have DA estimates with an expected
bias of ~6% for PPV and 1% for NPV, yielding expected estimates of
~60% for PPV and 90% for NPV.
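The linear dependence of the PPV and NPV bias on the true value can be derived in a few lines. What follows is a sketch under the independence assumption, not a transcription of the authors' derivation; the hat and expectation notation is introduced here for illustration. A candidate-positive specimen is classified abnormal if the comparison method is positive, or if it is negative and the confirmatory test is positive.

```latex
\begin{align*}
E[\widehat{PPV}_{DA}]
  &= PPV\,[Se_Y + (1-Se_Y)\,Se_Z] + (1-PPV)\,[1 - Sp_Y\,Sp_Z] \\
  &= 0.995\,PPV + 0.145\,(1-PPV)
   \qquad (Se_Y = Sp_Y = 0.90,\; Se_Z = Sp_Z = 0.95) \\
  &= 0.145 + 0.85\,PPV, \\
E[\widehat{PPV}_{DA}] - PPV &= 0.145 - 0.15\,PPV, \\[4pt]
E[\widehat{NPV}_{DA}]
  &= NPV\,[Sp_Y + (1-Sp_Y)\,Sp_Z] + (1-NPV)\,[1 - Se_Y\,Se_Z]
   = 0.145 + 0.85\,NPV .
\end{align*}
```

With a true PPV of 54% the bias is about 0.145 - 0.15(0.54) = 6%, and with a true NPV of 89% it is about 1%. The bias changes sign where 0.145 = 0.15 x (true value), that is, at a true value of about 97%.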

Figure 1. Biases of the DA estimates of analytical sensitivity and
specificity.
Left panel: prevalence rate of 25%, comparison method
analytical sensitivity and specificity of 90%, and confirmatory test
analytical sensitivity and specificity of 95%. Right panel:
prevalence rate of 10%, comparison method analytical sensitivity and
specificity of 90%, and confirmatory test analytical sensitivity and
specificity of 95%. Solid lines indicate sensitivity bias;
dashed lines indicate specificity bias.
As Fig. 1, left, and Eq. 1 suggest, DA can be expected to yield
inflated estimates of the performance parameters for most true values
of the analytical sensitivity and specificity. For this example, the
analytical sensitivity of the candidate test must exceed ~90%, and
the analytical specificity must exceed ~98% before DA tends to yield
analytical sensitivity and specificity estimates that are negatively
biased. Also, as Eq. 1 shows, for the given situation, DA will tend to
overestimate the PPV and NPV whenever their true values are less than
~97%.
Figure 1, right, displays the biases in the DA estimates of analytical
sensitivity and specificity when the sensitivities and specificities of
the comparison method and confirmatory test are the same as in Fig. 1,
left, but the prevalence rate decreases from 25% to 10%. The biases
in the DA estimates of PPV and NPV, expressed as functions of the true
PPV and NPV as in Eq. 1, are unaffected by the prevalence
rate. As Fig. 1 demonstrates, even in those limited situations where DA
can be expected to produce negatively biased estimates, the amount of
the underestimation is usually small. The maximum negative bias occurs
when the candidate test is a perfect test, a situation in which it is
highly unlikely that DA would be applied because the estimated
candidate test parameters are acceptably high.
Table 6 shows the effect the prevalence rate has on the DA estimates of
the performance parameters. Notice how variable the DA estimate of
analytical sensitivity is when the prevalence rate is low. When the
prevalence rate is 1% and the candidate test is a perfect test (100%
sensitive and specific), the DA estimate of analytical sensitivity is
only 66.8%, whereas, paradoxically, when the candidate test is only
70% sensitive and specific, the expected DA estimate is 89.2%. When
the prevalence rate is low and the candidate test is not highly
sensitive and specific, the true PPV of the candidate test is quite
low, but the DA estimate tends to overestimate the true value by a
large amount. For example, when the prevalence rate is 1% and the true
analytical sensitivity and specificity of the candidate test are 90%,
then the true PPV is ~8.3%, but the expected DA estimate is 21.6%,
over 2.5 times the true value. As the true analytical sensitivity and
specificity decrease, the relative bias of the DA PPV estimate
increases.
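The low-prevalence behaviour described here can be reproduced numerically. The sketch below is an assumption-laden illustration, not the authors' code: it hard-codes the comparison method at 90%/90% and the confirmatory test at 95%/95% as defaults, assumes independent tests, and uses a hypothetical function name.

```python
def da_sensitivity_and_ppv(r, se_x, sp_x, se_y=0.90, sp_y=0.90, se_z=0.95, sp_z=0.95):
    """Return (expected DA sensitivity, expected DA PPV, true PPV) for a candidate
    test X at prevalence r, assuming independent test results."""
    # Candidate positive and classified abnormal.
    pos_abn = (r * se_x * (se_y + (1 - se_y) * se_z)
               + (1 - r) * (1 - sp_x) * (1 - sp_y * sp_z))
    # Candidate negative but classified abnormal (comparison and confirmatory both positive).
    neg_abn = (r * (1 - se_x) * se_y * se_z
               + (1 - r) * sp_x * (1 - sp_y) * (1 - sp_z))
    p_candidate_pos = r * se_x + (1 - r) * (1 - sp_x)
    true_ppv = r * se_x / p_candidate_pos
    return pos_abn / (pos_abn + neg_abn), pos_abn / p_candidate_pos, true_ppv


perfect = da_sensitivity_and_ppv(0.01, 1.00, 1.00)
mediocre = da_sensitivity_and_ppv(0.01, 0.70, 0.70)
ninety = da_sensitivity_and_ppv(0.01, 0.90, 0.90)

print("DA sensitivity, perfect candidate:", round(perfect[0], 3))
print("DA sensitivity, 70%/70% candidate:", round(mediocre[0], 3))
print("DA PPV vs true PPV, 90%/90% candidate:", round(ninety[1], 3), round(ninety[2], 3))
```

At 1% prevalence nearly all specimens classified as abnormal are truly normal, so the estimate is driven mostly by how often the candidate test agrees with the other tests' false positives. A perfect candidate never agrees with them, which is why its estimated sensitivity falls.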
Table 6. Expected DA estimates for analytical sensitivity,
specificity, PPV, and NPV for various true sensitivities,
specificities, and prevalence rates when the comparison test is 90%
sensitive and specific and the confirmatory test is 95% sensitive and
specific.
Using DA could actually result in an inferior candidate test
supplanting a better test. Consider the following scenario: test T is
new and is being proposed as a screening test for a disease with a
prevalence rate of 1%. The test currently used for this purpose (test
U) has 90% analytical sensitivity and 97% analytical specificity,
yielding a PPV of 23.3%. The most specific test available (test V) is
known to have an analytical specificity of 99% but is only 80%
sensitive (PPV = 44.7%) and either too costly or time consuming
to use as a screening test. Suppose that the analytical sensitivity of
test T is greater than that of test V but less than that of test U,
say, 85%. Then test T would have to have an analytical specificity
>97.2% to have a higher PPV than test U. Now suppose that a sample of
specimens from the population is tested with both test T and test U,
with all discrepancies resolved by using test V. Then the analytical
specificity of test T need only exceed 96.7% to yield an expected
estimated PPV greater than that of test U. In reality, a test with 85%
analytical sensitivity and 96.7% analytical specificity has a true PPV
of only 20.6%. Thus, a candidate test that is less sensitive and
specific, and hence has a lower PPV, than the present screening test
could replace the present test because its DA-estimated PPV appears to
be greater than the PPV of the currently used test.
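The two specificity thresholds in this scenario can be recovered with a short search. This is a sketch under the independence assumption, with test U as the comparison method and test V as the confirmatory test; the function names and the bisection helper are hypothetical.

```python
def ppv(r, se, sp):
    """True positive predictive value at prevalence r."""
    true_pos = r * se
    false_pos = (1 - r) * (1 - sp)
    return true_pos / (true_pos + false_pos)


def expected_da_ppv(r, se_x, sp_x, se_y, sp_y, se_z, sp_z):
    """Expected DA estimate of the candidate's PPV (independent tests assumed)."""
    true_ppv = ppv(r, se_x, sp_x)
    abn_if_abnormal = se_y + (1 - se_y) * se_z  # candidate-positive, truly abnormal, classified abnormal
    abn_if_normal = 1 - sp_y * sp_z             # candidate-positive, truly normal, classified abnormal
    return true_ppv * abn_if_abnormal + (1 - true_ppv) * abn_if_normal


def smallest_specificity(score, target, lo=0.5, hi=1.0, steps=60):
    """Bisection for the specificity at which score(sp) reaches target.
    Assumes score increases with specificity on [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if score(mid) > target:
            hi = mid
        else:
            lo = mid
    return hi


R = 0.01
ppv_u = ppv(R, 0.90, 0.97)  # test U, the current screening test
sp_needed_true = smallest_specificity(lambda sp: ppv(R, 0.85, sp), ppv_u)
sp_needed_da = smallest_specificity(
    lambda sp: expected_da_ppv(R, 0.85, sp, 0.90, 0.97, 0.80, 0.99), ppv_u)

print(round(ppv_u, 3), round(sp_needed_true, 4), round(sp_needed_da, 4))
print(round(ppv(R, 0.85, 0.967), 3))  # true PPV of test T at the DA threshold
```

The gap between the two thresholds is the range of specificities over which test T would look better than test U under DA while actually being worse.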
One argument often made for using DA in the absence of a perfect test
is the situation where two complementary comparison methods are
available, one with high analytical sensitivity and one with high
analytical specificity. The candidate test is first tested against one
of the comparison methods and then DA is applied, with the other
comparison method used as a confirmatory test. For example, suppose
that the candidate test is first tested against a comparison method
with 100% analytical specificity but only 80% analytical sensitivity.
All discrepancies are then resolved by using a second independent
confirmatory test with 100% analytical sensitivity and 80% analytical
specificity. If 25% of the samples are actually abnormal, then Fig. 2
shows the expected biases of the DA estimates of analytical
sensitivity and specificity. For this scenario, the percent bias
in the DA estimates for PPV and NPV is given by Eq. 2:
$$\text{Bias (\%)} = 20 - 0.2 \times \text{true value (\%)} \qquad (2)$$
The estimates are not affected by the order of the testing.

Figure 2. Biases of the DA estimates of analytical sensitivity and
specificity for a prevalence rate of 25%, comparison method analytical
sensitivity of 80% and analytical specificity of 100%, and
confirmatory test analytical sensitivity of 100% and analytical
specificity of 80%.
Solid lines indicate sensitivity bias; dashed
lines indicate specificity bias.
Therefore, in this situation, if a candidate test has both analytical
sensitivity and specificity of 78%, and hence a PPV of 54% and an NPV
of 91%, then the expected DA estimates are ~84% for analytical
sensitivity, 82% for analytical specificity, 63% for PPV, and 93%
for NPV. Not only are all of the estimates positively biased, but a
test that, in reality, has lower analytical sensitivity and specificity
than both the comparison method and confirmatory test would probably
appear to be more sensitive than the comparison method and more
specific than the confirmatory test. It can be shown that when one of
the comparison methods has 100% analytical sensitivity and the other
has 100% analytical specificity, DA can always be expected to
overestimate the performance parameters, unless the true analytical
sensitivity or specificity of the candidate test is also 100%. Fig. 2
and Eq. 2 illustrate this.
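For the predictive values, the claim that complementary comparison methods cannot remove the bias can be seen directly. The derivation below is a sketch under the independence assumption, with notation introduced here for illustration: the comparison method Y has perfect specificity and the confirmatory test Z has perfect sensitivity.

```latex
\begin{align*}
E[\widehat{PPV}_{DA}]
  &= PPV\,[Se_Y + (1-Se_Y)\,Se_Z] + (1-PPV)\,[1 - Sp_Y\,Sp_Z] \\
  &= PPV + (1-PPV)(1-Sp_Z)
   \qquad (Sp_Y = 1,\; Se_Z = 1), \\
E[\widehat{NPV}_{DA}]
  &= NPV\,[Sp_Y + (1-Sp_Y)\,Sp_Z] + (1-NPV)\,[1 - Se_Y\,Se_Z] \\
  &= NPV + (1-NPV)(1-Se_Y) .
\end{align*}
```

Both bias terms are nonnegative and vanish only when the true predictive value is 100% or the remaining imperfect parameter is itself 100%. With Sp_Z = Se_Y = 0.80, each bias is 0.2 x (1 - true value), which gives about 9 percentage points at a true PPV of 54% and about 2 at a true NPV of 91%.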
Often, DA is performed with only one type of discordance being
resolved. For example, when culture is used as the comparison method,
the culture-positive discrepancies are usually not resolved because
culture is assumed to have 100% analytical specificity
(1)(4)(6)(7)(8). In such a situation,
only the culture-negative discordancies are resolved. Fig. 3
shows the biases in the analytical sensitivity and specificity
estimates when the prevalence rate is 25%, the comparison method has
80% analytical sensitivity and 99% analytical specificity, and a
confirmatory test that has 95% analytical sensitivity and specificity
is used to resolve the comparison method negative discordancies. When
only one type of discordance is resolved, the biases are no longer the
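same for PPV and NPV. For this scenario, the percent biases in
the DA estimates for PPV and NPV are given by Eqs. 3 and 4:
$$\text{PPV bias (\%)} = 5.95 - 0.0695 \times \text{true PPV (\%)} \qquad (3)$$
$$\text{NPV bias (\%)} = 20 - 0.21 \times \text{true NPV (\%)} \qquad (4)$$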
For this example, a candidate test that actually has 85%
analytical sensitivity and 90% analytical specificity, and thus has
PPV of 73.9% and NPV of 94.7%, can be expected to have DA estimates
of 85.4% for analytical sensitivity, 90.3% for analytical
specificity, 74.6% for PPV, and 94.8% for NPV, all positively biased.
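One-sided resolution changes only the treatment of specimens that are comparison-method positive: they are classified abnormal without further testing. The sketch below encodes that reading of the procedure under the independence assumption; it is an interpretation for illustration, with hypothetical names, and is not taken from the authors.

```python
def one_sided_da_estimates(r, se_x, sp_x, se_y, sp_y, se_z, sp_z):
    """Expected DA estimates when only comparison-negative discordancies
    (candidate positive, comparison negative) are sent to the confirmatory test."""
    # Candidate positive, classified abnormal: Y positive, or Y negative and Z positive.
    pos_abn = (r * se_x * (se_y + (1 - se_y) * se_z)
               + (1 - r) * (1 - sp_x) * ((1 - sp_y) + sp_y * (1 - sp_z)))
    # Candidate positive, classified normal: Y negative and Z negative.
    pos_nor = (r * se_x * (1 - se_y) * (1 - se_z)
               + (1 - r) * (1 - sp_x) * sp_y * sp_z)
    # Candidate negative: the comparison result alone decides the classification.
    neg_abn = r * (1 - se_x) * se_y + (1 - r) * sp_x * (1 - sp_y)
    neg_nor = r * (1 - se_x) * (1 - se_y) + (1 - r) * sp_x * sp_y
    return {
        "sensitivity": pos_abn / (pos_abn + neg_abn),
        "specificity": neg_nor / (neg_nor + pos_nor),
        "ppv": pos_abn / (pos_abn + pos_nor),
        "npv": neg_nor / (neg_nor + neg_abn),
    }


# Prevalence 25%, candidate 85%/90%, comparison 80%/99%, confirmatory 95%/95%.
one_sided = one_sided_da_estimates(0.25, 0.85, 0.90, 0.80, 0.99, 0.95, 0.95)
for name, value in one_sided.items():
    print(name, round(value, 4))
```

Under these assumptions the sensitivity, specificity, and NPV agree with the quoted estimates to one decimal place, and the PPV comes out within about 0.1 percentage point of the quoted figure.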

Figure 3. The biases of the DA estimates of analytical sensitivity
and specificity for a prevalence rate of 25%, comparison method
analytical sensitivity of 80% and analytical specificity of 99%, and
confirmatory test analytical sensitivity and analytical specificity of
95%, when only the negative discordancies of the comparison method are
resolved.
Solid lines indicate sensitivity bias; dashed
lines indicate specificity bias.
Discussion
In the absence of a perfect comparison method, the true status of
a specimen can never be known, leading to potential misclassification
bias (9). In the case of DA, the same test results are
being used to estimate both a specimen's true status and the ability
of the test to identify that status. In addition, the status of some
specimens is estimated by using the results of only two tests, whereas
for other samples, more than two tests are used. The systematic and
subjective nature of this testing procedure leads to differential
misclassification bias, which modeling suggests almost always results
in overestimation of the performance parameters of the candidate test.
Although it is true that DA often yields estimates for the performance
parameters that are more accurate than just comparing the candidate
test to one imperfect comparison method, this does not justify using
the candidate test, the performance capabilities of which are in doubt,
to help classify the specimens. It would be preferable to estimate the
true status of a specimen by using all of the independent comparison
methods available. When only two such comparison methods are to be
used, as in basic DA, it would be better to test all of the specimens
with both comparison method tests, classify only the specimens with
concordant results, and discard the specimens with discordant results.
Doing so would greatly reduce the misclassification bias. For the
example displayed (Fig. 1, left), when the analytical sensitivity and
specificity of the candidate test each range between 50% and 100%,
the bias in the DA analytical sensitivity estimate ranges between
-0.1% and 5.0%. When only the comparison methods are used to
classify the specimens, with discordancies discarded, the bias ranges
from -0.2% to 0.0%, which is appreciably less. Over the same range
of analytical sensitivities and specificities, the candidate test true
PPV ranges from 25% to 100%. The bias in the DA PPV estimate ranges
from -0.5% to 10.8%. In contrast, the bias in the PPV estimate
computed with the method that uses only the concordant comparison
method results to classify the specimens ranges from -0.6% to 0.3%,
which is again appreciably less.
An example showing the expected biases and errors in the estimates of
the performance parameters when each of the three estimation procedures
is used is revealing. Suppose that one had available two independent
comparison methods, tests A and B, along with 1000 abnormal and 3000
normal specimens to estimate the performance parameters of a candidate
test that actually has 80% analytical sensitivity and 80% analytical
specificity. Let test A have 90% analytical sensitivity and
specificity, and let test B have 95% analytical sensitivity and
specificity. Table 7 displays the expected estimates and 95% confidence intervals
for each of the following three methods of classifying the specimens:
method I, test B only; method II, DA; method III, tests A and B with
discordancies discarded. Only two of the four 95% confidence intervals
from method I and one of the four intervals from method II cover the
true values, whereas all four of the intervals from method III cover
the true values. Overall, the DA estimates are slightly less biased
than using only test B to classify the specimens, but using method III,
which avoids using the candidate test to classify the specimens, is
vastly superior to DA.
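The three classification methods can be compared by enumerating all combinations of true status and test results and applying each method's classification rule. The sketch below computes expected point estimates only (no confidence intervals), assumes independent tests, and treats test A as the comparison method and test B as the confirmatory test in DA; the names are hypothetical.

```python
from itertools import product


def expected_estimates(classify, r, tests):
    """Expected estimates for the candidate test under a classification rule.
    classify(x, a, b) returns True (abnormal), False (normal) or None (discard).
    tests holds (sensitivity, specificity) for the candidate, test A and test B."""
    cells = {(x, label): 0.0 for x in (True, False) for label in (True, False)}
    for abnormal, x, a, b in product([True, False], repeat=4):
        prob = r if abnormal else 1 - r
        for result, (se, sp) in zip((x, a, b), tests):
            p_pos = se if abnormal else 1 - sp
            prob *= p_pos if result else 1 - p_pos
        label = classify(x, a, b)
        if label is not None:
            cells[(x, label)] += prob
    tp, fp = cells[(True, True)], cells[(True, False)]
    fn, tn = cells[(False, True)], cells[(False, False)]
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp), "npv": tn / (tn + fn)}


tests = [(0.80, 0.80), (0.90, 0.90), (0.95, 0.95)]  # candidate, test A, test B
prevalence = 1000 / 4000

method_1 = expected_estimates(lambda x, a, b: b, prevalence, tests)                     # test B only
method_2 = expected_estimates(lambda x, a, b: x if x == a else b, prevalence, tests)    # DA
method_3 = expected_estimates(lambda x, a, b: a if a == b else None, prevalence, tests) # A and B agree

for name, est in (("I", method_1), ("II", method_2), ("III", method_3)):
    print(name, {k: round(v, 3) for k, v in est.items()})
```

Only the rule for method II looks at the candidate result `x`; methods I and III classify specimens without it, which is the design point the text argues for.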
Table 7. Expected estimates (and 95% confidence intervals) for the
performance parameters of a test with true analytical sensitivity and
specificity of 80%.
Up to this point, the results and discussions have assumed that all
test results are independent. The impact of DA is worsened by
dependence between any of the tests used in the DA process. Dependent
tests here refer to those measuring disease markers that are either
analytically or physiologically similar, such that their test results
tend to agree; i.e., they tend to classify and misclassify in tandem.
For purposes of myocardial infarction diagnosis, total creatine kinase
(CK) and CK-MB fraction are dependent. On the other hand,
electrocardiographic changes would be independent of both. Suppose that
the discrepancies in Table 3 are resolved by using a test that is
similar to the candidate test. In particular, suppose that the
confirmatory test yields positive results in tandem with the candidate
test with probability p (i.e., when the candidate test
result is positive, the confirmatory test result is also positive with
probability p).
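It follows that the expected DA estimate for the analytical sensitivity
of the candidate test is:
$$\frac{88 + 52p}{88 + 52p + 47(1-p)}$$
where 88, 52, and 47 are the expected numbers of specimens that are
positive by both tests, positive by the candidate test only, and
positive by the comparison method only, and the confirmatory test is
taken to be negative with probability p when the candidate test result
is negative.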
The more the confirmatory test mirrors the candidate test, the
greater the value of p and the closer the DA estimate for
analytical sensitivity is to 100%, regardless of the true analytical
sensitivity of the candidate test.
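The effect of a confirmatory test that tracks the candidate test can be sketched numerically. The counts below (88 concordant positives, 52 candidate-only positives, 47 comparison-only positives) are the expected values implied by the 100 abnormal and 200 normal specimens of the earlier hypothetical example. The sketch also assumes, beyond what the text states, that the confirmatory test is negative with probability p when the candidate test is negative; the function name is hypothetical.

```python
def da_sensitivity_with_tandem(p, both_pos=88, candidate_only=52, comparison_only=47):
    """Expected DA sensitivity estimate when the confirmatory test agrees with
    the candidate test with probability p (an illustrative assumption)."""
    confirmed_candidate_only = candidate_only * p          # resolved as abnormal, candidate positive
    confirmed_comparison_only = comparison_only * (1 - p)  # resolved as abnormal, candidate negative
    true_positives = both_pos + confirmed_candidate_only
    return true_positives / (true_positives + confirmed_comparison_only)


for p in (0.0, 0.5, 0.9, 1.0):
    print(p, round(da_sensitivity_with_tandem(p), 3))
```

The estimate rises monotonically with p and reaches 100% at p = 1, although the candidate test's true sensitivity in that example is 80%.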
The DA estimates remain biased even when a comparison method with 100%
analytical sensitivity (or specificity) is complemented by a
confirmatory test with 100% analytical specificity (or sensitivity).
Even using a perfect comparison method to resolve discordancies does
not eliminate this bias. Indeed, resolving discordancies with a perfect
comparison method yields expected DA estimates for imperfect tests that
are always too high.
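Why a perfect resolver still overestimates sensitivity can be shown with a short inequality. This is a sketch under the independence assumption, using notation introduced here; it sets Se_Z = Sp_Z = 1 in the expected DA sensitivity.

```latex
\begin{align*}
E[\widehat{Se}_{DA}]
  &= \frac{R\,Se_X + \delta}{R\,Se_X + \delta + R\,(1-Se_X)\,Se_Y},
  \qquad \delta = (1-R)(1-Sp_X)(1-Sp_Y) \ge 0, \\
  &\ge \frac{R\,Se_X}{R\,Se_X + R\,(1-Se_X)\,Se_Y}
   \;\ge\; \frac{R\,Se_X}{R\,Se_X + R\,(1-Se_X)} = Se_X .
\end{align*}
```

The first step holds because adding the same nonnegative amount to the numerator and denominator of a fraction that is at most 1 cannot decrease it. The second holds because Se_Y is at most 1, and it is strict whenever the prevalence is positive, 0 < Se_X < 1, and Se_Y < 1. The term delta is the concordant false positives, which a perfect confirmatory test never gets the chance to correct.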
When the prevalence rate is low, the bias associated with the DA
estimate of PPV can be quite large. This overestimation of PPV can be
especially problematic if one is considering using the candidate test
as a screening test for a relatively rare disease, where even a small
bias would result in a large underestimation of the expected
false-positive rate.
Staquet et al. (10) showed that when unbiased estimates
for the analytical sensitivity and specificity of the comparison method
exist, then unbiased estimates for the analytical sensitivity and
specificity of a candidate test can be easily computed. Thus, when one
has unbiased estimates for the analytical sensitivity and specificity
of a comparison method, it is difficult to justify the use of DA
because one need not incur the increased costs of subjecting specimens
with discrepant results to further testing. Other methods for
estimating the performance parameters of a candidate test have been
derived when unbiased estimates of comparison method test analytical
sensitivity and specificity are not available
(11)(12). When technological advances result
in the gradual replacement of an old "gold standard," such as
culture, with a new standard such as PCR, it is tempting to resolve
discordancies by using the new technology. A fairer approach would be
to report two estimates of analytical performance by using old and new
technologies independently, or to treat the old and new standards as
comparison methods and use method III described earlier, which discards
samples whose comparison method results do not agree.
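A correction of the kind attributed above to Staquet et al. (10) can be sketched as follows. The closed form is derived here from the 2 x 2 table under the assumption that candidate and comparison results are independent given true status; it has not been checked against the wording of that paper, and the function name is hypothetical.

```python
def corrected_sensitivity_specificity(a, b, c, d, se_ref, sp_ref):
    """Estimate candidate sensitivity and specificity from a 2 x 2 table against
    an imperfect comparison method with known sensitivity and specificity.
    a: both positive; b: candidate positive, comparison negative;
    c: candidate negative, comparison positive; d: both negative."""
    n = a + b + c + d
    sensitivity = (a * sp_ref - b * (1 - sp_ref)) / (n * sp_ref - (b + d))
    specificity = (d * se_ref - c * (1 - se_ref)) / (n * se_ref - (a + c))
    return sensitivity, specificity


# Expected counts from the earlier hypothetical example (candidate 80%/70%,
# comparison 95%/80%, 100 abnormal and 200 normal specimens).
se_hat, sp_hat = corrected_sensitivity_specificity(88, 52, 47, 113, se_ref=0.95, sp_ref=0.80)
print(round(se_hat, 3), round(sp_hat, 3))
```

Applied to the expected counts, the correction returns the true 80% and 70% without any confirmatory testing. With real data the denominators can be small or negative when the comparison method is poor, so the estimates should be checked for plausibility.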
We applaud all efforts to investigate discrepancies in laboratory
testing; this is nothing but good science. Using the process for
estimating and reporting test performance that has come to be called
DA, however, should be relied upon only as a last resort; we suggest
that users scrutinize estimates derived by DA very carefully. As shown
here, DA consistently overestimates laboratory performance parameters.
Certainly, rare occasions may arise when the use of DA might be
justified, as when comparison method testing is initially performed in
Third World conditions and only a few samples can be transported for
confirmation. Because alternative methods exist, however, the magnitude
and direction of the bias associated with DA appear to negate its use
as a valid method for estimating the performance parameters of a
laboratory test, a conclusion that makes its use difficult to justify
for reasons such as cost reduction or convenience.
Footnotes
Division of Laboratory Systems, Public Health Practice Program Office, Centers for Disease Control and Prevention, Atlanta, GA 30341-3714.
1 Nonstandard abbreviations: DA, discrepant analysis; PPV, positive predictive value; and NPV, negative predictive value. 
References
1. Quinn TC, Welsh L, Lentz A, Crotchfelt K, Zenilman J, Newhall J, et al. Diagnosis by AMPLICOR PCR of Chlamydia trachomatis infection in urine samples from women and men attending sexually transmitted disease clinics. J Clin Microbiol 1996;34:1401-1406.
2. Lee HH, Chernesky MA, Schachter J, Burczak JD, Andrews WW, Muldoon S, et al. Diagnosis of Chlamydia trachomatis genitourinary infection in women by ligase chain reaction assay of urine. Lancet 1995;345:213-216.
3. Smith KR, Ching S, Lee H, Ohhashi Y, Hu HY, Fisher HC, III, et al. Evaluation of ligase chain reaction for use with urine for identification of Neisseria gonorrhoeae in females attending a sexually transmitted disease clinic. J Clin Microbiol 1995;33:455-457.
4. Schachter J, Stamm WE, Quinn TC, Andrews WW, Burczak JD, Lee HH. Ligase chain reaction to detect Chlamydia trachomatis infection of the cervix. J Clin Microbiol 1994;32:2540-2543.
5. Hadgu A. The discrepancy in discrepant analysis. Lancet 1996;348:592-593.
6. Goessens WH, Kluytmans JA, den Toom N, van Rijsoort-Vos TH, Niesters BG, Stolz E, et al. Influence of volume of sample processed on detection of Chlamydia trachomatis in urogenital samples by PCR. J Clin Microbiol 1995;33:251-253.
7. Hoffner SE, Cristea M, Klintz L, Petrini B, Kallenius G. RNA amplification for direct detection of Mycobacterium tuberculosis in respiratory samples. Scand J Infect Dis 1996;28:59-61.
8. Biro FM, Reising SF, Doughman JA, Kollar AM, Rosenthal SL. A comparison of diagnostic methods in adolescent girls with and without symptoms of Chlamydia urogenital infection. Pediatrics 1994;93:476-480.
9. Valenstein PN. Evaluating diagnostic tests with imperfect standards. Am J Clin Pathol 1990;93:252-258.
10. Staquet M, Rozencweig M, Lee YJ, Muggia FM. Methodology for assessment of new dichotomous diagnostic tests. J Chronic Dis 1981;34:599-610.
11. Hui SL, Walter SD. Estimating the error rates of diagnostic tests. Biometrics 1980;36:167-171.
12. Smith PJ, Hadgu A. Sensitivity and specificity for correlated observations. Stat Med 1992;11:1503-1509.