Laboratory Management
a Author for correspondence. Fax 32-9-264 81 98; e-mail Linda.Thienpont{at}rug.ac.be.
| Abstract |
|---|
sa,tot); (d) the samples are adequately distributed over the investigated range; and (e) the number of samples used for the comparison is adequate.

| Introduction |
|---|
Note that, regardless of the model used, the uncertainty in the regression estimates increases when (a) the magnitude of Sy||x relative to the data range is high; (b) the data are not adequately distributed over the investigated range; (c) the number of data points is low; or (d) the relation between the data is not linear. In the last case, the model itself is wrong and should be replaced by a nonlinear one. (Nonlinear procedures are sometimes referred to as second-order linear regression.) For the above reasons, it is recommended that method comparison data always be accompanied by graphical presentations (8)(9).
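For OLR, the effect of Sy||x and data range on the uncertainty of the slope can be made concrete. The standard error of the slope is Sy||x divided by the square root of the sum of squared deviations of x from its mean. The sketch below is a minimal pure-Python illustration on hypothetical data: the alternating ±0.1 residual pattern and the two x ranges are our assumptions. It covers OLR only, not the other procedures.

```python
import math


def ols_fit(x, y):
    """OLR of y on x; returns slope, intercept, Sy|x and the slope standard error."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    s_yx = math.sqrt(ss_res / (n - 2))      # residual SD, N - 2 degrees of freedom
    se_slope = s_yx / math.sqrt(sxx)        # shrinks as the spread of x grows
    return slope, intercept, s_yx, se_slope


# Hypothetical data: y = x plus the same alternating +/-0.1 residual pattern,
# once over a wide x range and once over a range five times narrower.
noise = [0.1 if i % 2 == 0 else -0.1 for i in range(10)]
x_wide = [1.0 + i for i in range(10)]
x_narrow = [1.0 + 0.2 * i for i in range(10)]

for label, x in (("wide", x_wide), ("narrow", x_narrow)):
    y = [a + e for a, e in zip(x, noise)]
    slope, intercept, s_yx, se = ols_fit(x, y)
    print(f"{label}: slope={slope:.4f} Sy|x={s_yx:.4f} SE(slope)={se:.4f}")
```

Both fits have the same Sy|x. The narrow range gives a slope standard error five times larger because the spread of x is five times smaller.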
Because several linear regression procedures are available, the question of which is the most appropriate for method comparison studies has been investigated by several groups, e.g., by Cornbleet and Gochman (10), Wakkers et al. (11), Linnet (12)(13), Payne (14), and Lawton et al. (15). These investigations focused mainly on the error in slope estimation and the consequent statistical unreliability of hypothesis testing, and on their basis many authors prefer DR over OLR. Others make the use of OLR dependent on the value calculated for the product-moment correlation coefficient (r). For example, Wakkers et al. (11), Westgard and Hunt (1), and the EP9 protocol from the NCCLS (8) restrict the use of OLR to cases where r ≥ 0.99 (11) or r ≥ 0.975 (1)(8). The relevance of PBR still needs to be better demonstrated: some authors qualify its use as an alternative to OLR by revealing inadequacies in simulation tests (12), whereas others advocate it over DR (14). At the extreme, it has been argued that linear regression should not be applied at all in method comparison studies (16) and that graphical techniques (bias plots) (9)(17)(18) should be used instead.
To elucidate the relevance of these recommendations in practice, we here compare the usefulness of different linear regression variants (OLR, DR, SPCA, and PBR) on the basis of data obtained from real-life studies (19)(20). Our main emphasis was to clarify whether the quality of the analytical input data or the particular regression method used has the greater influence on the validity of the regression data. We discuss, in particular, the importance of correlation analysis in connection with regression analysis and the relevance of Sy||x.
The discussion of whether bias plots or linear regression procedures are more appropriate for interpreting method comparison studies is beyond the scope of this study. For this subject, the reader is referred to reports by Stöckl (9), Lawton et al. (16), Bland and Altman (17), and Hyltoft Petersen et al. (18). Note also that the method comparison data are used here only to evaluate the different regression procedures. The analytical and clinical relevance of the method comparison studies has been described elsewhere (19)(20).
| Materials and Methods |
|---|
For the different regression procedures, OLR was performed with Microsoft EXCEL® or the EVAL KIT (21) software. DR was performed with Microsoft EXCEL [using the formulae given by Cornbleet and Gochman (10) and Linnet (12)] or with the EP Evaluator (22) software. SPCA and PBR were performed with the EVAL KIT software. Note that EVAL KIT calculates the SPCA slope as the geometric mean of the two slopes obtained by OLR with x and with y as the independent variable. The slope of the x-on-y regression is inverted so that both slopes express y as a function of x (4). Robust OLR was performed with SYSTAT 7.0® software (23). Of the several variants available, we chose the trimming procedure, with a factor of 0.1 for discriminating outlying residuals.
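The following sketch computes the OLR, DR, and SPCA point estimates from the usual sums of squares. It may help readers reproduce these estimates outside the packages named above, but it is not the code of EXCEL, EVAL KIT, or EP Evaluator. The DR form assumes a known ratio of the y to x error variances. We call this parameter `error_ratio`, a name we introduce here, and a value of 1.0 corresponds to equal imprecision of both methods. The SPCA slope is taken, as described above, as the signed geometric mean of the y-on-x slope and the reciprocal of the x-on-y slope. Standard errors are not computed.

```python
import math


def _sums(x, y):
    """Means and centred sums of squares/products for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def olr(x, y):
    """Ordinary least squares of y on x (x assumed error-free)."""
    mx, my, sxx, _, sxy = _sums(x, y)
    b = sxy / sxx
    return b, my - b * mx


def deming(x, y, error_ratio=1.0):
    """Deming regression; error_ratio = var(y errors) / var(x errors).

    error_ratio = 1.0 assumes equal imprecision (orthogonal regression).
    Assumes sxy != 0.
    """
    mx, my, sxx, syy, sxy = _sums(x, y)
    d = syy - error_ratio * sxx
    b = (d + math.sqrt(d * d + 4 * error_ratio * sxy * sxy)) / (2 * sxy)
    return b, my - b * mx


def spca(x, y):
    """Signed geometric mean of b(y on x) and 1 / b(x on y)."""
    mx, my, sxx, syy, sxy = _sums(x, y)
    b_yx = sxy / sxx
    b_xy = sxy / syy
    b = math.copysign(math.sqrt(b_yx / b_xy), sxy)   # equals sqrt(syy / sxx)
    return b, my - b * mx


# Hypothetical paired results
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
for name, fit in (("OLR", olr), ("DR", deming), ("SPCA", spca)):
    b, a = fit(x, y)
    print(f"{name}: slope={b:.4f} intercept={a:.4f}")
```

As `error_ratio` grows very large, x is treated as error-free and the Deming slope approaches the OLR slope. This is one way to see OLR as a special case of DR.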
| Results and Discussion |
|---|
THE Sy||x PROBLEMATIC: CONNECTION BETWEEN Sy||x AND r
Like the correlation coefficient r,
Sy||x indicates the magnitude of total random error
of the method comparison, including nonlinearity, drift or shift, total
analytical imprecision (sa,tot), and
sample-related effects. However, in our experience, when several methods are compared with each other at the same time, Sy||x is a better indicator of total random error than r, particularly when the data range is wide, as illustrated in Table 1. There, the Sy||x values increase >10-fold, i.e., from 0.017 to 0.192 nmol/L, whereas the corresponding r values decrease only from 1 to 0.98. On the other hand, r is more useful for a general expression of the magnitude of total random error because it is independent of the units of x and y. However, r depends on the data range. For wide data ranges (e.g., 3 decades), much higher values of r are needed to describe the same quality of correlation than for small data ranges (e.g., 1 decade). This might explain why past studies have differed in what they considered a "good" r value in method comparisons, namely r ≥ 0.975 (8) or r ≥ 0.99 (11).
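The range dependence of r can be demonstrated with hypothetical data. The sketch below applies the same ±0.5 scatter around y = x over a narrow range (1 to 10) and a wide range (1 to 901, almost 3 decades), then computes r and the OLR Sy|x for each. The data are our own construction, chosen so that the residuals are identical in both cases.

```python
import math


def pearson_r_and_syx(x, y):
    """Product-moment r and the OLR residual SD Sy|x for paired data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    ss_res = syy - sxy ** 2 / sxx          # OLR residual sum of squares
    return r, math.sqrt(ss_res / (n - 2))


# Hypothetical data: identical +/-0.5 scatter around y = x over two ranges.
scatter = [0.5 if i % 2 == 0 else -0.5 for i in range(10)]
narrow = [1.0 + i for i in range(10)]          # 1 to 10: one decade
wide = [1.0 + 100.0 * i for i in range(10)]    # 1 to 901: almost three decades

for label, x in (("narrow", narrow), ("wide", wide)):
    y = [a + e for a, e in zip(x, scatter)]
    r, s_yx = pearson_r_and_syx(x, y)
    print(f"{label}: r={r:.5f} Sy|x={s_yx:.4f}")
```

Sy|x is about 0.55 in both cases. However, r is about 0.985 for the narrow range and above 0.9999 for the wide range.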
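The mathematical relationship between Sy|x and r can be delineated by the formula (10)

$$S_{y|x} = \sqrt{\frac{N-1}{N-2}\left(s_y^2 - b_{y|x}^2\, s_x^2\right)}$$

where s_x and s_y are the standard deviations of x and y and N is the number of data pairs. Substituting the slope

$$b_{y|x} = r\,\frac{s_y}{s_x}$$

and simplifying with (N − 1)/(N − 2) ≈ 1 gives

$$r^2 = 1 - \frac{S_{y|x}^2}{s_y^2}$$

Thus, the greater the ratio Sy|x:data range, the lower the r value.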
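The relation above can be checked numerically. Before the (N − 1)/(N − 2) ≈ 1 simplification, it is exact for OLR: Sy|x² = (N − 1)/(N − 2) · s_y² · (1 − r²), with s_y the sample standard deviation of y. The sketch below uses hypothetical data. It computes Sy|x directly from the residuals and compares it with the exact expression. It also shows how close the simplified form r² = 1 − Sy|x²/s_y² comes for a small N.

```python
import math
import statistics


def syx_and_r(x, y):
    """Return Sy|x (OLR, N - 2 degrees of freedom) and Pearson r."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return math.sqrt(ss_res / (n - 2)), sxy / math.sqrt(sxx * syy)


# Hypothetical paired results
x = [0.5, 1.1, 2.0, 3.2, 4.1, 5.3, 6.0, 7.4]
y = [0.6, 1.0, 2.3, 3.0, 4.4, 5.1, 6.3, 7.2]
n = len(x)
s_yx, r = syx_and_r(x, y)
s_y = statistics.stdev(y)                        # sample SD, N - 1 denominator

exact = math.sqrt((n - 1) / (n - 2) * s_y ** 2 * (1 - r ** 2))
r_approx = math.sqrt(1 - s_yx ** 2 / s_y ** 2)   # after (N - 1)/(N - 2) ~ 1
print(f"Sy|x from residuals: {s_yx:.5f}, from formula: {exact:.5f}")
print(f"r: {r:.5f}, from simplified formula: {r_approx:.5f}")
```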
Note that the higher the ratio Sy||x:data range or the lower the value of r, the greater the uncertainty in the estimates of slope and intercept (this holds true for all regression procedures).
THE Sy||x PROBLEMATIC: CONNECTION BETWEEN Sy||x, ANALYTICAL IMPRECISION, AND SAMPLE-RELATED EFFECTS
Calculating and interpreting Sy||x is important because Sy||x is influenced by two effects: sa,tot [which equals √(sa,x² + sa,y²), where sa,x and sa,y are the analytical imprecisions of the x and y methods] and sample-related effects. Note, however, that Sy||x may also be inflated by nonlinearity and by drift/shift during the comparison. Because of the latter, we recommend that method comparison studies be done with particular care for internal quality control (IQC). In addition to the common interpretation of systematic differences between methods reflected by slope and intercept, we therefore advise using the information that the regression analysis contains about random error. This is done by comparing the observed value of Sy||x with the one predicted from sa,tot (18). However, only the Sy||x value obtained from OLR can be used for this purpose (8). Naturally, as already discussed, the higher the Sy||x values, the more uncertain the regression estimates for systematic differences between methods will be.
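Below is a minimal sketch of the comparison recommended above. It assumes that the analytical imprecisions of the two methods (sa,x and sa,y, for example from IQC data or replicate measurements) are known and independent, so that sa,tot = √(sa,x² + sa,y²). The function name and the example values are hypothetical. The text does not prescribe a numeric cutoff for the ratio, so none is applied here.

```python
import math


def syx_vs_imprecision(s_yx_olr, sa_x, sa_y):
    """Compare the observed OLR Sy|x with the total analytical imprecision.

    Returns (sa_tot, ratio). A ratio close to 1 means the scatter is explained
    by imprecision alone. A ratio well above 1 points to sample-related effects
    (or to nonlinearity or drift/shift, which IQC should help exclude).
    """
    sa_tot = math.hypot(sa_x, sa_y)     # sqrt(sa_x**2 + sa_y**2)
    return sa_tot, s_yx_olr / sa_tot


# Hypothetical values in mmol/L
sa_tot, ratio = syx_vs_imprecision(s_yx_olr=0.15, sa_x=0.03, sa_y=0.04)
print(f"sa,tot = {sa_tot:.3f} mmol/L, Sy|x / sa,tot = {ratio:.1f}")
```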
When Sy||x ≫ sa,tot, sample-related effects are present. When the values for x come from a hierarchically higher reference method, the routine method (y) is clearly the source of the problems. When two routine methods are involved, either or both methods might be affected by sample-related effects.
When sa,tot and Sy||x are of similar magnitude, one should still check whether the correlation coefficient r has a "reasonable" value (e.g., ≥0.975 or ≥0.99; see the discussion below). When r is considerably lower and sa,tot ≈ Sy||x, method precision is too poor for sufficiently reliable estimation of slope or intercept. In such a case, there is a strong need to minimize the influence of imprecision by performing replicate analyses, checking IQC data (e.g., for drifts or shifts), or measuring all samples within the same analytical run. We stress that method comparison studies should be done with particular care for IQC. This means performing many more IQC measurements during a method comparison study than when the assay is used in routine operation.
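The benefit of replicate analyses can be estimated under a simple assumption. If each sample is measured k times by each method and the replicates are independent, the imprecision of the mean falls by a factor of √k. The sa,tot for the means is then √((sa,x² + sa,y²)/k). The sketch below tabulates this for hypothetical values. It ignores between-run components, which replicates within one run do not reduce.

```python
import math


def sa_tot_of_means(sa_x, sa_y, k):
    """Total analytical imprecision when the mean of k replicates per method is used."""
    return math.sqrt((sa_x ** 2 + sa_y ** 2) / k)


# Hypothetical single-measurement imprecisions (mmol/L)
for k in (1, 2, 4):
    print(k, round(sa_tot_of_means(0.03, 0.04, k), 4))
```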
THE Sy||x PROBLEMATIC: DIFFERENT REGRESSION PROCEDURES CALCULATE DIFFERENT VALUES
The different regression procedures give different values for Sy||x. PBR does not calculate values for Sy||x at all, which we consider a major disadvantage of this regression method. The formula given by Cornbleet and Gochman (10) calculates "true" Sy||x values, in the sense of orthogonal distances of the data pairs from the regression line (those authors used the term Sy·x instead of Sy||x). These values are generally smaller than those from OLR. They become identical to those of OLR in the case of a zero slope and are smaller by a factor of √2 when the slope is 1.
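Suppose a point has a vertical (y) residual e from the line y = a + b·x. Its orthogonal distance from that line is then |e|/√(1 + b²). The sketch below illustrates only this geometric relation. It is not the full Cornbleet and Gochman formula, which we do not reproduce.

```python
import math


def orthogonal_distance(x0, y0, slope, intercept):
    """Perpendicular distance of the point (x0, y0) from y = intercept + slope * x."""
    vertical_residual = y0 - (intercept + slope * x0)
    return abs(vertical_residual) / math.sqrt(1.0 + slope ** 2)


# A point 1 unit above the line, for slopes 0 and 1
for b in (0.0, 1.0):
    d = orthogonal_distance(0.0, 1.0, b, 0.0)
    print(f"slope={b}: vertical residual 1.0, orthogonal distance {d:.4f}")
```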
Furthermore, the reader should be aware that commercial software
packages also calculate different values for Sy||x.
DR, as performed with the program of Rhoads (22), gave
values for Sy||x that were nearly identical to
those of OLR. We assume that the Rhoads program for DR calculates "usual" Sy||x values. However, these may differ slightly from those of OLR when slope and intercept do not agree completely between the two procedures. SPCA, as performed by the software we used (21), calculated values of Sy||x that were smaller than those of OLR by a factor of √2. This program calculates true Sy||x values by assuming equal imprecision of both methods and, hence, dividing the usual Sy||x value by √2.
As noted above, only the Sy||x values calculated by OLR can be used to compare the predicted variance around the regression line with the observed one (8).
INTERPRETATION OF THE REGRESSION DATA FOR CASE 1 (METHOD COMPARISONS FOR SERUM ESTRADIOL-17β MEASUREMENT): COMPARISON OF OLR AND DR/SPCA
The values for slope and intercept according to OLR and DR/SPCA are very similar for methods 1–10 (see Table 1). The largest difference in the slope (intercept) value is 0.007 (0.006 nmol/L), for method 10. Additionally, the standard errors for slope and intercept differ by <1% between OLR and DR. (The EVAL KIT program does not provide standard errors for SPCA.) Interestingly, all respective values for r are ≥0.99. These findings are consistent with the restriction proposed by Wakkers et al. (11) to apply OLR to method comparison data only when r ≥ 0.99. In contrast, for methods 11 and 12, the values for slope and intercept by OLR differ substantially from those by DR/SPCA. For these methods, r values <0.99 are observed. Consequently, DR/SPCA would be the most appropriate procedures in these cases (10)(11)(12)(13). However, from a purely statistical point of view, one would prefer to apply DR/SPCA in all cases.
But what is the analytical relevance of these findings? Does the application of, for example, DR in place of OLR add value to the method comparison for cases 11 and 12? From the analytical point of view, we doubt it. Interestingly, when recalibration of the routine methods on the basis of their correlation with the reference method was proposed (19), these cases were intuitively excluded because of the poor correlation and the high values of Sy||x. In other words, even after recalibration, those methods would still show differences from the reference method that are too large for individual samples. Note also that the uncertainty of the slope (95% confidence level) was ~0.08 for cases 11 and 12. This would introduce a considerable calibration uncertainty if those methods were recalibrated on the basis of the method comparison. Therefore, despite its statistical justification, applying DR instead of OLR makes little analytical sense for these cases.
These first observations confirm that OLR is a valid regression procedure for method comparison studies that cover a wide data range when r ≥ 0.99. This holds true for the estimation of slope and intercept and their respective confidence intervals. Consequently, under the restriction that r ≥ 0.99 for data that cover a wide range, OLR can be applied for calibration purposes as well as for hypothesis testing. When r < 0.99, one should investigate whether a different regression procedure really solves the analytical problem.
INTERPRETATION OF THE REGRESSION DATA FOR CASE 1 (METHOD COMPARISONS FOR SERUM ESTRADIOL-17β MEASUREMENT): INVESTIGATION OF PBR
PBR agrees very well with the other regression procedures for low to medium values of Sy||x (see methods 1–7), with the exception of method 4. However, PBR slope and intercept estimates differ from those of the other regression variants when Sy||x becomes high (in particular, in methods 11 and 12). This observation shows that PBR cannot be regarded as a substitute for DR or SPCA.
As noted above, method 4 is unusual: the slopes for OLR and DR/SPCA differ from that of PBR despite a relatively low value of Sy||x (Fig. 1A). The plot of the OLR residuals (Fig. 1B) shows that this discrepancy arises because the method comparison data are not linearly related. The sign sequence of the y residuals also shows this. PBR gives a sign sequence of 2× plus, minus, 3× plus, 2× minus, plus, 8× minus, and 5× plus, whereas OLR gives 7× plus, minus, plus, 8× minus, plus, minus, plus, minus, and plus. These sequences reveal that the middle block of results has a negative bias compared with the low and high blocks of results. On the basis of this observation, a nonlinear regression procedure (e.g., a quadratic one) may be more appropriate in this case.
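Reading such sign sequences is easy to automate. The sketch below run-length encodes the signs of a list of residuals into the notation used above; the function name is hypothetical. Long runs of one sign that alternate between blocks, as in method 4, suggest curvature rather than random scatter. The sketch does not perform a formal runs test.

```python
def sign_sequence(residuals):
    """Run-length encode residual signs, e.g. '7× plus, minus, plus, ...'."""
    runs = []
    for e in residuals:
        sign = "plus" if e > 0 else "minus" if e < 0 else "zero"
        if runs and runs[-1][0] == sign:
            runs[-1][1] += 1
        else:
            runs.append([sign, 1])
    return ", ".join(s if count == 1 else f"{count}× {s}" for s, count in runs)


# Hypothetical residual list whose signs reproduce the OLR sequence for method 4
pattern = [1] * 7 + [-1] + [1] + [-1] * 8 + [1, -1, 1, -1, 1]
print(sign_sequence(pattern))
```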
In all comparisons, OLR and DR/SPCA agreed more closely with each other than with PBR. We therefore conclude that PBR should be applied with care to method comparisons with medium sample sizes. PBR may treat too many data points as outliers. On the other hand, discrepancies between different linear regression procedures should be taken as a hint that the underlying problem needs in-depth investigation.
INTERPRETATION OF THE REGRESSION DATA FOR CASE 2 (METHOD COMPARISON DATA FOR SERUM POTASSIUM)
Table 2 covers clustered method comparison data, by which we mean a small concentration range of the x variables (here within 1 decade). It shows that the values for slope and intercept according to the four regression procedures are quite similar for low to medium values of Sy||x (see methods 1–9). Note that for these nine methods, r values between 0.996 and 0.983 were found. This observation is consistent with the previously mentioned restriction of OLR to cases with r ≥ 0.975 (1)(8). Interestingly, after logarithmic transformation of the data for estradiol-17β, the range falls within 1 decade, and the "critical" r value of 0.99 decreases to ~0.975. This indicates that the "r ≥ 0.975 rule" (8) might be generally useful as a screening rule for valid application of OLR when data ranges are <1 decade. However, as mentioned above, from a purely statistical point of view, DR would be preferable in those cases as well.
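The two screening thresholds discussed so far can be written as a small helper. This sketch rests on our own simplifications. The data range is expressed in decades as log10(max/min) of strictly positive x values. Ranges below 1 decade use r ≥ 0.975 and all wider ranges use r ≥ 0.99, although the text supports the latter only for wide (about 3-decade) ranges. The function name is hypothetical, and passing the screen does not make OLR statistically preferable to DR.

```python
import math


def olr_screen(r, x_values):
    """Return (passes, decades, threshold) for the r-based OLR screening rule."""
    decades = math.log10(max(x_values) / min(x_values))   # x values must be > 0
    threshold = 0.975 if decades < 1 else 0.99
    return r >= threshold, decades, threshold


potassium_x = [3.0, 3.8, 4.2, 5.1, 6.5]     # hypothetical range, < 1 decade
for r in (0.983, 0.954, 0.871):              # r values reported in the text
    print(r, olr_screen(r, potassium_x))
```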
For methods 10–12, the slope according to OLR differs distinctly from the slopes according to DR, SPCA, and PBR. Note that for these methods, r values of 0.954 (method 10), 0.871 (method 11), and even 0.652 (method 12) were found. In those cases, are PBR, SPCA, or DR really the solution to the problem? Again, we doubt it. The graphical comparison of a "good" (method 7, r = 0.993), a "borderline" (method 10, r = 0.954), and a "poor" method comparison (method 11, r = 0.871; Fig. 2) supports this view. Compared with method 7, the poorer correlation of method 10 seems to be associated mostly with several outlying results. Indeed, robust OLR applying a trimming factor of 0.1 for outlying residuals (23) gives results nearly identical to those of PBR and DR (robust OLR, slope and intercept: 0.973 and 0.115 mmol/L; PBR, slope and intercept: 0.970 and 0.129 mmol/L; DR, slope and intercept: 0.973 and 0.135 mmol/L). Alternatively, outliers could have been eliminated on the basis of the 4 · Sy||x rule (10). Consequently, one would look for the reasons for the poor outcome. We know from our study (20) that the problems were caused not by the method but by the performance of the laboratory. Clearly, what helps in this case is not a different regression procedure but an investigation of the reasons for poor analytical quality.
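Both outlier strategies mentioned in this paragraph can be sketched in a few lines. The trimming shown here fits OLR, drops the fraction 0.1 of points with the largest absolute residuals, and refits. It is a simple stand-in chosen for illustration and not necessarily the algorithm implemented in SYSTAT. The 4 · Sy||x flag uses OLR residuals. Data and function names are hypothetical.

```python
import math


def ols(x, y):
    """OLR of y on x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(x, y, slope, intercept):
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def flag_4syx(x, y):
    """Indices whose OLR residual exceeds 4 * Sy|x in absolute value."""
    slope, intercept = ols(x, y)
    res = residuals(x, y, slope, intercept)
    s_yx = math.sqrt(sum(e * e for e in res) / (len(x) - 2))
    return [i for i, e in enumerate(res) if abs(e) > 4 * s_yx]


def trimmed_ols(x, y, trim=0.1):
    """Hypothetical trimming: drop the `trim` fraction of points with the
    largest absolute OLR residuals, then refit (not SYSTAT's algorithm)."""
    slope, intercept = ols(x, y)
    res = residuals(x, y, slope, intercept)
    n_drop = int(round(trim * len(x)))
    order = sorted(range(len(x)), key=lambda i: abs(res[i]), reverse=True)
    keep = sorted(order[n_drop:])
    return ols([x[i] for i in keep], [y[i] for i in keep])


# Hypothetical data: y = 1 + 2x with small scatter and one gross outlier at x = 15
x = [float(i) for i in range(1, 31)]
y = [1.0 + 2.0 * a + (0.05 if i % 2 == 0 else -0.05) for i, a in enumerate(x)]
y[14] += 10.0
print("OLR:", ols(x, y))
print("flagged by 4 Sy|x:", flag_4syx(x, y))
print("trimmed OLR:", trimmed_ols(x, y))
```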
In addition, when linear regression data are interpreted, the statistical reliability of the estimates is often overlooked. For example, for method 10 (r = 0.954), the 95% confidence limits for slope and intercept according to DR were 0.973 ± 0.078 and 0.135 ± 0.336 mmol/L, respectively. From the statistical point of view, one would therefore conclude that there is no difference between the routine and reference methods, because the respective confidence intervals include a slope of 1 and an intercept of 0. Hence, recalibration of the routine method would not be necessary. (Generally, the higher the total random error of a study, the greater the chance that it passes statistical hypothesis testing.) From the analytical point of view, however, one would certainly consider recalibrating the routine method, especially because the lower limit of the slope is 0.895 (0.973 − 0.078). Yet if the method comparison were used for recalibration, ~8% uncertainty (0.078/0.973) would be added to the original calibration slope from the uncertainty of the method comparison regression alone. We consider such a value too high for a potassium test. (The CLIA limit for total error is ~10% for potassium concentrations at the high end of the reference interval.) This demonstrates again that statistical considerations alone cannot provide a useful interpretation of method comparison studies.
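The arithmetic behind this argument is sketched below, using the DR confidence limits reported above for method 10; the helper name is ours. It checks whether each 95% interval contains the value expected for identical methods. It also expresses the slope half-width as a relative calibration uncertainty.

```python
def check_interval(estimate, half_width, expected):
    """Return (lower, upper, contains_expected) for estimate +/- half_width."""
    lower, upper = estimate - half_width, estimate + half_width
    return lower, upper, lower <= expected <= upper


slope = check_interval(0.973, 0.078, expected=1.0)        # DR slope, method 10
intercept = check_interval(0.135, 0.336, expected=0.0)    # DR intercept, mmol/L
relative_slope_uncertainty = 0.078 / 0.973

print(f"slope: {slope[0]:.3f} to {slope[1]:.3f}, contains 1: {slope[2]}")
print(f"intercept: {intercept[0]:.3f} to {intercept[1]:.3f} mmol/L, contains 0: {intercept[2]}")
print(f"relative slope uncertainty: {relative_slope_uncertainty:.1%}")
```

Both intervals pass the statistical test. Analytically, however, the ~8% relative uncertainty of the slope is what matters.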
GENERAL RECOMMENDATIONS
- Present the data graphically and visually inspect them for adequacy of range and for outliers.
- Inspect the data for linearity: …
- Correlation analysis: … sa,tot, reduce sa,tot (e.g., by performing replicates). (b) If Sy||x ≫ sa,tot, there is substantial analytical difference between the methods because of sample-related effects. … ≥0.99 (wide range), or r ≥ 0.975 (small range), perform linear regression.
- Linear regression: …
- Interpretation: …

Special note:
These recommendations are meant to help in interpreting method comparison studies. They will not work in every case; the skill, knowledge, and experience of the analyst remain the most important factors in adequately interpreting, for example, regression estimates.
| Summary and Conclusion |
|---|
sa,tot);
(d) the samples are adequately distributed over the
investigated range; and (e) the number of samples used for
the comparison is adequate for the purpose of applying linear regression.
| Footnotes |
|---|
1 Nonstandard abbreviations: OLR, ordinary linear regression; sa, analytical imprecision; DR, Deming regression; SPCA, standardized principal component analysis; PBR, Passing–Bablok regression; sa,tot, total analytical imprecision; and IQC, internal quality control.
| References |
|---|