1 Krouwer Consulting, 26 Parks Dr., Sherborn, MA 01770. Fax 508-647-9380; e-mail jan.krouwer{at}attbi.com.
| Abstract |
|---|
Methods: A review of the literature was undertaken to reconcile the different estimation approaches.
Results: The simple combination model can underestimate total analytical error by neglecting random interference bias and by not properly treating other error sources such as linear drift and outliers. A simulation method to estimate total analytical error is outlined, based on the estimation and combination of total analytical error source distributions. Goals for each total analytical error source can be established by allocation of the total analytical error goal. Typically, the allocation is cost-based and uses the probability of combinations of error sources. The distribution-of-differences method, simple combination model, and simulation method to evaluate total analytical error are compared. Outlier results can profoundly influence quality, but their rates are seldom reported.
Conclusions: Total analytical error should be estimated either directly by the distribution-of-differences method or by simulation. A systems engineering approach that uses allocation of the total analytical error goal into error source goals provides a cost-effective approach to meeting total analytical error. Because outliers can cause serious laboratory error, the inclusion of outlier rate estimates from large studies (e.g., those conducted by manufacturers) would be helpful in assessing assay quality.
| Introduction |
|---|
Performance goals for laboratory testing have most often been developed for total analytical error and for imprecision (SD) and bias. A total analytical error goal requires that the combination of errors from all sources is within some acceptable limit. From a clinician's standpoint, this is the most useful goal, because an incorrect laboratory result, regardless of which component(s) of total analytical error has caused it, is harmful. A total analytical error goal also enables a simple and cost-effective assessment of the suitability of a particular assay because there is only one error source to estimate.
On the other hand, manufacturers are interested in total analytical error sources because knowledge of these error sources and their subsequent correction are the only way to reduce total analytical error and hence improve quality. Laboratories have a position between manufacturers and clinicians. They do not have the resources (and often the proprietary knowledge required) to perform the extensive studies carried out by manufacturers, but they are responsible in part for the quality of assay results and thus must be knowledgeable in total analytical error as well as its sources.
Currently, most total analytical error performance goals are not provided directly (4)(5). Rather, the total analytical error goal is constructed from a combination of a bias goal and an imprecision goal (Eq. 1).
![]() | (1) |
This report will show that the model represented by Eq. 1 (hereafter referred to as the simple combination model) can underestimate total analytical error because some possible error sources are either absent from the model or not treated properly. Additional error sources that may be used to establish a more complete model of total analytical error will be discussed. An alternative method to estimate total analytical error, the distribution-of-differences method, which does not require modeling at all, will be discussed and contrasted with the simple combination model. A goal allocation method commonly used in systems engineering, where allocating an overall system goal into a series of component goals is a routine task, will be reviewed. Finally, the role of outliers will be considered.
| Additional Error Sources Can Help Establish a More Complete Model for Total Analytical Error |
|---|
Three additional error source types are discussed as examples of error sources that are neglected, improperly handled in Eq. 1, or difficult to incorporate. These additional error sources can affect the estimates of imprecision and average bias as well as total analytical error.
random biases attributable to interferences in patient samples: a neglected error source
A patient sample that is being analyzed contains not only the analyte of interest, but also a unique mixture of thousands of other chemical substances. If assays were completely specific, the presence of these additional substances would be of no consequence. However, most assays, including immunoassays (6), suffer from some degree of nonspecificity. This means that each patient sample will possibly exhibit a bias unique to that patient's mixture of substances that exhibit nonspecific reactions in the assay. Examples of this bias, shown in Fig. 1, demonstrate that some patient samples that are assayed repeatedly and compared with a reference method will consistently produce values that are on one side of a regression line, whereas samples from a different patient specimen will fall on the opposite side (7). In a method comparison experiment, this random bias will inflate the standard error of the estimate (Sy|x) and contribute to the average bias by influencing the regression coefficients.
Lawton et al. (8) represent this interference bias as a random error. Krouwer (9), using actual data from a cholesterol evaluation, showed that failing to account for this error can underestimate total analytical error. The random interference bias attributable to nonspecific effects in patient samples is different from the random error attributable to repeatedly assaying the same patient sample. The combination of bias and imprecision in Eq. 1 does not account for the effect of random interference bias and will thus underestimate total analytical error unless random interference bias is zero. See Appendix A for a mathematical explanation.
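The underestimation can be illustrated with a small simulation. The sketch below is not the cited cholesterol analysis; it assumes that the simple combination limit has the form |bias| + k × SD with k = 1.96, that all errors are gaussian, and that the reference result is error-free. The function name, the parameter values, and the multiplier are all hypothetical choices made for illustration.

```python
import random
import statistics

def simple_model_coverage(s_interference, k=1.96, m=2000, n=3,
                          bias=1.0, s_rep=2.0, seed=0):
    """Fraction of candidate-minus-reference differences that fall inside
    the ASSUMED simple-combination limit |bias| + k * SD."""
    rng = random.Random(seed)
    diffs, within_vars = [], []
    for _ in range(m):
        # Bias unique to this patient sample (random interference bias).
        sample_bias = rng.gauss(0.0, s_interference)
        reps = [bias + sample_bias + rng.gauss(0.0, s_rep) for _ in range(n)]
        diffs.extend(reps)
        # Replicate variance sees only the repeat-assay imprecision.
        within_vars.append(statistics.variance(reps))
    est_bias = statistics.fmean(diffs)
    est_sd = statistics.fmean(within_vars) ** 0.5
    limit = abs(est_bias) + k * est_sd
    return sum(abs(d) <= limit for d in diffs) / len(diffs)

print("no interference:  ", simple_model_coverage(0.0))
print("with interference:", simple_model_coverage(3.0))
```

Under these assumptions the limit should contain roughly the intended share of differences when the interference SD is zero and a clearly smaller share when it is not, because the replicate SD cannot see the sample-to-sample bias.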
effect of linear drift on imprecision and average bias: an example of an incorrectly handled error source
Linear drift, if present, is an example of another error source that is not correctly accounted for by Eq. 1. Consider a protocol for estimating imprecision in an assay that exhibits a positive linear drift. Krouwer (10) has shown, based on work by Haeckel and Schneider (11), that the observed imprecision will actually be a combination of pure random error and bias, according to Eq. 2.
![]() | (2) |
where
sa = the imprecision observed
sp = the true random error (imprecision)
b = the average bias attributable to linear drift
The amount of bias observed will depend on the protocol. It will be smallest for a protocol that samples consecutive duplicate specimens and largest for a protocol that samples the first and last specimens in a calibration run. A common protocol of running 10 consecutive replicate specimens will exhibit an intermediate amount of bias.
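The dependence on protocol can be seen in a short simulation. This sketch assumes gaussian pure random error and a constant drift per assay position within a run; the drift size, the concentration of 100, and the function name are hypothetical.

```python
import random
import statistics

def observed_sd(drift_per_position, n_rep=10, s_pure=1.0,
                n_runs=5000, seed=1):
    """Average within-run SD when n_rep consecutive replicates are assayed
    while the response drifts linearly with position in the run."""
    rng = random.Random(seed)
    run_vars = []
    for _ in range(n_runs):
        run = [100.0 + drift_per_position * i + rng.gauss(0.0, s_pure)
               for i in range(n_rep)]
        run_vars.append(statistics.variance(run))
    return statistics.fmean(run_vars) ** 0.5

print("no drift, 10 replicates:", observed_sd(0.0))
print("drift, duplicates:      ", observed_sd(0.5, n_rep=2))
print("drift, 10 replicates:   ", observed_sd(0.5, n_rep=10))
```

With no drift the observed SD should be close to the pure random error of 1.0. With drift, consecutive duplicates pick up only a little of it, whereas 10 consecutive replicates span more of the run and show a larger apparent imprecision.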
Consider the effect of drift on bias estimation from a method comparison. The concept of average bias implies that for any sample assayed, the test result should be equal to random error plus the regression equation (i.e., bias is explained as a proportional plus a constant difference from the reference result). With drift present, this is not true. Samples assayed early in a calibration run will have a reproducibly different average bias than samples assayed later in the run (see Eq. 3).
One can estimate linear drift from a suitable protocol using a multiple regression model such as Eq. 3.
$Y = b_0 + b_1 X + b_2 t + e$ | (3) |
where
Y = the observed result
b0 = the estimated intercept coefficient (constant error)
b1 = the estimated slope coefficient (proportional error)
b2 = the estimated linear drift coefficient (linear drift error)
e = the estimated random error (pure random error)
X = the reference result
t = time of assay
Another interpretation of this example is that the model implied by Eq. 1 (e.g., Eq. 3 without the drift term) contains less knowledge about the true state of the process than Eq. 3. Goldschmidt and Krouwer (12) showed an example where the proportional bias was incorrectly estimated when a protocol was used that did not have a drift term in the model. An illustration, using a simple regression equation, of the different states of knowledge provided by random and systematic error is shown in Table 1.
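A regression with intercept, slope, and drift terms, as defined above, can be fitted by ordinary least squares. The sketch below uses simulated data with hypothetical true coefficients (2.0, 1.05, and 0.5 per hour) and assumes that the reference result X is error-free; it is not the analysis from references 10 or 12. It compares the fit with and without the drift term.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = rng.uniform(50.0, 300.0, n)   # reference results (hypothetical range)
t = rng.uniform(0.0, 8.0, n)      # hours since calibration (hypothetical)
Y = 2.0 + 1.05 * X + 0.5 * t + rng.normal(0.0, 1.0, n)

def fit(columns, y):
    """Least-squares fit with an intercept; returns the coefficients and
    the residual SD."""
    A = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    resid_sd = np.sqrt(np.sum(resid ** 2) / (len(y) - A.shape[1]))
    return coef, resid_sd

coef_full, sd_full = fit([X, t], Y)      # intercept, slope, drift
coef_nodrift, sd_nodrift = fit([X], Y)   # intercept and slope only

print("with drift term:   ", coef_full, sd_full)
print("without drift term:", coef_nodrift, sd_nodrift)
```

When the drift term is omitted, the drift does not disappear: it is absorbed into the residual SD and the intercept, so the model reports more "random" error than is actually present.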
treatment of outliers
In a method comparison experiment used to estimate average bias, it is standard and accepted practice (13) to remove outliers, should they occur. It makes sense to remove these outlier samples when assaying a small number of samples; otherwise the parameters estimated will not be representative. The problem is that there is no mechanism for these outliers to play any role in the simple combination method. They simply disappear from the analysis, although they will still be present in real life. When the distribution-of-differences method (below) is used, there is no basis for removing outliers, nor is there anything wrong (from an estimation standpoint) with a skewed distribution of differences.
| A More Complete List of Total Analytical Error Sources |
|---|
apparent random error
Apparent random error is the imprecision estimated from protocols where replicates of a sample are assayed. If there are no systematic biases present, apparent random error and pure random error will be equal.
average bias
Average bias is the quantity used by the simple combination method to represent all systematic error. It is estimated from the slope and intercept of a regression equation (to convert the regression equation to a bias equation, 1 is subtracted from the slope). The slope and intercept represent proportional and constant error, respectively. If there are other systematic errors present, the average bias will be incorrect. For example, if an assay is not linear at the upper end of the assay range, the slope and intercept of the regression equation will only partially express the average bias at the upper end of the range.
pure random error
Pure random error (not shown in Fig. 2) is the apparent random error with systematic error removed. This removal can be achieved by a suitable multifactor protocol (9) whereby pure random error is the residual error term from the model, as in Eq. 3. If systematic effects exist and are not removed, the apparent random error (e.g., that calculated without a multifactor protocol) will be greater than pure random error.
random error
Random error refers to the collection of error sources whose effect is modeled as samples from a probability distribution. One can sometimes model the same error source as either random or systematic, as discussed below for interferences.
systematic error
Systematic error is the collection of error sources whose effect is modeled by an equation that describes the effect of the error sources for every sample.
protocol-independent bias
Protocol-independent bias refers to a collection of error sources that are largely independent of the protocol used to estimate them. Here, the protocol refers to every aspect of the assay, e.g., the sample order, reagent lot, and calibration sequence. Protocol-independent refers to the fact that the protocol usually is not a factor in the magnitude of the error source. For example, if an assay is inherently nonlinear and the nonlinearity is not corrected by software, then one can always expect this nonlinearity to be present. However, in some cases, nonlinearity may not be independent of the protocol. Nonlinearity can be caused by instability in a reagent, in which case the magnitude of the error source may depend on the reagent lot and its age.
protocol-dependent bias
Protocol-dependent bias refers to a collection of error sources that are largely dependent on the protocol used to estimate them. For example, linear drift depends not only on an instability in the assay response, but also on the sample order (e.g., the time of assay since the last calibration). Thus, the tenth sample assayed always has 10 times as much linear drift as the first sample assayed; hence the protocol is always involved in the bias equation. In addition, the extent of drift may vary from run to run. Usually the magnitude of the drift in a specific run cannot be predicted and is modeled by sampling the drift run magnitude from a suitable probability distribution. Thus, linear drift has a random as well as a systematic component. As an equation:
![]() | (4) |
random interference bias
For each patient sample, a seemingly random bias component, additional to pure random error and caused by nonspecificity of the assay and the presence of interfering substances, may exist. For an assay with perfect specificity, the random interferences term would be zero. This error source can be estimated from a method comparison experiment (8). One can test for the presence of random interference bias by ANOVA by comparing Sy|x to the imprecision estimated from replicates. The actual substance(s) causing the interference does not need to be known for a random interference term to be estimated.
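The comparison of Sy|x with replicate imprecision can be sketched as a variance ratio. The code below is one possible arrangement, not the procedure of reference 8: it assumes an error-free reference result, equal numbers of replicates per sample, and a straight-line relationship. The function names and simulated values are hypothetical.

```python
import random
import statistics

def interference_f_ratio(reference, replicates):
    """Ratio of the scatter of sample means about the regression line
    (scaled by n) to the pooled replicate variance. A ratio near 1 is
    expected when there is no random interference bias."""
    n = len(replicates[0])
    means = [statistics.fmean(r) for r in replicates]
    x_bar, y_bar = statistics.fmean(reference), statistics.fmean(means)
    sxx = sum((x - x_bar) ** 2 for x in reference)
    slope = sum((x - x_bar) * (y - y_bar)
                for x, y in zip(reference, means)) / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((y - intercept - slope * x) ** 2
                   for x, y in zip(reference, means))
    s2_yx_means = resid_ss / (len(means) - 2)
    s2_within = statistics.fmean(statistics.variance(r) for r in replicates)
    return n * s2_yx_means / s2_within

def simulate(s_interference, m=200, n=3, s_rep=2.0, seed=4):
    """Hypothetical data: m patient samples, n replicates each."""
    rng = random.Random(seed)
    reference = [rng.uniform(100.0, 300.0) for _ in range(m)]
    replicates = []
    for x in reference:
        sample_bias = rng.gauss(0.0, s_interference)
        replicates.append([x + sample_bias + rng.gauss(0.0, s_rep)
                           for _ in range(n)])
    return reference, replicates

print("no interference:  ", interference_f_ratio(*simulate(0.0)))
print("with interference:", interference_f_ratio(*simulate(3.0)))
```

In a formal ANOVA the ratio would be referred to an F distribution with m − 2 and m(n − 1) degrees of freedom; that step is omitted here to keep the sketch dependency-free.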
specific interference bias
Specific interference bias is error caused by nonspecificity of the assay attributable to the presence of a specific substance. This error is measured by interference experiments (14). Manufacturers often test large numbers of potentially interfering substances. It is conceptually possible to estimate the effect of every possible interfering substance in an assay as well as to determine the concentration of each interfering substance in each patient sample assayed. Were this to be done, this error source would be completely deterministic. Because this is impractical (one cannot even be sure that one has thought of all possible interference candidates), interferences are also modeled as a random error source in estimations of total analytical error.
nonlinear bias
Nonlinear bias is bias that cannot be represented by a proportional relationship between the test and reference assay concentrations. This bias can be estimated from a method comparison experiment with higher order polynomial terms in the regression equation (15). The high-dose hook effect in immunoassays is an example of nonlinear bias.
drift
Drift is an error that is related to the time of assay since the last calibration. Drift may be linear or nonlinear and can be estimated by multifactor protocols or by protocols that specifically account for time of assay.
sample carryover
Sample carryover is an error attributable to the contamination of the current sample with the previous sample. Sample carryover errors are important only if the concentration differences of the two samples are reasonably large. Sample carryover may be estimated by multifactor protocols or by protocols that specifically account for the possibility of sample contamination, such as assaying a high-concentration sample followed by a series of low-concentration samples.
reagent carryover
Reagent carryover is an error in random access analyzers whereby the current assay is contaminated by reagent from the previous assay. Reagent carryover errors are important only when the contamination causes an effect, such as when an aspartate aminotransferase reagent precedes a lactate dehydrogenase reagent (lactate dehydrogenase is often part of the formulation of an aspartate aminotransferase reagent). Reagent carryover is estimated from protocols that take into account these combinations.
reagent/calibrator lot effects
The presence of something that is different in a new calibrator or reagent compared with the previous calibrator or reagent can cause lot effects. For example, a calibrator with an erroneously assigned value will cause a bias in every value assayed with that calibrator lot. These error sources are often difficult to assess from protocols because sufficient different lots are often unavailable. In cases where there are enough samples, these error sources can be treated as random imprecision components. Manufacturers can assess these error sources for reagent lots from factorial studies in which different reagent lots are made with appropriate concentrations of reagent constituents to simulate manufacturing variances. Effects of calibration lot errors can often be estimated by mathematical simulation.
| Accounting for All Terms in an Expanded Model |
|---|
Given the distribution of each error source, it is possible to create a simulation model (e.g., with software) that samples each error source from its distribution (which may not be a gaussian distribution) and combines all errors to arrive at the total analytical error (16). To test the accuracy of the simulation, one can compare total analytical error estimated from the simulation with total analytical error estimated directly from a method comparison experiment.
Typically the detailed equation for this model will be quite complicated, with every possible effect having its own term, although in principle, the model will simply be an expansion of Eq. 1.
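A minimal sketch of such a simulation follows. Every distribution and parameter value here is hypothetical and chosen only to show the mechanics; a real model would use distributions estimated from the protocols described above, and would sample run-level and lot-level effects once per run or lot rather than once per result as this simplified version does.

```python
import random

def simulate_total_error(n_results=100_000, seed=3):
    """Sample each error source from its own (assumed) distribution, sum
    them, and return the central 95% limits of the simulated total error."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_results):
        pure_random = rng.gauss(0.0, 2.0)            # gaussian
        interference = rng.gauss(0.0, 1.5)           # random interference bias
        # Drift: random slope times position in the run (simplified).
        drift = rng.gauss(0.0, 0.1) * rng.randrange(20)
        lot_bias = rng.uniform(-1.0, 1.0)            # non-gaussian source
        constant_bias = 0.5
        errors.append(pure_random + interference + drift
                      + lot_bias + constant_bias)
    errors.sort()
    lower = errors[int(0.025 * n_results)]
    upper = errors[int(0.975 * n_results) - 1]
    return lower, upper

lower, upper = simulate_total_error()
print("simulated 95% total error limits:", lower, upper)
```

Because the limits are read from the simulated distribution itself, no assumption of normality is needed for the combined error.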
| Alternative Method to Estimate Total Analytical Error: The Distribution-of-Differences Method |
|---|
| Comparison of Approaches to Estimation of Total Analytical Error |
|---|
The following example illustrates a benefit of the distribution-of-differences method compared with the simple combination model.
Consider a blood gas laboratory that is evaluating a lactate assay over a 2-week period. The laboratory has three analyzers; each analyzer receives a new electrode once a week and is calibrated every 30 min. This means that the variables instrument, electrode, and calibration are all potential error sources. The only way that the simple combination model can accommodate these error sources is to consider them as random error sources in an ANOVA model to estimate the imprecision term. For most laboratories, the correct formulation of the ANOVA model will be a challenging task. In the distribution-of-differences method, no ANOVA model is needed. For this or for any evaluation, one always simply computes all differences.
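Computing all differences takes only a few lines. The sketch below assumes that the total analytical error limits are taken as the empirical central 95% of the paired differences and that the goal is a symmetric limit; both are illustrative assumptions, and the function names are hypothetical. No outliers are removed.

```python
def difference_limits(candidate, comparison, coverage=0.95):
    """Empirical limits containing the central `coverage` share of the
    candidate-minus-comparison differences (crude order statistics)."""
    diffs = sorted(c - r for c, r in zip(candidate, comparison))
    tail = (1.0 - coverage) / 2.0
    lower = diffs[int(tail * len(diffs))]
    upper = diffs[min(int((1.0 - tail) * len(diffs)), len(diffs) - 1)]
    return lower, upper

def fraction_outside(candidate, comparison, goal):
    """Share of results whose absolute difference exceeds the goal."""
    diffs = [c - r for c, r in zip(candidate, comparison)]
    return sum(abs(d) > goal for d in diffs) / len(diffs)
```

The results from every analyzer, electrode, and calibration are simply pooled into the two input lists, so whatever those variables contribute is already present in the differences.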
A disadvantage of the distribution-of-differences method is that the "differences" may not be solely attributable to the candidate method. Even so, it is important to predict the outcome of switching from the current assay (which is unlikely to be a reference assay) to a new assay. Estimating the differences, whether they are attributable to the candidate or the comparison method, matters because it is these differences that clinicians will observe.
The problem of determining which method is causing the difference is equally true for the simple combination method when the error source is attributable to bias. However, imprecision is treated differently in the two estimation methods. In the distribution-of-differences method, a difference is the bias between methods plus the imprecision of each method. Of course, laboratories are interested in knowing whether candidate methods are better; therefore, to ascribe as much error as possible to the comparison method, one should use a reference method to minimize bias in the comparison method and run replicate comparison method specimens to minimize imprecision in the comparison method. An ideal evaluation would be to run a three-way comparison consisting of the candidate, current, and reference methods.
Although the distribution-of-differences method does not provide an estimate of imprecision, laboratories will always evaluate separately the imprecision of a candidate assay to ensure that it will meet regulatory requirements.
full combination model
The advantage of the full combination model is that, in addition to giving an estimate of total analytical error, it also provides detailed information about all error sources. The main disadvantage of this method is the large effort required both experimentally and with modeling to arrive at proper estimates.
simple combination model
The main problem with the simple combination model, as described above, is that it often underestimates total analytical error. Moreover, because this method is also used to construct goals for total analytical error, these goals will be suspect as well. An example of this is the total analytical error goal suggested by the National Cholesterol Education Program, which uses the simple combination method (20).
| Detection vs Estimation |
|---|
Note, however, that optimal quality control can never detect error attributable to random interference bias because quality-control samples contain the same matrix in every sample, unlike patient samples, which contain mixtures of different substances needed to detect random interference bias. This highlights the importance of determining the significance of random interference bias during a method evaluation and underscores the limitation of the simple combination method, which does not account for this error source.
allocating total analytical error goals
Assay performance goals allow evaluation results to be compared to a limit to determine whether an assay is acceptable. Goals are established by manufacturers (or laboratories) for several reasons:
In addition, assays can be used in different ways, which may require different goals. For example, an assay that is used for diagnostic purposes is different from an assay that is used to monitor patients. In the latter case, serial measurements require that imprecision is the main parameter specified (22). This section deals with medical need goals for assays that are used for diagnostic purposes and assumes that a total analytical error goal has already been established.
Most assay performance goal setting in clinical chemistry has focused on setting goals for individual error sources (2). Although most work has been devoted to goals for imprecision and bias, other error sources, such as reagent-to-reagent bias (3) and interference bias (23), have been studied. These suggestions provide valuable insights into assay quality.
One limitation to the above goal-setting process, however, is that focusing on specifying a performance goal for an individual error source makes it difficult to account for all other possible error sources, which is necessary to avoid specifying a performance goal for an individual error source that in practice causes the total analytical error goal to be exceeded. A solution to this problem is to create error source goals by allocation, using a systems engineering approach (24).
Using reliability as an example, the systems engineering approach starts with the desired overall system reliability goal. One then estimates the reliability of each component from all subsystems and combines the individual estimates into an overall system reliability estimate. This is then compared with the goal. If the estimated reliability does not meet its goal, one must allocate the desired system reliability goal into goals for each component. Typically, the method used for this allocation is cost-based. The following example illustrates the systems engineering approach for an assay.
| A Goal Allocation Example |
|---|
The above example could be further complicated by assuming that in addition to the above error sources, there were errors from five systematic biases (e.g., lot-to-lot reagent bias). It would almost always be a bad idea to allocate error equally among all of these error sources because dividing a total analytical error goal into seven equal parts would lead to each error goal being quite stringent. Moreover, the probability that all seven error sources occur simultaneously, each at its maximum level, would be extremely low; the pitfalls of such a "worst case" approach are shown in Appendix B. Thus, the allocation must take into account probability of occurrence.
| Outliers Must Be Accounted for |
|---|
The use of a total analytical error goal does not solve the outlier issue, in spite of the word "total". The problem is that specifying total analytical error to mean that at least 95% of results are within an acceptable limit also means that up to 5% of results could be outside of this limit. Even with 99% limits, 1% of a large number of assay results is still a big number. Because laboratories can easily report 1 million results per year, if all results just met 99% acceptance limits, there would still be 10 000 results per year that were unacceptable according to total analytical error goals.
It would be naive to assume that a result just inside a total analytical error goal would be perfectly acceptable and that a result just outside this goal would cause a disaster. There is, rather, a continuum of quality. Thus, if all results outside the total analytical error goal were nevertheless close to the goal, it is unlikely that these results would cause problems. This implies the use of another set of limits to define what "close to the goal" is. In addition to total analytical error limits, a wider set of limits could specify values that should never occur. Of course, one cannot test for the occurrence of "never"; however, if no outliers are found in a large sample size, one can state with a specified confidence that the outlier rate is no larger than a very small percentage.
Practically speaking, only manufacturers conduct studies of this magnitude. Although sample sizes are different for each assay, extremely large sample sizes (thousands) are common during product development, and the combination of results from all field trials often also produces a large sample size.
The most conservative way to estimate outlier frequency is to consider an outlier as a discrete event and use the binomial distribution (26). It would be unwarranted to estimate potential outlier magnitudes and rates by simply calculating higher multiples from an estimated standard deviation. This is because there is no guarantee that an outlier comes from the same distribution that is used to calculate the standard deviation.
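Treating each result as either an outlier or not, an upper bound on the outlier rate follows from the binomial distribution. The sketch below is one way to do this and is not taken from reference 26: it assumes independent results and a one-sided 95% bound, and it finds the bound by bisection. The function names are hypothetical, and the direct summation is intended only for small outlier counts.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); suitable for small k."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k + 1))

def upper_rate_bound(k, n, alpha=0.05):
    """Largest outlier rate still compatible, at level alpha, with
    observing k or fewer outliers among n results."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if binom_cdf(k, n, mid) > alpha:
            lo = mid      # rate this low is still plausible; look higher
        else:
            hi = mid
    return hi

bound = upper_rate_bound(0, 10_000)
print("upper bound on outlier rate:", bound)
print("implied outliers per 1 million results:", bound * 1_000_000)
```

With zero outliers observed, the bound has the closed form 1 − alpha^(1/n), which is close to 3/n for a 95% bound. Even a clean study of 10 000 results therefore cannot rule out a few hundred outliers per million results.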
Typically, when a manufacturer finds a root cause for an outlier, one of three actions follows: a design change for the assay is implemented; an algorithm is incorporated that prevents the result from being reported; or, least desirably, a caution is noted for the condition that could cause the outlier. Although development is a proprietary process, it would be helpful if manufacturers reported summary results of studies that estimate outlier rates.
conclusions
Total analytical error is a useful metric for laboratory assay quality. The use of Eq. 1 to estimate total analytical error is incorrect because it does not account for all potential error sources. Total analytical error can be estimated directly from a method comparison experiment. This estimate can be compared with a total analytical error goal. This simple approach can be used by both laboratories and manufacturers, with manufacturers using much larger sample sizes and sampling from all known potential error sources.
By studying the details of the assay process, one can enumerate various total analytical error sources. Different protocols are needed to estimate each total analytical error source. With knowledge of the distribution of error sources, a simulation model can be used to combine these sources to estimate total analytical error. The goal for total analytical error can be allocated into goals for each total analytical error source. Outlier rates must also be quantified.
| Appendix A |
|---|
![]() | (1A) |
Eq. 2A is an expansion of Eq. 1A to account for n replicates of each of m different specimens.
![]() | (2A) |
where
TAE = total analytical error
yij = the ith observation from the jth sample of the new method
Rj = the reference method result for the jth sample
ȳj = the mean of the jth sample of the new method
In Eq. 2A, the second double summation term is a measure of imprecision, and the last term represents the distribution of bias that is observed in each sample as seen in Fig. 1.
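The split into an imprecision part and a per-sample bias part can be checked numerically with a standard sum-of-squares identity: the sum over all replicates of (yij − Rj)² equals the sum of (yij − ȳj)² plus n times the sum over samples of (ȳj − Rj)². This identity is offered as an assumption that is consistent with the description above, not as the exact form of Eq. 2A; the function name and data are hypothetical.

```python
import statistics

def sum_of_squares_parts(reference, replicates):
    """Return (total, within, between) sums of squares, where
    total   = sum over i, j of (y_ij - R_j)^2
    within  = sum over i, j of (y_ij - mean_j)^2   (imprecision)
    between = sum over j of n_j * (mean_j - R_j)^2 (per-sample bias)."""
    total = within = between = 0.0
    for r_j, reps in zip(reference, replicates):
        mean_j = statistics.fmean(reps)
        total += sum((y - r_j) ** 2 for y in reps)
        within += sum((y - mean_j) ** 2 for y in reps)
        between += len(reps) * (mean_j - r_j) ** 2
    return total, within, between

reference = [10.0, 20.0]
replicates = [[11.0, 12.0, 10.5], [19.0, 21.5, 20.5]]
total, within, between = sum_of_squares_parts(reference, replicates)
print(total, within + between)
```

A model that keeps only the within-sample term and a single average bias drops the spread of the per-sample biases, which is the part attributable to random interference.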
| Appendix B |
|---|
To see why this is a poor strategy, consider the likelihood of a result that occurs because each bias is 3 SD (each with the same sign) simultaneously. This is equal to 2 x 0.0035 = 4.86-11%. On average, we would need to run >40 billion assays before seeing one such occurrence. Hence, allocation must take into account probability of occurrence.