Point/Counterpoint
1 Krouwer Consulting, 26 Parks Dr., Sherborn, MA 01770. Fax 508-647-9380; e-mail jan.krouwer{at}comcast.net.
Abstract
Methods: Using the official GUM (Guide to the Expression of Uncertainty in Measurement) standard and published applications of GUM to commercial diagnostic assays, I analyzed whether applying GUM to commercial diagnostic assays is warranted.
Results: Certain important assays, such as troponin I, would not be candidates for GUM because troponin I is not a well-defined physical quantity. Unlike definitive methods, in which efforts are made to detect and eliminate all sources of systematic error, commercial assays often trade off accuracy against features such as ease of use and cost, and they tolerate systematic errors as long as overall accuracy meets the goal set by medical need. Laboratories are hindered in preparing GUM models because the knowledge required to specify some systematic errors is often available only to manufacturers. Some non-GUM methods of estimating uncertainty rely on observed data, which include both known and unknown sources of error. Large, unknown errors (e.g., outliers) are not unusual for assays in routine use because diagnostic assays must be chemically specific in the presence of thousands of potentially interfering substances. GUM makes no provision for unexplained outliers, which may lead to uncertainty intervals that are not wide enough. Some clinicians assume that diagnostic assay results have little uncertainty; attaching an uncertainty interval, which implies certification, may make this situation worse.
Conclusions: Evaluations of accuracy (total analytical error) that describe the distribution of result differences between commercial assays and reference methods indicate that some assays have a few results with large differences (e.g., outliers). These results lead to a wide accuracy interval (total analytical error limits). GUM is unlikely to predict such wide intervals, especially because it makes little or no provision for treating outliers. Presenting GUM uncertainty intervals that are too narrow to clinicians would be misleading. The modeling used by practitioners of the GUM method is potentially useful for improving quality, but commercial diagnostic assays are not ready for GUM uncertainty statements.
Introduction
Kallner (6) and the NIST web site (7) provide excellent descriptions of the GUM method, and Grabe (8) has critiqued it. In brief, GUM treats random error and systematic error as the two possible sources of measurement error. The uncertainty of a measurement result stems from random effects and from imperfect correction of systematic effects.
For the purpose of evaluating uncertainty components, GUM groups them into two categories: type A and type B. Type A uncertainties are obtained from probability density functions derived from observed frequency distributions, whereas type B uncertainties are obtained from assumed probability density functions. When a measurement result is estimated from the values of other quantities, its standard uncertainty is called the combined uncertainty and is obtained with the law of propagation of uncertainty. Finally, the combined uncertainty can be multiplied by a coverage factor, k, to yield an expanded uncertainty, which defines an interval about the measurement result that is expected to contain a large fraction of the distribution of values that could reasonably be attributed to the measurand. This interval is similar in concept to accuracy or total analytical error (9).
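To make the type A/type B distinction and the propagation step concrete, here is a minimal Python sketch. It assumes uncorrelated input quantities, so the law of propagation reduces to the square root of the sum of squared, sensitivity-weighted standard uncertainties. The function names, the rectangular distribution chosen for the type B component, the default k = 2, and the example numbers are illustrative assumptions, not taken from the article or from any specific GUM application.

```python
import math
import statistics
from typing import Optional, Sequence


def type_a_standard_uncertainty(readings: Sequence[float]) -> float:
    """Type A evaluation: experimental standard deviation of the mean of repeated readings."""
    return statistics.stdev(readings) / math.sqrt(len(readings))


def type_b_rectangular(half_width: float) -> float:
    """Type B evaluation (assumed model): the quantity lies anywhere within
    +/- half_width with equal probability, i.e. a rectangular distribution."""
    return half_width / math.sqrt(3)


def combined_uncertainty(std_uncertainties: Sequence[float],
                         sensitivities: Optional[Sequence[float]] = None) -> float:
    """Law of propagation of uncertainty for uncorrelated inputs:
    u_c = sqrt(sum((c_i * u_i) ** 2)); sensitivity coefficients c_i default to 1."""
    if sensitivities is None:
        sensitivities = [1.0] * len(std_uncertainties)
    if len(sensitivities) != len(std_uncertainties):
        raise ValueError("one sensitivity coefficient is needed per input")
    return math.sqrt(sum((c * u) ** 2 for c, u in zip(sensitivities, std_uncertainties)))


def expanded_uncertainty(u_c: float, k: float = 2.0) -> float:
    """Expanded uncertainty U = k * u_c; the result would be reported as y +/- U."""
    return k * u_c


if __name__ == "__main__":
    # Hypothetical values for illustration only (mmol/L).
    readings = [5.02, 4.98, 5.05, 4.97, 5.01]
    u_repeat = type_a_standard_uncertainty(readings)
    u_calib = type_b_rectangular(0.05)  # assumed +/-0.05 mmol/L calibrator tolerance
    u_c = combined_uncertainty([u_repeat, u_calib])
    print(f"mean = {statistics.fmean(readings):.3f} mmol/L, "
          f"U (k = 2) = {expanded_uncertainty(u_c):.3f} mmol/L")
```

Correlated inputs would need covariance terms, which this sketch deliberately omits.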
Differences between Commercial Diagnostic Assays and Processes Used for GUM
The GUM guideline also states: "It is assumed that the result of a measurement has been corrected for all recognized significant systematic effects and that every effort has been made to identify such effects." This degree of effort is commonly made for a reference material whose value has been assigned with a definitive method. Tietz (11) described the differences among definitive, reference, and field methods in clinical chemistry.
Commercial assays (field assays in Tietz's terminology) are developed differently from definitive assays because they serve different market needs. Commercial assays often emphasize ease of use and low cost, whereas definitive assays focus on attaining the best possible accuracy. For commercial assays, lower accuracy is justified by affordability and other features (such as ease of use), and by the fact that the lower accuracy often still meets stated goals.
As an example, home-use glucose assays serve an important medical need but have not minimized systematic errors to the same extent as the definitive method for glucose. In addition to the analyte of interest, a patient sample contains thousands of other chemical substances, some of which might interfere in a commercial assay. Although manufacturers attempt to investigate and minimize the effects of interfering substances, reports of assays that nonetheless suffer from interference are not uncommon (12). These reports typically explain the root cause of the error, which is often an uncorrected systematic error that has caused clinician concern, if not actual harm to the patient.
Size of Systematic Errors
Field assays typically have less stringent accuracy requirements (combined uncertainty) than definitive methods. For example, a draft International Organization for Standardization (ISO) glucose document states: "Ninety-five percent (95%) of the individual glucose results shall fall within ± 0.83 mmol/L (15 mg/dL) of the results of the manufacturer's measurement procedure at glucose concentrations <4.2 mmol/L (75 mg/dL) and within ±20% at glucose concentrations >4.2 mmol/L (75 mg/dL)" (14). It would seem to go against the grain of clinical chemists, however, to ignore as insignificant errors that are detected but fall below these limits. A 10% nonlinearity in a glucose assay, for instance, would likely be detectable and could be corrected, even if a mathematical model showed that the 10% nonlinearity would not cause the assay to fail the combined uncertainty goal. The reason is that laboratorians almost always wish to improve quality: the medically acceptable limits for combined uncertainty are not a dichotomous boundary with high quality on one side and poor quality on the other. Rather, quality is a continuum, and laboratorians are always trying to improve the combined uncertainty within economic constraints (15).
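The quoted draft ISO criterion can be checked mechanically against paired results. The sketch below is a hypothetical illustration, not an ISO-supplied procedure: it assumes each meter result is paired with the manufacturer's measurement procedure result, that the 4.2 mmol/L cutoff is applied to that reference value, and that a reference value of exactly 4.2 mmol/L falls under the ±20% limit (the quoted text leaves that boundary unspecified). All names are made up for this example.

```python
from typing import Iterable, Tuple

# Limits transcribed from the quoted draft criterion (mmol/L).
ABS_LIMIT = 0.83  # absolute limit below the cutoff
CUTOFF = 4.2      # applied to the reference value (assumption)
REL_LIMIT = 0.20  # relative limit at or above the cutoff (boundary handling assumed)


def within_limits(meter: float, reference: float) -> bool:
    """True if one meter result agrees with the manufacturer's procedure within the limits."""
    diff = abs(meter - reference)
    if reference < CUTOFF:
        return diff <= ABS_LIMIT
    return diff <= REL_LIMIT * reference


def fraction_within(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (meter, reference) pairs that fall within the limits."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no results to evaluate")
    return sum(within_limits(m, r) for m, r in pairs) / len(pairs)


def meets_criterion(pairs: Iterable[Tuple[float, float]], required: float = 0.95) -> bool:
    """Pass/fail against the 95% requirement."""
    return fraction_within(pairs) >= required
```

For example, `within_limits(11.0, 10.0)` is `True`: a result 10% high at 10 mmol/L passes the ±20% limit. This is the situation described above, where a detectable systematic error would not, by itself, fail the criterion.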
Treatment of Systematic Errors
Method 1: The laboratory detects a significant bias in a new calibrator lot and adjusts values to minimize the bias.
Method 2: The laboratory has a certificate from the manufacturer and uses the uncertainty statement from the manufacturer to calculate a standard uncertainty attributable to calibrator error.
Method 3: The laboratory evaluates multiple calibrator lots in an experiment and estimates the standard uncertainty attributable to the calibrator with an ANOVA model. Typically, this evaluation would be done once, as part of the evaluation of the candidate assay.
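Methods 2 and 3 each produce a standard uncertainty for the calibrator. The Python sketch below is hypothetical. For method 2, it assumes the certificate states an expanded uncertainty together with its coverage factor. For method 3, it assumes a balanced design (every lot measured the same number of times) and uses the classical one-way random-effects ANOVA estimator of the between-lot variance component. The article does not specify which ANOVA model is intended.

```python
import math
import statistics
from typing import Sequence


def u_from_certificate(expanded_u: float, coverage_factor: float = 2.0) -> float:
    """Method 2 sketch: standard uncertainty u = U / k from a certificate
    (the coverage factor must be read from the certificate; 2 is only a default)."""
    return expanded_u / coverage_factor


def between_lot_standard_uncertainty(lots: Sequence[Sequence[float]]) -> float:
    """Method 3 sketch: balanced one-way random-effects ANOVA.

    `lots` holds one list of replicate results per calibrator lot.
    Returns sqrt(max(0, (MS_between - MS_within) / n)), the estimated
    between-lot standard deviation.
    """
    k = len(lots)
    n = len(lots[0]) if lots else 0
    if k < 2 or n < 2 or any(len(lot) != n for lot in lots):
        raise ValueError("need >= 2 lots, each with the same number (>= 2) of replicates")
    lot_means = [statistics.fmean(lot) for lot in lots]
    grand_mean = statistics.fmean(lot_means)
    ms_between = n * sum((m - grand_mean) ** 2 for m in lot_means) / (k - 1)
    ms_within = sum((x - m) ** 2 for lot, m in zip(lots, lot_means) for x in lot) / (k * (n - 1))
    # A negative variance estimate is truncated to zero, a common convention.
    return math.sqrt(max(0.0, (ms_between - ms_within) / n))
```

Truncating a negative estimate at zero is one common convention. Other variance-component estimators (e.g., REML) would also be defensible for unbalanced data.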
Assume that method 2 or method 3 has been followed for all systematic error sources (e.g., instrument, reagent, calibrator, operator) and that an expanded uncertainty statement has been provided. Here laboratorians face a quandary. If a laboratorian finds a systematic bias, then he or she should eliminate it. However, ensuring that "every effort has been made to identify such (systematic) effects" might mean evaluating each change in every potential source of systematic error. Although desirable in principle, this is beyond the scope of most laboratories. The GUM statement is nonetheless what all laboratorians ultimately want to achieve: one wants to know the sodium value and its uncertainty independently of factors such as which instrument, reagent lot, calibrator lot, or operator produced the result. This is particularly important as patients move through the healthcare system.
Lack of Knowledge in Laboratory Error Modeling in GUM
There are many other algorithms that manufacturers embed in instrument software to monitor the quality of the response for each sample; these algorithms either discard part of the response data or reject an analyzer result altogether. The details of these algorithms are generally unknown to laboratories, yet the algorithms can themselves cause errors if they are incorrect.
Of course, if manufacturers disclosed this type of information, laboratories would be able to improve the faithfulness of their models, but this type of disclosure is unlikely because manufacturers will rightly consider these quality algorithms as proprietary.
Treatment of Outliers
In an example (16) based on data from Miller et al. (17), 100 randomly collected patient samples were assayed by a commercially available LDL-cholesterol assay and by a reference assay. A nonparametric interval with 95% confidence of containing at least 95% of the LDL-cholesterol differences between the commercial and reference assays ranged from -0.47 to 5.66 mmol/L (-18 to 219 mg/dL). This huge interval was caused by three outliers in the data. It is unlikely that a GUM approach would have come close to this interval. Reporting this interval with every LDL-cholesterol result is, of course, not useful. However, outliers such as these are not only real, albeit infrequent, but are precisely the cases that contribute to incorrect medical decisions. In any evaluation, including those conforming to GUM, one should try to determine the root cause of outliers. If outliers are caused by recording mistakes, they may be discarded. However, it is possible, more so for a laboratory than for a manufacturer, that the root cause of an outlier remains undetermined and hence uncorrected. GUM makes no provision for this situation because it requires large systematic errors to be corrected.
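An interval of this kind is a distribution-free (nonparametric) tolerance interval bounded by order statistics of the observed differences. The sketch below is an assumption about the construction, not necessarily the exact rule used in (16). It uses the symmetric interval [x_(r), x_(n+1-r)] and picks the largest r that still gives the requested confidence. The coverage of that interval behaves like an order statistic of uniform variables, so the confidence of covering at least a fraction p equals P(Binomial(n, p) ≤ n − 2r). With n = 100, p = 0.95, and 95% confidence, only r = 1 qualifies (confidence ≈ 0.963), so the interval runs from the smallest to the largest observed difference. This is why a few extreme results can dominate it.

```python
from math import comb
from typing import Sequence, Tuple


def coverage_confidence(n: int, r: int, p: float) -> float:
    """Confidence that [x_(r), x_(n+1-r)] from n observations of a continuous
    variable covers at least a fraction p of the population.

    Equals P(Binomial(n, p) <= n - 2r), independent of the underlying distribution.
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n - 2 * r + 1))


def tolerance_interval(data: Sequence[float], p: float = 0.95,
                       conf: float = 0.95) -> Tuple[float, float]:
    """Distribution-free two-sided tolerance interval (hypothetical helper).

    Trims r observations from each tail, with r as large as possible
    while the coverage confidence stays >= conf.
    """
    x = sorted(data)
    n = len(x)
    best_r = 0
    for r in range(1, n // 2 + 1):
        if coverage_confidence(n, r, p) >= conf:
            best_r = r  # confidence falls as r grows, so stop at the first failure
        else:
            break
    if best_r == 0:
        raise ValueError(f"n = {n} is too small for {p:.0%} coverage at {conf:.0%} confidence")
    return x[best_r - 1], x[n - best_r]
```

Applied to 100 commercial-minus-reference differences, `tolerance_interval` would simply return their minimum and maximum. No data from (16) or (17) are reproduced here. With 95% coverage and 95% confidence, at least 93 observations are needed before any interval can be formed.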
Unfortunately, some interpretations of GUM suggest that certain outliers may be discarded (18). In the EURACHEM/CITAC guide referenced by Linko et al. (4) in their example of applying GUM, an instrument malfunction such as "an air bubble lodged in a spectrophotometer flowthrough cell" may be discarded as a spurious error. If an algorithm detects a bubble and discards the result as part of the assay routine, this is acceptable. But if a user visually detects a bubble during an evaluation, a check that would not be performed routinely, discarding that result will bias the evaluation.
Krouwer (9) has summarized methods that directly estimate inaccuracy (total error) from observed data. Some of the methods do not require modeling at all, although one must ensure that the samples are representative and that sufficient data are collected.
Clinicians and Laboratory Error
Underestimating uncertainty is not limited to diagnostic assays. Youden (23) compiled 15 different estimates of the astronomical unit made by scientists between 1895 and 1961. Each scientist's confidence interval failed to overlap the confidence interval of his predecessor.
Recommendations
References