Clinical Chemistry 49: 1818-1821, 2003; 10.1373/clinchem.2003.019505
© 2003 American Association for Clinical Chemistry, Inc.


Point/Counterpoint

Critique of the Guide to the Expression of Uncertainty in Measurement Method of Estimating and Reporting Uncertainty in Diagnostic Assays

Jan S. Krouwer1

1 Krouwer Consulting, 26 Parks Dr., Sherborn, MA 01770. Fax 508-647-9380; e-mail jan.krouwer{at}comcast.net.


   Abstract
Background: The Guide to the Expression of Uncertainty in Measurement (GUM) provides instructions for constructing uncertainty intervals for a measurement. This method is usually reserved for reference materials, but GUM has recently been proposed as a way to express uncertainty for commercial diagnostic assays.

Methods: Using the official GUM standard and published applications of GUM to commercial diagnostic assays, I undertook an analysis to evaluate whether applying GUM to commercial diagnostic assays is warranted.

Results: Certain important assays, such as troponin I, would not be candidates for GUM because troponin I is not a well-defined physical quantity. Unlike definitive methods, in which efforts are taken to detect and eliminate all systematic error sources, commercial assays often trade off features such as ease of use and cost with accuracy and allow systematic errors to be present as long as the overall accuracy meets the medical need goal. Laboratories are hindered in preparing GUM models because the knowledge required to specify some systematic errors is often available only to manufacturers. Some non-GUM methods to estimate uncertainty rely on observed data, which include both known and unknown sources of error. The occurrence of large, unknown errors for assays in routine use (e.g., outliers) is not unusual because diagnostic assays must be chemically specific in the presence of thousands of potentially interfering substances. There is no provision in GUM to deal with unexplained outliers, which may lead to uncertainty intervals that are not wide enough. Some clinicians assume that diagnostic assay results have little uncertainty. This situation may be made worse by including an uncertainty interval, which implies certification.

Conclusions: Evaluations for accuracy (total analytical error) based on describing the distribution of result differences between commercial assays and reference methods indicate that some assays have a few results with large differences (e.g., outliers). This leads to a wide accuracy interval (total analytical error limits). It is unlikely that GUM would be able to predict these wide intervals, especially because there is little or no provision for outlier treatment in GUM. Presenting clinicians with GUM uncertainty intervals that are too narrow would be misleading. The modeling used by practitioners of the GUM method is potentially useful in improving quality, but commercial diagnostic assays are not ready for GUM uncertainty statements.


   Introduction
Accuracy, the closeness in agreement between a test result and an accepted reference method, is a concept well known to clinical chemists. Inaccuracy, the lack of closeness in agreement (also called total analytical error), is caused by random and systematic errors. The Guide to the Expression of Uncertainty in Measurement (1), hereafter referred to as GUM, provides a model for expressing uncertainty in measurement and is compared here with other methods of estimating and reporting uncertainty. GUM was prepared by representatives from several national standards organizations (International Bureau of Weights and Measures, International Organization for Standardization, International Electrotechnical Commission, IUPAC, IFCC, and International Organization of Legal Metrology), which gives it weight as an international guideline. GUM is typically used to provide uncertainty statements for reference materials; an example is NIST Standard Reference Material 1951a: Lipids in Frozen Human Serum (2). GUM may affect how diagnostic assays are evaluated because of the implied authoritativeness of GUM and because diagnostic assays are regulated in many countries. As an example, the European Commission has published an example of the use of GUM for analysis of calcium and glucose in human serum (3, 4). Kristiansen (5) described how GUM applies to diagnostic assays. Here I will describe problems with the GUM method with respect to its use for routine diagnostic assays.

Kallner (6) and the NIST web site (7) provide excellent descriptions of the GUM method. Grabe (8) has critiqued the GUM method. To briefly summarize the GUM approach, GUM considers random error and systematic error as the two possible sources of measurement error. The uncertainty of a measurement result stems from uncertainty attributable to random effects and from imperfect correction of systematic effects.

For the purpose of evaluating uncertainty components, GUM groups uncertainty components into two categories: types "A" and "B". Type A uncertainties are obtained from probability density functions derived from observed frequency distributions, whereas type B uncertainties are obtained from assumed probability density functions. The standard uncertainty of a measurement result, when that result is estimated from the values of other quantities, is called the combined uncertainty and follows the law of propagation of uncertainty. Finally, the combined uncertainty can be multiplied by a coverage factor, k, to yield an expanded uncertainty, which provides an interval about the result of a measurement expected to contain a large fraction of the values that could reasonably be attributed to the measurand. This interval is similar in concept to accuracy or total analytical error (9).
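As a rough illustration of the calculation summarized above, the following Python sketch combines uncorrelated standard uncertainties by the law of propagation of uncertainty and then applies a coverage factor. The sodium budget values, the component descriptions, and the function names are hypothetical and are not taken from GUM or from any cited application; the rectangular-distribution conversion is one common type B assumption, and correlations between inputs are ignored.

```python
import math


def type_b_rectangular(half_width):
    """Standard uncertainty for an assumed rectangular distribution of the
    given half-width (a type B evaluation): u = a / sqrt(3)."""
    return half_width / math.sqrt(3)


def combined_standard_uncertainty(sensitivities, uncertainties):
    """Law of propagation of uncertainty for UNCORRELATED inputs:
    u_c^2 = sum((c_i * u_i)^2), where c_i is the sensitivity coefficient."""
    return math.sqrt(sum((c * u) ** 2 for c, u in zip(sensitivities, uncertainties)))


def expanded_uncertainty(u_c, k=2.0):
    """Expanded uncertainty U = k * u_c for coverage factor k."""
    return k * u_c


# Hypothetical sodium example, all values in mmol/L (assumed, not from the source).
u_imprecision = 0.8                      # type A: SD of repeated measurements
u_calibrator = type_b_rectangular(1.0)   # type B: assumed +/- 1.0 rectangular limits

u_c = combined_standard_uncertainty([1.0, 1.0], [u_imprecision, u_calibrator])
U = expanded_uncertainty(u_c, k=2.0)
print(f"combined standard uncertainty: {u_c:.3f} mmol/L")
print(f"expanded uncertainty (k=2):    {U:.3f} mmol/L")
```

The sketch makes the critique concrete: the interval can only reflect the components that were entered into the budget, so an error source that was never modeled contributes nothing to U.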


   Differences between Commercial Diagnostic Assays and Processes Used for GUM
The GUM guideline states: "This Guide is primarily concerned with the expression of uncertainty in the measurement of a well-defined physical quantity–the measurand–that can be characterized by an essentially unique value". Unfortunately, this excludes some important diagnostic assays. As an example, troponin I assays can routinely differ by as much as 100-fold from one another (10). This has been ascribed to different epitopes in different commercial assays; hence troponin I is not a well-defined physical quantity as defined by GUM.

The GUM guideline also states: "It is assumed that the result of a measurement has been corrected for all recognized significant systematic effects and that every effort has been made to identify such effects". This amount of effort is commonly carried out for a reference material, whose value has been determined with a definitive method. Tietz (11) described the differences among definitive, reference, and field assays for clinical chemistry assays.

Commercial assays (field assays in Tietz’s terminology) are developed differently than are definitive assays because of different market needs. Commercial assays often emphasize ease of use and low cost, whereas definitive assays focus on attaining the best accuracy possible. For commercial assays, lower accuracy is justified because of affordability and other features (such as ease of use) and the fact that the lower accuracy is still often within stated goals.

As an example, home-use glucose assays serve an important medical need but have not minimized systematic errors to the same extent as the definitive method for assaying glucose. A patient sample contains, in addition to the analyte of interest, thousands of other chemical substances, some of which might interfere in a commercial assay. Although manufacturers attempt to investigate and minimize the effects of interfering substances, reports of assays that nonetheless suffer from these effects are not uncommon (12). These reports typically provide an explanation of the root cause of the error, which is often an uncorrected systematic error that has caused clinician concern if not actual harm to the patient.


   Size of Systematic Errors
GUM states that systematic errors can be reduced if the systematic error is "significant in size relative to the required accuracy of the measurement". This of course requires a goal for the accuracy of a measurement. Because there can be many systematic and random error sources, one must create a mathematical model that details how each error source contributes to the overall accuracy of a measurement (the combined uncertainty), create goals for each error source, and assess the magnitude of each source. There are some examples of this (13), which in essence is what GUM is all about.

Field assays typically have less stringent accuracy requirements (a larger allowable combined uncertainty) than definitive methods. For example, a draft International Organization for Standardization (ISO) glucose document states: "Ninety-five percent (95%) of the individual glucose results shall fall within ±0.83 mmol/L (15 mg/dL) of the results of the manufacturer’s measurement procedure at glucose concentrations ≤4.2 mmol/L (75 mg/dL) and within ±20% at glucose concentrations >4.2 mmol/L (75 mg/dL)" (14). It would seem to go against the grain of clinical chemists, however, to ignore as insignificant errors that are detected but fall below these limits. Thus a 10% nonlinearity in a glucose assay would likely be detectable and could be corrected even if a mathematical model could show that the 10% nonlinearity would not lead to failing the combined uncertainty goal. The reason is that laboratorians almost always wish to improve quality; the medically acceptable limits for combined uncertainty do not represent a dichotomous limit where on one side there will be high quality and on the other side poor quality. Rather, there is a continuum of quality so that laboratorians are always trying to improve the combined uncertainty, given economic constraints (15).
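The quoted draft criterion can be expressed as a simple check. The Python sketch below is only an illustration under stated assumptions: the function names and the paired values are hypothetical, concentrations are in mmol/L, and the thresholds are those quoted from the draft document (14), not from any later edition of the standard.

```python
def within_draft_limits(result, reference):
    """Return True if a glucose result (mmol/L) falls inside the quoted
    draft limits relative to the comparison value (mmol/L)."""
    if reference <= 4.2:
        return abs(result - reference) <= 0.83
    return abs(result - reference) <= 0.20 * reference


def fraction_within_limits(pairs):
    """Fraction of (result, reference) pairs that fall inside the limits."""
    hits = sum(within_draft_limits(result, reference) for result, reference in pairs)
    return hits / len(pairs)


# Hypothetical paired results: (field assay, comparison procedure), mmol/L.
pairs = [(4.0, 3.5), (5.0, 3.5), (11.0, 10.0), (13.0, 10.0)]
print(f"fraction within limits: {fraction_within_limits(pairs):.2f}")

# The draft goal is met when at least 95% of individual results are inside.
meets_goal = fraction_within_limits(pairs) >= 0.95
print(f"meets 95% goal: {meets_goal}")
```

Note that the pair (11.0, 10.0), a 10% difference, passes the check. This mirrors the point in the text: an error of that size can be detectable and correctable yet still fall inside the stated limits.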


   Treatment of Systematic Errors
The GUM approach has three ways to treat recognized significant systematic error. To illustrate this, consider a sodium assay with respect to the error source: calibrator lot value assignment error. Each time the calibrator lot is changed, there is a possible fixed bias that lasts until the next calibrator lot change. Whether this bias is observed depends on the laboratory procedure for evaluating calibrator lot changes and on the laboratory’s routine quality-control procedure.

Method 1: The laboratory detects a significant bias in a new calibrator lot and adjusts values to minimize the bias.

Method 2: The laboratory has a certificate from the manufacturer and uses the uncertainty statement from the manufacturer to calculate a standard uncertainty attributable to calibrator error.

Method 3: The laboratory evaluates multiple calibrator lots in an experiment that calculates the standard uncertainty of the calibrator through an ANOVA model. Typically, this evaluation would be done once as part of an evaluation of the candidate assay.
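Method 3 can be sketched with a balanced one-way random-effects ANOVA, in which the between-lot variance component serves as the squared standard uncertainty attributable to the calibrator. The Python code below is a minimal illustration, not the procedure of any cited study: the function name and the sodium replicate values are hypothetical, equal numbers of replicates per lot are assumed, and a negative variance estimate is truncated to zero.

```python
import math
from statistics import mean


def calibrator_lot_uncertainty(lots):
    """Balanced one-way random-effects ANOVA.

    lots: list of lists, one list of replicate results per calibrator lot.
    Returns (u_between, u_within) as standard deviations.
    """
    k = len(lots)
    n = len(lots[0])
    if k < 2 or n < 2 or any(len(lot) != n for lot in lots):
        raise ValueError("need at least 2 lots with equal replicates (>= 2) each")

    lot_means = [mean(lot) for lot in lots]
    grand_mean = mean(lot_means)

    # Mean squares between lots and within lots.
    ms_between = n * sum((m - grand_mean) ** 2 for m in lot_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for lot, m in zip(lots, lot_means) for x in lot
    ) / (k * (n - 1))

    # Between-lot variance component; truncated at zero if negative.
    var_between = max(0.0, (ms_between - ms_within) / n)
    return math.sqrt(var_between), math.sqrt(ms_within)


# Hypothetical sodium results (mmol/L) on one sample, three calibrator lots.
lots = [[139.0, 141.0], [143.0, 145.0], [140.0, 142.0]]
u_between, u_within = calibrator_lot_uncertainty(lots)
print(f"between-lot standard uncertainty: {u_between:.2f} mmol/L")
print(f"within-lot standard deviation:    {u_within:.2f} mmol/L")
```

With only a few lots the between-lot estimate is itself very imprecise, which is one practical reason such an experiment is usually done once rather than at every lot change.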

Assume that method 2 or method 3 has been followed for all systematic error sources (e.g., instrument, reagent, calibrator, operator), and an expanded uncertainty statement has been provided. Laboratorians get into a quandary here. If a laboratorian finds a systematic bias, then he or she should eliminate it. However, ensuring that "every effort has been made to identify such (systematic) effects" might mean that each change in every potential systematic error source should be evaluated. This is beyond the scope of most laboratories, although in principle it is desirable. The GUM statement is ultimately what all laboratorians want to achieve. One wants to know the sodium value and its uncertainty independently of any factors. This is particularly important as people travel throughout the healthcare system.


   Lack of Knowledge in Laboratory Error Modeling in GUM
To model errors, Kristiansen (5) suggests the use of cause-and-effect diagrams. Many of the effects described in a cause-and-effect diagram prepared by a laboratory, although conceptually correct, will suffer because information known to a manufacturer is usually unavailable to the laboratory. As one example, consider the error in the recalibration of PO2 for a blood gas analyzer. The laboratory is unlikely to know which transformations are used in the calibration equations. In a typical case, the manufacturer has previously fit PO2 responses from multiple standards to a quadratic-quadratic spline with one knot. The transformation allows the use of two standards (which implies linearity) for routine calibration of an inherently nonlinear response (because of leaks in the analyzer). However, this also means that additional biases may result from errors in the original spline fit or from drift in the responses used to estimate the spline.

Manufacturers embed many other algorithms in instrument software that monitor response quality in each sample and lead either to discarding part of the response data or to rejecting an analyzer result altogether. The details of these algorithms are generally unknown to laboratories, but they can cause errors if incorrect.

Of course, if manufacturers disclosed this type of information, laboratories would be able to improve the faithfulness of their models, but this type of disclosure is unlikely because manufacturers will rightly consider these quality algorithms as proprietary.


   Treatment of Outliers
Clinical chemists typically estimate performance parameters using observed data. GUM uses detailed models that describe all error sources. The models are typically based on both assumptions and observed data. The GUM expanded uncertainty statement provides an interval about the result of measurement expected to contain a large fraction of the values by use of a coverage factor based on a gaussian distribution. However, distributions using observed data (i.e., empirical distributions) often exhibit nongaussian shapes.

In an example (16), based on data from Miller et al. (17), 100 randomly collected patient samples were assayed by a commercially available LDL-cholesterol assay and by a reference assay. A nonparametric interval constructed to contain, with 95% confidence, at least 95% of the LDL-cholesterol differences between the commercial and reference assays ranged from -0.47 to 5.66 mmol/L (-18 to 219 mg/dL). This huge interval was caused by three outliers in the data. It is unlikely that a GUM approach would have come close to this interval. Reporting this interval with every LDL-cholesterol result is, of course, not useful. However, outliers such as those found here are not only real, albeit infrequent, but are precisely the cases that contribute to incorrect medical decisions. In any evaluation, including those conforming to GUM, one should try to determine the root cause of outliers. If outliers are caused by recording mistakes, they may be discarded. However, it is possible, and more so for a laboratory than a manufacturer, that the root cause for an outlier may remain undetermined and hence uncorrected. There is no provision for this in GUM because GUM assumes that large systematic errors have been corrected.
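To show why a few outliers can dominate a distribution-free interval of this kind, the Python sketch below computes the confidence that the interval between two order statistics contains at least a given fraction of the population. This is the standard order-statistic (binomial) argument for a continuous distribution; it is offered as an assumption about how such an interval can be built and is not necessarily the calculation used in the cited example (16). The function names are hypothetical.

```python
from math import comb


def coverage_confidence(n, m, p):
    """Confidence that the interval between two order statistics whose ranks
    differ by m (from a sample of size n) contains at least a fraction p of
    the population. Equals P(Binomial(n, p) <= m - 1)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m))


def smallest_n_for_min_max(p=0.95, confidence=0.95):
    """Smallest sample size for which [minimum, maximum] reaches the
    requested confidence of containing at least a fraction p."""
    n = 2
    while coverage_confidence(n, n - 1, p) < confidence:
        n += 1
    return n


n = 100
# Interval from the smallest to the largest observed difference (m = n - 1).
print(f"min to max:           {coverage_confidence(n, n - 1, 0.95):.3f}")
# Interval that drops one extreme observation (m = n - 2).
print(f"one extreme excluded: {coverage_confidence(n, n - 2, 0.95):.3f}")
print(f"smallest n for min-max 95%/95%: {smallest_n_for_min_max()}")
```

Under these assumptions, with 100 samples only the interval from the smallest to the largest observed difference reaches 95% confidence of containing 95% of differences, so any outlier in the sample sets the reported limit directly.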

Unfortunately, some interpretations of GUM suggest that certain outliers may be discarded (18). In the EURACHEM/CITAC guide referenced by Linko et al. (4) in their example using GUM, a result affected by an instrument malfunction such as "an air bubble lodged in a spectrophotometer flowthrough cell" can be discarded as a spurious error. If an algorithm detects a bubble and discards the result as part of the assay routine, then this is acceptable, but if a user visually detects a bubble during an evaluation and would not be routinely performing this check, discarding this result will bias the evaluation.

Krouwer (9) has summarized methods that directly estimate inaccuracy (total error) from observed data. Some of the methods do not require modeling at all, although one must ensure that the samples are representative and that sufficient data are collected.


   Clinicians and Laboratory Error
Adding an uncertainty statement to a measurement confers an air of authority on the measurement’s validity. Although the use of uncertainty intervals has been a longstanding tradition for reference materials (19), it may send the wrong message to clinicians. It would appear today that even without the addition of uncertainty statements, some clinicians believe that results from laboratory assays have little or no uncertainty. Consider a case reported in the media that involved a woman with an increased human chorionic gonadotropin (hCG) (20). Clinicians suspected trophoblastic carcinoma, based in part on the increased hCG result, and over several months provided chemotherapy followed by two surgical procedures: hysterectomy and partial removal of one lung. It took 45 hCG assays (all increased) and negative pathology reports before assay error was suspected. Interference by human anti-mouse antibody was confirmed as the source of the assay error; the woman had no cancer. This was not an isolated case; Rotmensch and Cole (21) and Cole et al. (22) found that in 78 cases, there were 35 instances of false-positive errors with 12 cases of unnecessary cancer therapy. A survey (12) of performance complaints showed that previously unknown clinically important interference errors in diagnostic assays were the most frequent error source reported. The magnitude of these errors is often well beyond a reasonable uncertainty interval.

Underestimating uncertainty is not limited to diagnostic assays. Youden (23) compiled 15 different estimates of the astronomical unit from scientists who estimated that quantity over the years 1895–1961. Each scientist's confidence interval failed to overlap the confidence interval of his predecessor.


   Recommendations
The GUM uncertainty statement is what is ultimately desired for diagnostic assays. To get there, quality improvement must take place. The modeling by Kristiansen (5) is valuable as a means to identify and rank the importance of errors. However, until diagnostic assays are closer to the quality of definitive methods, the use of GUM uncertainty statements is not recommended.


   References

  1. International Organization for Standardization. Guide to the expression of uncertainty in measurement. Geneva: ISO, 1995:101.
  2. NIST. Certificate of analysis for NIST standard reference material 1951a: lipids in frozen human serum. http://patapsco.nist.gov/srmcatalog/certificates/1951a.pdf (Accessed Sept. 8, 2002).
  3. Linko S, Örnemark U, Kessel R. Evaluation of measurement uncertainty in clinical chemistry: applications to determinations of total concentration of calcium and glucose in human serum. GE/R/IM/34/01. November 2001. Revised 2001-11-19. http://www.sskb.sk/download/imep-17/evaluation_of_measurement_uncertainty_in_clinical_chemistry.pdf (Accessed Sept. 8, 2002).
  4. Linko S, Örnemark U, Kessel R, Taylor PDP. Evaluation of uncertainty of measurement in routine clinical chemistry: applications to determinations of the substance concentration of calcium and glucose in serum. Clin Chem Lab Med 2002;40:391-398.
  5. Kristiansen J. Description of a generally applicable model for the evaluation of uncertainty of measurement in clinical chemistry. Clin Chem Lab Med 2001;39:920-931.
  6. Kallner A. Quality specifications based on the uncertainty of measurement. Scand J Clin Lab Invest 1999;59:509-512.
  7. NIST. The NIST reference on constants, units, and uncertainty. http://physics.nist.gov/cuu/Uncertainty/index.html (Accessed Sept. 8, 2002).
  8. Grabe M. Estimation of measurement uncertainties: an alternative to the ISO guide. Metrologia 2001;38:97-106.
  9. Krouwer JS. Setting performance goals and evaluating total analytical error for diagnostic assays. Clin Chem 2002;48:919-927.
  10. Panteghini M. Performance of today’s cardiac troponin assays and tomorrow’s [Editorial]. Clin Chem 2002;48:809-810.
  11. Tietz NW. A model for a comprehensive measurement system in clinical chemistry. Clin Chem 1979;25:833-839.
  12. Krouwer JS. Estimating total analytical error and its sources. Arch Pathol Lab Med 1992;116:726-731.
  13. Aronsson T, de Verdier CH, Groth T. Factors influencing the quality of analytical methods: a systems analysis, with use of computer simulation. Clin Chem 1974;20:738-748.
  14. International Organization for Standardization. Requirements for in vitro blood glucose monitoring systems for self-testing in managing diabetes mellitus. ISO/FDIS 15197. Geneva: ISO, 2002.
  15. Ross PJ. Taguchi techniques for quality engineering, 2nd ed. New York: McGraw-Hill, 1996:329.
  16. National Committee for Clinical Laboratory Standards. Estimation of total analytical error for clinical laboratory methods; approved guideline. NCCLS document E21-A. Villanova, PA: NCCLS, 2003.
  17. Miller WG, Waymack PP, Anderson FP, Ethridge SF, Jayne EC. Performance of four homogeneous direct methods for LDL-cholesterol. Clin Chem 2002;48:489-498.
  18. Ellison SLR, Rosslein M, Williams A, eds. Quantifying uncertainty in analytical measurement, 2nd ed. EURACHEM/CITAC, 2002. http://www.eurachem.ul.pt (Accessed Sept. 10, 2002).
  19. NIST Technology Services. Clinical laboratory materials (gas, liquid, and solid forms). http://ois.nist.gov/srmcatalog/tables/view_table.cfm?table=105–1.htm (Accessed Sept. 8, 2002).
  20. Sainato D. How labs can minimize the risks of false positive results. Clin Lab News 2001;27:6-8.
  21. Rotmensch S, Cole LA. False diagnosis and needless therapy of presumed malignant disease in women with false-positive human chorionic gonadotropin concentrations. Lancet 2000;355:712-715.
  22. Cole LA, Rinne KM, Shahabi S, Omrani A. False positive hCG assay results leading to unnecessary surgery and chemotherapy and needless occurrences of diabetes and coma. Clin Chem 1999;45:313-314.
  23. Youden WJ. Enduring values. Technometrics 1972;14:1-11.


