Clinical Chemistry 46: 89-99, 2000;
(Clinical Chemistry. 2000;46:89-99.)
© 2000 American Association for Clinical Chemistry, Inc.


Articles

Causes of Unsatisfactory Performance in Proficiency Testing

Richard W. Jenny^a and Kathryn Y. Jackson-Tarentino

a Author for correspondence. Fax 518-473-2900; e-mail jenny{at}wadsworth.org


   Abstract
Background: Proficiency testing (PT) provides a measure of the effectiveness of laboratory quality assurance programs. Test reports are released from processes that the laboratory judges to be in conformance with quality specifications; an evaluation of unsatisfactory performance (UNSAT) by a PT provider is an unexpected outcome for the laboratory. An understanding of the root cause(s) of testing errors provides an opportunity for the continuous improvement of laboratory services.

Methods: We used participant data from the New York State Department of Health PT program to characterize the quality of testing in the toxicology specialty. Outcomes from laboratory investigations into causes of UNSAT and information on quality control practices collected from all program participants were used to identify the root causes of error.

Results: Two classes of error were encountered: spurious test results caused by lapses in standard operating procedures and instrument malfunctions (300 per million assays) and common-cause analytic error (7000 per million assays or 0.7% rate of UNSAT). Causes of spurious results included inaccurate mathematical correction for specimen dilution, misinterpretation of instrument codes, and instrument sampling errors. Calibration drift was most frequently cited as the common-cause analytic error. Approximately one-half of the laboratories used an allowable error for the quality control of analytical systems that exceeded the threshold error specified by manufacturers for stable instrument performance.

Conclusions: The causes of spurious results suggest the need for ongoing competency testing of analysts where analyst intervention is required in an otherwise automated process, and for continued diligence in mistake-proofing instrument design. The intrinsic quality of laboratory testing is unlikely to improve until the allowable error in quality control is consistent with manufacturer specifications for stable system performance.


   Introduction
Quality improvement in the modern clinical laboratory environment entails the continuous inspection and refinement of processes to ensure the efficient delivery of services that meet the needs and expectations of those who use them. Proficiency testing (PT) is a point-sampling of laboratory output that is used to judge the quality of laboratory testing (1). Because laboratories release patient test results from processes that conform to their quality specifications, an evaluation of unsatisfactory performance by a PT provider is an unexpected outcome for the laboratory. The laboratory responds by conducting an investigation into the source of error and by modifying the procedure that produced the error, with the objective of reducing or eliminating chances of a recurring process failure.

The interlaboratory perspective of the PT provider affords opportunities to identify root causes of error that may be systemic among laboratories that use similar analytical systems or processes. Outcomes of investigations into reasons for PT failures can be used by the laboratory, by device manufacturers, and by the PT program itself in the continuous improvement of their respective products (2)(3)(4).

We used participant data from the New York State Department of Health (NYSDOH) PT program to characterize the quality of testing in the toxicology specialty. A frequent observation in PT is spurious results, not unlike those observed by Witte et al. (5) and Plebani and Carraro (6) in their reviews of clinical data, that are suggestive of laboratory mistakes rather than the product of common-cause analytic variation. Another observation is the constant rate of unsatisfactory performance across test events, which is suggestive of intrinsic analytic errors beyond those allowed by program performance specifications. We describe the root causes of unsatisfactory performance in the toxicology PT program.


   Materials and Methods
program design
The design of the NYSDOH toxicology PT program was described previously (7). Briefly, program staff purchase processed human serum and supplement the base with USP-grade drugs. Five lots of specimens are prepared for each of three test events that are conducted each year. The drug concentration over the five test specimens encompasses the reportable range that is typical of most analytical systems. The pools of test material are sterile-filtered and dispensed for shipment to laboratories. A laboratory that provides toxicology services to healthcare providers in New York State is required to participate in the PT program. Approximately 15% of the 380 program participants are located outside the state.

Analyte target values are established from either the weighed-in amount of drug or the robust estimate of the mean of participant data. Data from methods that are judged to require peer evaluation because of specimen matrix effects are removed before the determination of the participant mean. The participant mean is used as a target value if the mean differs by >3% from the gravimetrically assigned value.
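The target-selection rule in the preceding paragraph can be sketched as a short function. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, the participant mean is taken as an input (the robust estimator used by the program is not specified here), and the 3% threshold is applied to the relative difference from the gravimetric value.

```python
def select_target(gravimetric, participant_mean, threshold=0.03):
    """Return the target value for an analyte challenge.

    Assumption: the participant (robust) mean replaces the weighed-in
    value only when the two differ by more than `threshold` (3%),
    relative to the gravimetric value.
    """
    if gravimetric <= 0:
        raise ValueError("gravimetric value must be positive")
    relative_difference = abs(participant_mean - gravimetric) / gravimetric
    if relative_difference > threshold:
        return participant_mean
    return gravimetric


# Hypothetical values in mg/L, for illustration only.
print(select_target(10.0, 10.2))  # 2% difference: gravimetric value is kept
print(select_target(10.0, 10.5))  # 5% difference: participant mean is used
```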

Laboratory performance is judged by both CLIA ’88 (8) and NYSDOH evaluation criteria. We have proposed that the CLIA ’88 25% allowable error for most analytes in the toxicology specialty is inconsistent with the capabilities of current analytical systems and with the clinical requirements for optimal patient care (7). NYSDOH criteria (15% allowable error) are used to judge whether the laboratory needs to evaluate analytical performance for possible sources of error.

data review and analysis
We apply two levels of review to proficiency test reports. The first review occurs as reports are received to detect results that are so discrepant from target values, e.g., a toxic drug concentration reported as a subtherapeutic concentration, that the laboratory must immediately investigate the error and the possibility that similar errors occurred in testing patient specimens. We classify these aberrant results as spurious values, and the PT program coordinators and the laboratory typically complete the investigation within 24 h of notification. Findings from the investigation of spurious values were collected over the 10 test events conducted from January 1996 to January 1999 (Table 1).


Table 1. Outcome of investigations into PT spurious values.

The second review is an evaluation of performance against NYSDOH and CLIA ’88 criteria. Unsatisfactory performance is defined as two or more results, among the five challenges for an analyte, that exceed allowable error limits (analyte score ≤60%). Our investigation into unsatisfactory performance is initiated by the mailing of an inquiry report to the laboratory. The investigation is limited to those cases in which performance is atypical of peer laboratories using the same analytical system, thereby substantiating the idea that the testing error(s) are laboratory based and not an artifact of the PT challenge (specimen matrix effects). The inquiry report restates laboratory performance for the analyte, quantifies the magnitude of the error, and provides PT program assessment of error as either systematic or random or a result of nonlinearity near the limits of the assay’s purported reportable range. The inquiry report is used to capture information on the design of the laboratory’s internal quality control (QC) program (source of QC materials, allowable imprecision, and the rules used for interpretation of QC data), the analytical CV at each level of QC, and the mean number of patient specimens analyzed each month. The laboratory is instructed to return the inquiry report with the internal documents that were generated in the process of its investigation into unsatisfactory performance. PT program staff use the documents to categorize the source of testing errors and to maintain a database of internal QC practices and assay performance characteristics.
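The scoring rule for unsatisfactory performance can be illustrated with a minimal sketch. The function names and the example concentrations are hypothetical, and treating a result exactly at the allowable-error limit as acceptable is an assumption; the 15% default reflects the NYSDOH criterion described above.

```python
def analyte_score(reported, targets, allowable_error=0.15):
    """Percentage of challenges reported within target +/- allowable error.

    Assumption: a result exactly on the limit counts as acceptable.
    """
    if len(reported) != len(targets) or not targets:
        raise ValueError("reported and targets must be non-empty and equal in length")
    acceptable = sum(
        abs(result - target) <= allowable_error * target
        for result, target in zip(reported, targets)
    )
    return 100.0 * acceptable / len(targets)


def is_unsatisfactory(score):
    """Two or more misses among five challenges gives a score of 60% or less."""
    return score <= 60.0


# Hypothetical five-specimen challenge (mg/L); the third and fourth
# results are each 20% from target and therefore exceed the 15% limit.
targets = [5.0, 10.0, 15.0, 20.0, 25.0]
reported = [5.2, 10.5, 18.0, 24.0, 25.5]
score = analyte_score(reported, targets)
print(score, is_unsatisfactory(score))
```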

categorization of test errors
Sources of test errors are categorized as follows: (a) calibration drift, in which performance in PT suggests significant systematic error and recalibration of the analytical system resolves the error; (b) method bias, in which performance in PT suggests significant systematic error and we conclude that the inherent method bias contributed to the laboratory’s unsatisfactory performance; (c) reportable range errors, in which performance in PT suggests significant analytical bias near the limits of the reportable range for the method; (d) instability, in which performance in PT suggests random error and the laboratory concludes that a component of the analytical system (e.g., sample probes, reaction cells, reagents) is not performing optimally; and (e) random event, in which the errors cannot be replicated and the investigation does not identify possible sources of error.

evaluation of internal qc practices as possible root cause of unsatisfactory performance
To evaluate whether internal QC practices are predictive of unsatisfactory performance in PT, we documented, from a survey of all program participants, the limits of acceptable results and the analytical CV for each QC material used by the laboratory, the source of QC materials, the rules used to interpret QC data, and the mean number of patient specimens analyzed each month. The allowable error used for the QC of an assay was determined from the ratio of the difference between QC limits and the QC range midpoint concentration (target) to the midpoint concentration, expressed as a percentage. The allowable errors used in the internal QC programs were compared to allowable error in PT and to manufacturer performance claims for the analytical system. Within intervals of allowable error, we also determined the incidence of unsatisfactory performance in PT attributable to analytic systematic error.
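The derivation of the internal QC allowable error from a laboratory's control limits can be sketched as follows. The function name and the example limits are hypothetical, and the sketch assumes limits that are symmetric around the range midpoint, so that the upper limit minus the midpoint equals the half-width of the QC range.

```python
def qc_allowable_error_pct(lower_limit, upper_limit):
    """Allowable error (%) implied by a pair of QC limits.

    Computed as (QC limit - midpoint) / midpoint * 100, where the
    midpoint of the QC range serves as the target concentration.
    """
    midpoint = (lower_limit + upper_limit) / 2.0
    if midpoint <= 0:
        raise ValueError("QC range midpoint must be positive")
    return 100.0 * (upper_limit - midpoint) / midpoint


# Hypothetical control limits of 12.0 to 18.0 mg/L: midpoint 15.0 mg/L,
# half-width 3.0 mg/L, hence an allowable error of 20%.
print(qc_allowable_error_pct(12.0, 18.0))
```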

We characterized unsatisfactory analytical performance as systematic or random error through interpretation of two statistics, x-bar and range (1)(9), that are determined from the normalized bias of the five results that are reported for each analyte. The bias of the test result from the peer method mean is normalized to the PT program allowable error. For example, a laboratory reports a serum theophylline concentration of 13 mg/L, and the peer method mean is 10 mg/L. The allowable error around the target concentration is 15%. The normalized bias is determined as (13 mg/L - 10 mg/L)/(0.15 × 10 mg/L), or 2.0; that is, the bias of the reported result was two times the allowable error ascribed by the PT program. The x-bar statistic is determined as the mean of the normalized biases across the five challenge specimens for the analyte. The range statistic is an index of random error and is determined as the difference between the largest and smallest of the series of normalized results for an analyte, divided by 2. For example, the range statistic for the series of normalized results 0.2, 0.5, -0.8, 0.6, and 0.1 is determined as [0.6 - (-0.8)]/2 = 1.4/2 = 0.7. A range statistic equal to 0.7 indicates that the normalized results were distributed over a range equivalent to 70% of the full range allowed by the PT performance specification of ±15%. A range statistic >0.7 suggests either significant random error or significant systematic error near the limit(s) of the purported reportable range.
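The two statistics can be reproduced with a few lines of code. The sketch below uses the theophylline and range examples from the text; the function names are illustrative and not taken from the PT program's software.

```python
def normalized_bias(result, peer_mean, allowable_error=0.15):
    """Bias from the peer method mean, in units of the allowable error."""
    return (result - peer_mean) / (allowable_error * peer_mean)


def xbar_statistic(normalized):
    """Mean of the normalized biases for one analyte challenge."""
    return sum(normalized) / len(normalized)


def range_statistic(normalized):
    """Half the spread between the largest and smallest normalized bias."""
    return (max(normalized) - min(normalized)) / 2.0


# Theophylline example: 13 mg/L reported against a peer mean of 10 mg/L.
print(f"{normalized_bias(13.0, 10.0):.2f}")    # 2.00

# Series of normalized results used in the range example.
series = [0.2, 0.5, -0.8, 0.6, 0.1]
print(f"{xbar_statistic(series):.2f}")         # 0.12
print(f"{range_statistic(series):.2f}")        # 0.70
```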

The x-bar and range statistics were tabulated for each analyte for each of five test events conducted from June 1997 to September 1998. The largest of the series of five values of the respective statistics was likewise tabulated for each analyte. Cases of unsatisfactory performance in PT attributable to systematic error were identified as an analyte x-bar (maximum) >1.0 and range (maximum) <0.7.
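The criterion for attributing unsatisfactory performance to systematic error can be expressed as a simple predicate. This is a sketch under one explicit assumption: the maximum x-bar is taken as the largest magnitude across the five test events, so that negative biases are treated like positive ones. The function name and example values are hypothetical.

```python
def is_systematic_error_case(xbars, ranges):
    """Flag an analyte as unsatisfactory because of systematic error.

    xbars, ranges: one x-bar and one range statistic per test event.
    Criterion from the text: x-bar (maximum) > 1.0 and range (maximum) < 0.7.
    Assumption: x-bar (maximum) is the largest absolute value.
    """
    xbar_max = max(abs(x) for x in xbars)
    range_max = max(ranges)
    return xbar_max > 1.0 and range_max < 0.7


# Hypothetical statistics for one analyte over five test events.
print(is_systematic_error_case([0.3, 1.2, 0.8, 0.5, 0.9], [0.2, 0.3, 0.4, 0.1, 0.5]))  # True
print(is_systematic_error_case([0.3, 1.2, 0.8, 0.5, 0.9], [0.2, 0.9, 0.4, 0.1, 0.5]))  # False
```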


   Results
investigation of spurious values
The outcome of investigations into spurious values is summarized in Table 1. A total of 206 060 PT results (5 results for each analyte challenge) were reviewed from January 1996 to January 1999. We detected 106 spurious values (514 spurious results per million assays) over this period. Laboratory mistakes and instrument malfunction accounted for 58% of the causes of spurious PT results. Ten percent of the investigations were unsuccessful in the identification of a cause, and 32% of the spurious values were caused by inaccurate transcription of test results to PT report forms. The process used to report PT results is atypical of routine reporting of patient results. Although some laboratories continue to transcribe analytical results to patient report forms, we consider the transcription errors unique to the PT process and the errors are not addressed further.
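Error rates in this section are expressed per million assays. A minimal helper (the function name is illustrative) reproduces the figures quoted in the text from their counts:

```python
def per_million(events, trials):
    """Event rate scaled to one million trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return 1_000_000 * events / trials


print(round(per_million(106, 206_060)))  # 514: all spurious values
print(round(per_million(9, 270)))        # 33333: misread OIR codes on the Synchron
print(round(per_million(10, 60_575)))    # 165: AxSYM sampling errors
```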

We list in Table 1 the laboratory mistakes and analytical system malfunctions that produced the spurious values and the frequency of occurrence in each test event. Laboratory mistakes occurred when valid analytical results were mishandled. The four assigned causes of mistakes were found to recur among laboratories and are identified as misinterpretation of instrument codes, inaccurate factoring for specimen dilution, mishandling of data provided on instrument printouts, and misidentification or misplacement of specimens within batch sequences.

Two of the recurring laboratory mistakes, misinterpretation of instrument codes and mishandling of data provided on instrument printouts, were instrument specific. The error rate among laboratories using the Beckman Synchron (Beckman Coulter Instruments) in the analysis of PT specimens for gentamicin and tobramycin at concentrations exceeding 12 mg/L (the method reportable range limit) was 33 333 per million assays (Table 1). The analyzer correctly identified specimens with analyte concentrations outside its reportable range. However, the out of instrument range (OIR) annotation used to flag the specimen notifies the analyst that additional testing is required to determine whether the analyte concentration is low or high. In 9 of the 270 cases where the OIR code was presumably generated, the analysts interpreted the code to mean that the gentamicin and tobramycin concentrations were less than the lower limit of the reportable range (Table 2). The error rate across test events was proportional to the number of specimens that challenged the upper limit of the Synchron reportable range. Analysts in three laboratories misread the Abbott X-systems printout of assay data and reported net polarization units for drug concentration (Table 2). The columnar design of the instrument printout is used to list the net polarization, blank intensity, and analyte concentration for each specimen assayed. Analysts stated that the positioning of columnar data is not consistent between TDx, FLx, and AxSYM reports, which contributed to the misreading of instrument reports.


Table 2. Spurious PT results attributed to analytical system design.

Mishandling of specimen dilutions was the most common laboratory mistake and was responsible for 20% of the spurious values reported to the PT program (error rate was 510 per million assays; Table 1). In each case, the analytical result obtained for the diluted specimen was accurate, but the reported results were grossly inaccurate (Table 2). Laboratories committing this error have described four scenarios: the analyst diluted and assayed the specimen but failed to correct the result for dilution; the analyst diluted and assayed the specimen and corrected the result for dilution, but did not communicate that factoring for dilution was performed, and data entry staff repeated the correction for dilution before releasing the test result; the analyst was unaware that the instrument report listed the result that had been corrected for dilution; or an incorrect dilution factor was used in the determination of the specimen analyte concentration.
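The arithmetic behind these scenarios is simple, which is part of why the resulting errors are so large. The sketch below uses entirely hypothetical values (a 16 mg/L specimen diluted twofold) to show how an accurate measurement of the diluted specimen becomes a grossly inaccurate reported result; an analyst who re-corrects a result that the instrument has already corrected produces the same outcome as the "corrected twice" case.

```python
def correct_for_dilution(measured, dilution_factor):
    """Scale the concentration measured in the diluted specimen back to neat."""
    return measured * dilution_factor


# Hypothetical specimen: true concentration 16 mg/L, diluted twofold.
true_concentration = 16.0
dilution_factor = 2
measured = true_concentration / dilution_factor  # 8.0 mg/L, analytically accurate

scenarios = {
    "corrected once (intended)": correct_for_dilution(measured, dilution_factor),
    "not corrected": measured,
    "corrected twice": correct_for_dilution(
        correct_for_dilution(measured, dilution_factor), dilution_factor
    ),
    "wrong factor (4 instead of 2)": correct_for_dilution(measured, 4),
}

for label, reported in scenarios.items():
    error_pct = 100.0 * (reported - true_concentration) / true_concentration
    print(f"{label}: {reported:.1f} mg/L ({error_pct:+.0f}%)")
```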

Instrument malfunction was cited in 15% of the investigations of spurious values (Table 1), and most laboratories suspected the specimen-sampling module as the probable source of error. Typically, four of the five test results for an analyte challenge were well within PT program ranges of acceptable results, but the recovery of analyte from the fifth challenge specimen was markedly low. In most instances, transposition of specimens within the sequence batch was ruled out because the batch contained only the PT specimens. Repeat analysis of the test specimen upon request from the PT program invariably produced acceptable recovery of analyte. The error in sampling was most frequently detected by the PT program among users of the Abbott AxSYM (Table 2). Laboratories suspected air bubbles in the sample, a hole in the reaction vessel, or a failure to aspirate the specimen. We estimate that the AxSYM sampling error rate is 0.016% (10 incidents per 60 575 assays, or 165 per million assays).

investigation of unsatisfactory assay performance
The outcome of investigations into unsatisfactory analytical performance is summarized in Table 3. A total of 20 830 analyte challenges (5 specimens per analyte challenge) were evaluated from the five test events conducted from May 1997 to October 1998. The rate of unsatisfactory performance among all analyte challenges was 0.7%. Seventy-five percent of the investigations concluded that the unsatisfactory performance was attributable to systematic error (calibration drift or bias near purported limits of reportable range) or random error (instability of analytical system). Conclusions from 25% of the investigations were indeterminate. However, program staff judged that in many cases in which an indeterminate conclusion was reached by the laboratory, an inherent method bias, not calibration drift, was a major component of the total error in test results. We categorized the sources of error from indeterminate laboratory conclusions as method bias (14%) or as a random, indeterminate event (11%).


Table 3. Apparent categories of analytical errors in unsatisfactory PT performance.

evaluation of internal qc practices
Calibration drift (error) was most frequently mentioned as the cause of unsatisfactory performance in PT (Table 3). Because we expect that laboratories release results only from analytical runs that they judge to be in-control, we investigated the allowable errors used by laboratories in their internal QC programs and the analytical CVs. A scatter plot of allowable error vs method imprecision for theophylline, carbamazepine, and phenobarbital is shown in Fig. 1A (data are for the control material with the drug concentration within its therapeutic range). The PT program performance specification for these analytes is ±15% around the assigned target value; however, 35% of the laboratories reported that the allowable error used in their QC programs exceeded 15%. Only 10% of the laboratories reported that the 95% confidence interval of method imprecision (2 CV) exceeded 15%. These findings suggest that in many laboratories, the allowable error used to monitor analytical stability is decoupled from the imprecision performance characteristic of the analytical method. This inference is supported by Fig. 1A, where it appears that many laboratories use fixed criteria of 10%, 15%, and 20% for QC limits, where 20% is most prominent. The use of method standard deviation by many laboratories to set QC limits is also evident from the line-of-identity in Fig. 1A. We noted that ~4% of laboratories reported a method CV <2%, and 3% of the laboratories reported an allowable error of <4%. This observation raises questions concerning the statistical validity of imprecision estimates by this subset of laboratories. The low volume of patient testing may contribute to the questionable validity of imprecision estimates because 16% of the laboratories perform <10 analyses on patient specimens per month for theophylline, carbamazepine, and phenobarbital.



Figure 1. Scatterplots of allowable error used in the QC of analytical systems vs analytic imprecision (A) and PT analytic bias vs allowable error (B).

Data are for the control material with therapeutic concentrations of carbamazepine, phenobarbital, and theophylline. Intralaboratory imprecision is the 95% confidence interval for QC assays, calculated by multiplying by 2 the assay CV reported by the laboratory. The internal QC allowable error is determined as the ratio of the difference between QC limits and the QC range midpoint concentration (target), to the midpoint concentration, expressed as a percentage. The x-bar (max) (analytical bias) in scatterplot B is the largest x-bar in a series of five PT events. • indicates that the analytical bias was greater than the PT program 15% allowable error. A subset of laboratories reported the assay allowable error but not the assay CV. Points superimposed on the y-axis of panel A are from laboratories that did not report the method CV.

We evaluated the correlation of unsatisfactory performance in PT attributable to systematic error to the allowable error used in QC programs by plotting the laboratory x-bar (maximum), a measure of laboratory systematic error determined by the PT program, against the laboratory allowable error (Fig. 1B). The performance on 14 analyte challenges was evaluated as unsatisfactory [x-bar (maximum) >1.0; range <0.7] with analytical bias exceeding 15%. The internal QC program allowable error exceeded 15% for 11 of the 14 cases of unsatisfactory performance. We further quantified the correlation by determining the rate of unsatisfactory performance within the intervals of allowable error (Table 4) and found that the rate of unsatisfactory performance exceeded 12% when the allowable error exceeded 22.5%.


Table 4. Distribution of laboratories and unsatisfactory PT performance within intervals of internal QC program allowable errors.

We noted a fourfold range in allowable error used by laboratories in the QC of phenobarbital, carbamazepine, and theophylline assays (5th and 95th percentiles of allowable error are 5% and 20%, respectively). The range in allowable errors suggests a lack of consensus on QC requirements and a lack of guidance from manufacturers on QC program parameters that are consistent with stable analyzer performance. We investigated QC practices and PT performance among laboratories that use an analytical system with QC program guidance that is provided by the manufacturer. Abbott Diagnostics (Abbott Laboratories) makes available QC materials and rules for interpretation of QC data to users of its X-systems. The QC limits for the phenobarbital, theophylline, and carbamazepine therapeutic control material are ±10% for the TDx analyzer, and ±12%, ±10%, and ±15%, respectively, for the AxSYM analyzer around the assigned target value. Laboratories are instructed to initiate investigation of analytical performance when control assay values exceed the limits. We found that the performance of <6% of the Abbott analyzers is monitored by use of Abbott QC materials. Among those using the Abbott control materials, 70% of the assays are monitored with ±10% or lower QC limits, and 97% of the QC limits are 15% or lower (Fig. 2B). None of the performances on the PT analyte challenges among this group was judged unsatisfactory. When laboratories opted not to use Abbott QC materials, we noted that 49% of these laboratories used allowable errors in QC that are larger than the Abbott recommended fixed criterion to detect possible unstable system performance (Fig. 2A). The incidence of unsatisfactory performance in PT increased with increases in QC program allowable error (Fig. 2C and Table 4), supporting probability models developed by Ehrmeyer et al. (10) that predict the effects of method CV on PT outcomes.



Figure 2. Scatterplots of allowable error used in the QC of Abbott X-system instruments vs analytic imprecision, and X-system PT analytic bias vs allowable error.

Data are for the control material with therapeutic concentrations of carbamazepine, phenobarbital, and theophylline. Derivation of variables is as described in the legend for Fig. 1. Scatterplots A and C include data from all Abbott X-systems users; the data in scatterplot B are limited to laboratories using Abbott control materials. The x-bar (max) (analytical bias) in scatterplot C is the largest x-bar in a series of five PT events. • indicates that the analytical bias was greater than the PT program 15% allowable error. A subset of laboratories reported the assay allowable error but not the assay CV. Points superimposed on the y-axes of panels A and B are from laboratories that did not report the method CV.


   Discussion
Our investigation into the causes of unsatisfactory performance in PT was performed at two levels: the analysis of episodic process failures that produced spurious results, and the analysis of systematic process control failures that produced results outside program performance specifications. Our findings are similar to those of Steindel et al. (2) when the errors are categorized as methodologic, technical, clerical, survey, and unexplained as was done in their Q-Probes study of PT failures (Table 5).


Table 5. Distribution of reasons for PT failures in four studies.

Episodic process failures are caused by lapses in laboratory standard operating procedures (mistakes), and by performance anomalies in an otherwise stable analytical system (instrument malfunction). Clearly, diligence in the training and competency testing of staff to minimize mistakes is a vital and evolving activity within the laboratory. Laboratory efforts to reduce mistakes have been greatly assisted by advances in technology and design of analytical systems. The 1996 Clinical Chemistry Forum (11) focused on clinical laboratory mistakes and on concepts to design quality into analytical systems to reduce or eliminate mistakes. Manufacturers must anticipate the mistakes that are most likely to occur, and design mistake-proofing systems (12). Continual improvement can then be accomplished by complaint tracking after product launch (13). Our findings suggest opportunities for continual improvement in system design.

Each of the laboratory mistakes listed in Table 1 has recurred across laboratories from the time the study was initiated. The most prevalent mistake occurs during the processing of specimens with analyte concentrations that exceed the upper limit of the analytical system’s reportable range. We design PT specimens to encompass the clinically relevant range of analyte concentrations. The analyte concentration in one of the five test specimens in each test event may exceed the upper limit of the reportable range of some methods. When we estimated the error rate for failure to perform mathematical calculation of results for specimen dilution, we assumed that all laboratories needed to dilute one specimen in each test event for each analyte. The estimate of 510 results not corrected for dilution per million dilutions performed may grossly underestimate the error rate, perhaps by as much as 10-fold if only 10% of the challenges required specimen dilution. Specimen dilution typically requires special handling and reporting, and typically, a breakdown in communication is the reason for the testing error. Some instruments allow the entry of a dilution factor, which is used by the instrument to calculate the reportable result obtained for a specimen that had been diluted off-line. The inconsistent use of this feature among analysts within a laboratory has produced confusion as to when the result requires manual calculation for dilution. The confusion is compounded when protocols for handling specimen dilutions vary among several different analytical systems within the laboratory. Automated specimen dilution and (or) instrument reports that describe the dilution protocol and the results would likely reduce this type of error considerably.
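The sensitivity of the dilution error rate to the assumed number of dilutions can be made explicit. In this sketch the function name and the intermediate fractions are illustrative; only the 510 per million estimate and the 10% case come from the text.

```python
def adjusted_rate_per_million(estimated_rate, fraction_diluted):
    """Rescale an error rate when only a fraction of the assumed dilutions occurred.

    estimated_rate: rate computed on the assumption that every analyte
    challenge required one dilution (510 per million in the text).
    fraction_diluted: fraction of challenges that actually required dilution.
    """
    if not 0 < fraction_diluted <= 1:
        raise ValueError("fraction_diluted must be in (0, 1]")
    return estimated_rate / fraction_diluted


for fraction in (1.0, 0.5, 0.1):
    rate = adjusted_rate_per_million(510, fraction)
    print(f"{fraction:.0%} of challenges diluted: about {rate:.0f} per million")
```

With only 10% of challenges requiring dilution, the rate rises to roughly 5100 per million dilutions, the 10-fold underestimate noted above.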

Rates of nonconformity are highly correlated with process complexity (12). Simplification of analytical devices for use in any setting has merit in the reduction of mistakes, and testing at the point of care has benefited greatly from manufacturer efforts to mistake-proof analytical devices. Although the error rate in transcription of results from Abbott X-systems reports was low in PT (Table 1), the mistakes suggest an opportunity to improve the test process through simplification of instrument reports. The X-system reports contain an array of analytical data (fluorescence polarization units and blank intensity) that are associated with test specimen results, and the arrangement of the data varies among the different X-system configurations. Complex instrument reports only confound an inherently error-prone process of results transcription. Likewise, errors in the interpretation of instrument codes, as occurred among laboratories using the Beckman Synchron, could be eliminated with the consistent use of descriptive specimen flags on instrument reports.

The reason for spurious test results is difficult to identify when mistakes are eliminated as the cause. We were unable to assign a cause to 10% of the spurious proficiency test results, although the investigations were performed proximate to the episode. However, as data were collected over 3 years, a pattern emerged among laboratories using the Abbott AxSYM, where the recovery of analyte from one of the PT specimens was markedly low, whereas recovery from the remaining four PT specimens in the batch was well within the acceptable range of concentration. Laboratories suggested that the only plausible explanations were failure of an instrument to acquire the full volume of sample for analysis, air bubbles introduced into the specimen by mixing, or a hole in the reaction vessel used for the specimen. Clearly, given the unpredictable and low rate of occurrence (165 errors per million assays), it is difficult to isolate and identify definitively the cause of the spurious results. However, the plausibility of the explanations offered by laboratories suggests instrument features designed to prevent short sampling of specimens and to monitor the integrity of reaction vessels warrant consideration. Results from the Abbott AxSYM accounted for 29% of the entire PT database, which increased the likelihood of detecting such sampling errors for that analytical system.

One of us (R. Jenny) has conceptualized the use of an analytical system by many laboratories as a distributed production process (1). The process is sampled periodically by challenges with PT materials, and using principles of statistical process control, we can judge whether a laboratory is performing within specification. The NYSDOH performance specification for most toxicology analytes is the recovery of analyte from test specimens to within 15% of the target value. The specification is based on capabilities of modern analytical systems, and is consistent with standards of laboratory practice guidelines (14)(15) and with proposed analytic goals (16)(17).
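As a minimal sketch of how the 15% specification might be applied to a single result, the following computes the percent deviation from the target value and compares it with the limit. The function names, the example concentrations, and the treatment of a result exactly at the limit as acceptable are assumptions made for illustration; they are not taken from the NYSDOH program.

```python
def percent_deviation(result: float, target: float) -> float:
    """Signed deviation of a reported result from the target value, in percent."""
    if target == 0:
        raise ValueError("target value must be nonzero")
    return 100.0 * (result - target) / target


def is_satisfactory(result: float, target: float, limit_pct: float = 15.0) -> bool:
    """True if the result lies within +/- limit_pct of the target (limit inclusive, by assumption)."""
    return abs(percent_deviation(result, target)) <= limit_pct


# Hypothetical theophylline challenge with a target of 15.0 mg/L.
for reported in (17.0, 12.5):
    dev = percent_deviation(reported, 15.0)
    print(f"{reported} mg/L: {dev:+.1f}% -> satisfactory={is_satisfactory(reported, 15.0)}")
```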

The mean rate of nonconformance (unsatisfactory analyte performance) was 0.7% over five test events. Because laboratories release test results from analytical runs that are judged in-control, we investigated internal QC schemes used to monitor assay performance. We found wide disparity in the allowable errors used by laboratories in the QC of their analytical methods. We list in Table 6 the manufacturer estimates for imprecision for analytical systems most commonly used among participants in our PT program, the threshold imprecision that the manufacturer uses as a guideline to judge the performance of the analyzer, and the distribution of allowable errors among users of those systems. The manufacturer estimates for imprecision are those published in the "Performance Characteristics" section of the product assay sheets. The threshold imprecision used by a manufacturer’s technical services representatives to judge analyzer performance was obtained either from the product assay sheets or by consultation with the manufacturer. In many instances, the allowable errors are not consistent with manufacturer guidelines for stable performance of the analytical systems used. The allowable error used by ~50% of the laboratories using the Abbott X-systems exceeds the allowable error that is recommended by Abbott for the analysis of theophylline and phenobarbital.


Table 6. Manufacturer performance claims for analytical imprecision and allowable errors in routine testing.1

The scatter plot of allowable error against analytical imprecision (Fig. 2A) reveals patterns that suggest the origin of the error limits used for internal QC. The line of identity indicates a statistical derivation; the 20% fixed limit is coincident with the "expected" ranges provided by suppliers of assayed control materials; and the 10% fixed limit is recommended by Abbott to laboratories that use Abbott QC material. Bio-Rad and Dade are the major vendors of QC materials among the laboratories we studied, with Dade Liquid Immunoassay Controls and Bio-Rad Liquichek Immunoassay Plus Control materials used most frequently. Laboratories that opt to use commercially assayed control ranges as QC limits frequently designate the limits of the supplied control ranges as 2 SD limits, and use "Westgard rules" (which presuppose valid estimates of method performance characteristics) to monitor method performance.
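The consequence of treating a supplier's range as 2 SD limits can be sketched in a few lines. The example below is illustrative only: the control range (8.0 to 12.0, that is, 10.0 ± 20%), the method SD of 0.3, and the control values are hypothetical, and only two of the commonly cited Westgard rules (1_3s and 2_2s) are implemented.

```python
def sd_from_range(low: float, high: float) -> tuple[float, float]:
    """Interpret a supplied control range as mean +/- 2 SD and return (mean, sd)."""
    return (low + high) / 2.0, (high - low) / 4.0


def westgard_flags(values: list[float], mean: float, sd: float) -> list[str]:
    """Apply the 1_3s and 2_2s rules to a sequence of control results."""
    z = [(v - mean) / sd for v in values]
    flags = []
    # 1_3s: any single control result beyond 3 SD from the mean.
    if any(abs(x) > 3 for x in z):
        flags.append("1_3s")
    # 2_2s: two consecutive results beyond 2 SD on the same side of the mean.
    if any((a > 2 and b > 2) or (a < -2 and b < -2) for a, b in zip(z, z[1:])):
        flags.append("2_2s")
    return flags


controls = [10.0, 11.2, 11.3, 11.5]            # hypothetical results drifting upward
mean, range_sd = sd_from_range(8.0, 12.0)      # SD implied by the supplier's range
method_sd = 0.3                                # hypothetical SD of the method itself

print("range-derived SD:", westgard_flags(controls, mean, range_sd))
print("method SD:       ", westgard_flags(controls, mean, method_sd))
```

With the range-derived SD the drift raises no flag, whereas the same data violate both rules when the rules are evaluated against the method's own (assumed) imprecision.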

Clearly, when analytical systems are deployed for use, the quality of conformance to system design specifications varies considerably among laboratories that use those instruments, as expressed by the allowable errors they use to judge system stability. Because manufacturers continually refine technology to provide analytical performance that is consistent with clinical needs for optimal patient care, the rewards to patient care may be diminished by QC practices that are desensitized to unstable system performance. Steindel and Tetrault (18) and Howanitz et al. (19) conducted a Q-Probes study of QC practices in hospital laboratories and concluded that laboratorians have difficulty in following QC rules because they are complex and tedious to follow, and that QC practices should be simplified. We believe simplification should encompass the standardization of allowable errors among laboratories using an analytical system. The quality of patient testing is dictated by the analytical system design specifications: laboratories should not expect better performance than what is claimed but should not accept less. We suggest that the objective of the laboratory’s QC program should be to maintain system performance within verified manufacturer performance claims and that the manufacturer’s precision claims may be viable allowable error limits in QC programs. Until QC practices are made consistent and relevant to system design specifications, the intrinsic quality of laboratory services is unlikely to improve.
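How the choice of allowable error affects both false rejection and error detection can be approximated with a simple model. The sketch below assumes normally distributed errors, a single control observation per run, and a hypothetical method with a 3% CV; none of these figures come from the study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_outside(limit_pct: float, cv_pct: float, bias_pct: float = 0.0) -> float:
    """Probability that one control result falls outside +/- limit_pct of target,
    given the method CV and any systematic shift (bias), all in percent."""
    upper = (limit_pct - bias_pct) / cv_pct
    lower = (-limit_pct - bias_pct) / cv_pct
    return 1.0 - (normal_cdf(upper) - normal_cdf(lower))


CV = 3.0     # hypothetical method imprecision, %
SHIFT = 9.0  # hypothetical systematic shift (3 SD), %

for limit in (3.0, 6.0, 20.0):
    false_reject = prob_outside(limit, CV)      # stable system, run rejected
    detect = prob_outside(limit, CV, SHIFT)     # shifted system, run rejected
    print(f"limit +/-{limit:>4}%: false rejection {false_reject:.4f}, detection {detect:.4f}")
```

Under these assumptions, a limit narrower than the method can support rejects many stable runs, and a 20% limit almost never detects a shift of three standard deviations.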

In conclusion, PT providers can substantially augment the utility of their services through the characterization of participant performance and active participation in investigations of causes of unsatisfactory performance. Guidelines have been developed for laboratories to review their PT results and to identify the source(s) of test error(s) (20)(21). The PT provider has the capability to assimilate causes of errors and to identify common causes among participants. The sharing of this information with laboratories and manufacturers should contribute to the continuous improvement of instrument design and laboratory services. Internal quality specifications, expressed as allowable error in QC, are an important link to providing reliable laboratory services that meet needs for good patient care. We observed an increased rate of unsatisfactory performance in PT as the allowable error in QC increased (Fig. 2). Our finding that many laboratories use allowable errors for internal QC that differ greatly (whether broader or narrower) from those recommended by manufacturers for monitoring system stability supports the contention that QC programs must be retooled. Laboratories make purchasing decisions based on system capabilities and should strive to maintain those capabilities (manufacturer claims) in routine use. To strive, through rigorous QC, for greater accuracy and precision than the analytical system was designed to provide is imprudent. To allow analytical systems to perform at a level that may be characterized by the manufacturer as unstable performance is simply unacceptable. QC limits based on these extremes produce high rates of false rejection and false acceptance of analytical runs. The common ground is to base QC evaluation criteria on the expected performance of the analytical system. The manufacturer is in the best position to provide guidance for selection of those evaluation criteria.


   Footnotes
 
Laboratory for Molecular Diagnostics, Division of Molecular Medicine, Wadsworth Center, New York State Department of Health, P.O. Box 509, Albany, NY 12201-0509.

1 Nonstandard abbreviations: PT, proficiency testing; NYSDOH, New York State Department of Health; QC, quality control; and OIR, out of instrument range.


   References

  1. Jenny RW. Process capability and stability of analytical systems assessed from proficiency testing data. Clin Chem 1994;40:723-728.
  2. Steindel SJ, Howanitz PJ, Renner SW. Reasons for proficiency testing failures in clinical chemistry and blood gas analysis. Arch Pathol Lab Med 1996;120:1094-1101.
  3. Hoeltge GA, Duckworth JK. Review of proficiency testing performance of laboratories accredited by the College of American Pathologists. Arch Pathol Lab Med 1987;111:1011-1014.
  4. Klee GG, Forsman RW. A user’s classification of problems identified by proficiency testing surveys. Arch Pathol Lab Med 1988;112:371-373.
  5. Witte DL, Van Ness SA, Angstadt DS, Pennell BJ. Errors, mistakes, blunders, outliers, or unacceptable results: how many? Clin Chem 1997;43:1352-1356.
  6. Plebani M, Carraro P. Mistakes in a stat laboratory: types and frequency. Clin Chem 1997;43:1348-1351.
  7. Jenny RW, Jackson KY. Evaluation of the rigor and appropriateness of CLIA ’88 toxicology proficiency testing standards. Clin Chem 1992;38:496-500.
  8. US Department of Health and Human Services. Medicare, Medicaid, and CLIA programs; regulations implementing the Clinical Laboratory Improvement Amendments of 1988 (CLIA): final rule. Fed Regist 1992;57:7002–186.
  9. Montgomery DC. Introduction to statistical quality control, 2nd ed. New York: John Wiley & Sons, 1991:101-144.
  10. Ehrmeyer SS, Laessig RH, Leinweber JE, Oryall JJ. 1990 Medicare/CLIA final rules for proficiency testing: minimum intralaboratory performance characteristics (CV and bias) needed to pass. Clin Chem 1990;36:1736-1740.
  11. Garber CC, Witte DL. Quality for tomorrow: by design or by checking? Clin Chem 1997;43:864-865.
  12. Hinckley CM. Defining the best quality-control systems by design and inspection. Clin Chem 1997;43:873-879.
  13. Lasky FD, Boser RB. Designing in quality through design control: a manufacturer’s perspective. Clin Chem 1997;43:866-872.
  14. Warner A, Privitera M, Bates D. Standards of laboratory practice: antiepileptic drug monitoring. Clin Chem 1998;44:1085-1095.
  15. Pesce AJ, Rashkin M, Kotagal U. Standards of laboratory practice: theophylline and caffeine monitoring. Clin Chem 1998;44:1124-1128.
  16. Jenny RW. Analytical goals for determinations of theophylline concentration in serum. Clin Chem 1991;37:154-158.
  17. Fraser CG. Desirable standards of performance for therapeutic drug monitoring. Clin Chem 1987;33:387-389.
  18. Steindel SJ, Tetrault G. Quality control practices for calcium, cholesterol, digoxin, and hemoglobin: a College of American Pathologists Q-probes study in 505 hospital laboratories. Arch Pathol Lab Med 1998;122:401-408.
  19. Howanitz PJ, Tetrault GA, Steindel SJ. Clinical laboratory quality control: a costly process now out of control. Clin Chim Acta 1997;260:163-174.
  20. Cembrowski GS, Engebretson MJ, Hackney JR, Carey RN. A systems approach to assure optimal proficiency testing in the hematology laboratory. Clin Lab Med 1993;13:973-985.
  21. Cembrowski GS, Hackney JR, Carey N. The detection of problem analytes in a single proficiency test challenge in the absence of the Health Care Financing Administration rule violations. Arch Pathol Lab Med 1993;117:437-443.



The following articles in journals at HighWire Press have cited this article:


P. Bonini, M. Plebani, F. Ceriotti, and F. Rubboli
Errors in Laboratory Medicine
Clin. Chem., May 1, 2002; 48(5): 691 - 698.