Laboratory Management |
1 Research & Development and 2 Central Laboratory, Centro Diagnostico Italiano, Milan, Italy.
3 Lombardy Cancer Registry and Environmental Epidemiology, National Cancer Institute, Milan, Italy.
4 Department of Clinical Sciences Luigi Sacco, Università degli Studi di Milano, Milan, Italy.
a Address correspondence to this author at: Dipartimento di Scienze Cliniche L. Sacco, Via G.B. Grassi 74, 20157 Milan, Italy. Fax 39-02-3564-018; e-mail carlo.franzini{at}unimi.it.
| Abstract |
|---|
|
|
|---|
Methods: Laboratory results over a 3-year period (approximately 15 000 000 records; 197 350 individuals) were retrieved from our laboratory information system. An inclusion/exclusion procedure for individual patients was applied based on (a) presence of at least 1 of 23 previously defined "basic tests"; (b) only 1 measurement per test by the laboratory over the 3-year period; and (c) for each test, absence of any abnormality in the correlated tests. Before the third step, correlations among quantities were assessed by a Spearman correlation matrix, comparing each of the 23 basic tests with all remaining tests by use of a novel multivariate algorithm.
Results: The initial sample group (n = 197 350) was reduced stepwise by the selection criteria outlined above to 166 027, then to 93 649, and finally to 61 246 individuals constituting our reference sample group. Results from the last 2 groups were used to calculate sex-specific, and in some cases age-related, reference limits for the 23 basic tests and for 13 additional quantities. Reference limits were calculated throughout this study by nonparametric estimation of percentiles.
Conclusion: Reference values derived by retrospective analysis of large samples of data obtained at a given institution are particularly suitable for the evaluation of results for the presenting patient population at that institution.
| Introduction |
|---|
|
|
|---|
The second difficulty involves comprehensively defining a reference population appropriately matching the specific patient(s) referred to the laboratory and extracting from such a population a numerically consistent sample group of individuals to be enrolled as "reference individuals". The establishment of a suitable reference population is essential (14) but difficult (8), and there are still cases in which unrepresentative reference populations (medical students, hospital employees, blood donors, or other volunteers, variably classified as "healthy") have been used. Attempts have been made to overcome these difficulties by using hospital or primary healthcare patients and applying different criteria in the detection of outlying data values and the identification of nondiseased individuals (15)(16)(17)(18).
The definition of reference values consistent with the underlying patient population and with the analytical methodology has been considered for fulfillment of International Organization for Standardization (ISO)1 accreditation requirements (19). Accurate definition of reference intervals is particularly important in laboratories, such as our diagnostic center, that expect to encounter a sizeable number of healthy individuals among the presenting patient population and/or only small deviations of test results from normality. We addressed this situation by developing an original approach, applicable to large databases, to select nondiseased individuals. Such individuals constitute the reference sample group for computing "internal" reference intervals, specific for the presenting patient population and for the analytical technology.
| Participants and Methods |
|---|
|
|
|---|
Approximately 15 000 000 records related to 197 350 individuals, stored in the laboratory information system (LIS) of Centro Diagnostico Italiano (CDI) over a 3-year period (1997–1999), were retrieved to constitute the original database for this study (sample group 1). This sample group included 97 895 females and 99 455 males. For each test and for each sex, the number of available test results ranged from 3342 (IgA; females) to 120 256 [alanine aminotransferase (ALT); males], with the exception of folate, for which only 574 values (males) and 885 values (females) were available.
assay methods
All assays were performed on serum samples obtained from blood collected in plain glass test tubes, clotted, and centrifuged within 2 h of collection; stored at 4 °C; and assayed within 4 h. Measurements were performed with accepted and widely used analytical methods, implemented on automated analyzers. The following instruments were used: AU2700 (Olympus); Architect (Abbott); and Immulite 2000 (Medical Systems). For each instrument, reagents and calibrators were from the same manufacturer; instrument operation and calibration followed the instructions from specific manufacturers.
current reference intervals
The current reference intervals in use at CDI were derived mainly from manufacturer suggestions, in some cases modified according to literature data or practical experience. Neither the original population nor the statistical treatment of data was known in detail.
control of analytical quality
Daily internal quality control and participation in external quality assessment schemes (EQAS) were carried out according to the specifications of our certified quality system. Briefly, internal quality control was performed by daily assay of commercial lyophilized sera at 2 or 3 concentrations of the different quantities. Results were evaluated immediately by multiple decision rules for acceptance/rejection of the daily analytical series and for the assessment of any possible drift. Monthly means and SDs were computed and plotted to evaluate performance and medium- and long-term stability of the analytical systems. Participation in EQAS included several programs, according to the different quantities, organized by the public health authority (mandatory EQAS of the Regione Lombardia), by scientific associations, or by companies. Results were compared with expected values (generally consensus means or system-specific participant means) to assess the maintenance of state-of-the-art analytical quality (trueness).
selection of nondiseased individuals
As a first approach, we selected 23 basic tests representing quantities of primary medical interest (Table 1). Of the 197 350 individuals in the original sample group, 166 027 (78 955 females and 87 072 males) had had at least one of these basic tests performed once or more during the study period, and this group of 166 027 was our sample group 2.
The first selection criterion for identifying nondiseased individuals was the absence of repeated performance of a given test on the same person over the 3-year observation period, independent of the number of tests performed on that occasion. The inclusion criterion was the presence of a single laboratory measurement per test, so that persons with repeated measurements for a given test would not contribute in any subsequent computation concerning that test. This approach assumed that persons with repeated testing had a higher chance of being diseased.
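The single-measurement criterion can be illustrated with a minimal sketch. The record layout (person identifier, test name, value) and the function name are hypothetical stand-ins for an LIS export, not the authors' implementation.

```python
from collections import Counter

def single_measurement_results(records):
    """Keep only results for (person, test) pairs measured exactly once.

    `records` is an iterable of (person_id, test_name, value) tuples;
    this layout is an assumption made for illustration.
    """
    records = list(records)
    counts = Counter((person, test) for person, test, _ in records)
    return [r for r in records if counts[(r[0], r[1])] == 1]

# Made-up example data: person p2 had ALT measured twice in the period.
records = [
    ("p1", "ALT", 22.0),
    ("p1", "GGT", 30.0),
    ("p2", "ALT", 41.0),
    ("p2", "ALT", 65.0),  # repeated ALT: both p2 ALT results are dropped
    ("p2", "GGT", 18.0),
]
kept = single_measurement_results(records)
```

Note that the filter works per test: p2 still contributes a GGT value, because only the repeated ALT measurements are excluded.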
At this stage, we applied the multivariate algorithm described below to select, for each of the 23 basic tests, a subgroup of individuals suitable for computing the relevant reference limits.
In a preliminary procedure for this selection step, the algorithm looked for the occurrence of any statistically significant correlation between each quantity and all others, using a Spearman correlation matrix based on the first test for each quantity in the observation period. The Spearman nonparametric rank correlation coefficient was used because some test results displayed a nongaussian distribution. Two quantities were considered correlated if the Spearman correlation coefficient was statistically significant (P <0.001). We then used the algorithm to scrutinize each test result for each person, examining all correlated test results obtained for that person on the same occasion. If all correlated test results available fell within the current reference intervals, then the test result under scrutiny was considered eligible as a reference value.
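As a rough illustration of the correlation screen, the sketch below computes a Spearman coefficient as the Pearson correlation of average ranks and flags a pair as correlated when P < 0.001. The P value uses a large-sample normal approximation, which is an assumption made here to keep the example free of external libraries; the source does not state how significance was computed. All function names are hypothetical.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def approx_p_value(rho, n):
    """Two-sided P from the approximation rho * sqrt(n - 1) ~ N(0, 1)."""
    z = abs(rho) * math.sqrt(n - 1)
    return 2 * (1 - NormalDist().cdf(z))

def is_correlated(x, y, alpha=0.001):
    """Apply the P < 0.001 threshold quoted in the text."""
    return approx_p_value(spearman_rho(x, y), len(x)) < alpha
```

With very large samples, even weak monotonic associations pass this threshold, so a screen of this kind is sensitive to sample size.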
For example, suppose that γ-glutamyltransferase (GGT) in males was shown to significantly correlate to conjugated bilirubin, ALT, and aspartate aminotransferase (AST); any single value of GGT was included in the set of the reference values for GGT only if the total bilirubin, ALT, and AST values measured in the same blood sample from the same person fell within the appropriate current reference intervals. Conversely, if 1 of the values of the 3 correlated quantities was not normal, then the GGT value was not included in the set of values selected for the production of the new reference interval. As another example, to stress the role of correlated tests in the algorithm, suppose on the contrary that a person had an abnormally high GGT but total bilirubin, ALT, and AST were all within the current reference intervals. In this case, the abnormal GGT value was retained for computation of a new reference interval, but the normal bilirubin, ALT, and AST values would be omitted from reference interval computations. This event actually occurred in our data.
The algorithm is described in Table 1 of the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol51/issue7/.
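The eligibility rule illustrated by the GGT example can be sketched as follows. This is a simplified reading of the text, not the algorithm published in the Data Supplement. The data layout, the function names, and all numeric intervals and results are made up for illustration.

```python
def within(value, interval):
    """True if value lies inside the closed interval (low, high)."""
    low, high = interval
    return low <= value <= high

def eligible_reference_values(occasions, correlated, current_intervals, test):
    """Collect values of `test` whose correlated companions are all normal.

    occasions: list of dicts mapping test name -> result for one person on
        one sampling occasion (hypothetical layout).
    correlated: dict mapping test name -> set of correlated test names.
    current_intervals: dict mapping test name -> (low, high) current limits.
    """
    selected = []
    for results in occasions:
        if test not in results:
            continue
        companions = [t for t in correlated.get(test, ()) if t in results]
        if companions:
            ok = all(within(results[t], current_intervals[t]) for t in companions)
        else:
            # No correlated result available: judge the value against its
            # own current interval, as the text describes for this case.
            ok = within(results[test], current_intervals[test])
        if ok:
            selected.append(results[test])
    return selected

# Illustrative numbers only; these are not the intervals used at CDI.
current_intervals = {"GGT": (5, 50), "ALT": (5, 40), "AST": (5, 40), "bilirubin": (0, 20)}
correlated = {"GGT": {"ALT", "AST", "bilirubin"}, "ALT": {"GGT", "AST"}}
occasions = [
    {"GGT": 30, "ALT": 20, "AST": 25, "bilirubin": 10},   # all normal
    {"GGT": 28, "ALT": 90, "AST": 25, "bilirubin": 10},   # abnormal ALT
    {"GGT": 120, "ALT": 20, "AST": 22, "bilirubin": 8},   # abnormal GGT only
    {"GGT": 200},                                         # no companions
]
ggt_values = eligible_reference_values(occasions, correlated, current_intervals, "GGT")
alt_values = eligible_reference_values(occasions, correlated, current_intervals, "ALT")
```

In this toy data set the high GGT of 120 is retained for GGT because its companions are normal, while the ALT value from the same occasion is dropped because the correlated GGT is abnormal, mirroring the second example in the text.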
The rationale of this approach is based on the fact that there are clusters of tests that, because they explore the same body function/system, are expected to have correlated results and usually are requested by the physician to be performed together in a given test subject. It is therefore very unusual to find a single quantity without its companion test quantities; thus, it is possible to build up a cross-validation scheme relevant to the vast majority of individuals. When this approach is used, the contribution of test subjects with a high number of test results belonging to an appropriate set is maximized. In the unusual case that the set of test results available at the same date for a given individual was such that no significant correlation was possible (e.g., low number of analytes or unusual combination of tests), these results would be classified according to the current reference intervals only.
The algorithm was applied separately for each sex, eventually leading to the operative definition of the reference sample groups for each basic test. The number of useful results in each sample group was variable, and these results are displayed in the second column of Table 1.
The new reference limits for the ith test of the set of 23 basic tests were defined as the 2.5%–97.5% nonparametric percentiles of the distribution in the sample group for that specific test, computed by the nonparametric rank-based method (6). Alternatively, we chose to define only the upper or the lower reference limit when the expected variations of the quantity were only (or mainly) in one direction (increase or decrease), thereby generating a markedly skewed distribution on the relevant side of the frequency histogram. In these cases, the 95% or 5% nonparametric percentile was chosen as the upper or lower limit, respectively. For these quantities (e.g., the 2 aminotransferases), the accurate definition of the alternative limit has no medical relevance. The current reference limit, either lower or upper, was therefore maintained.
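A rank-based percentile estimate of this kind might look like the sketch below. It uses one common convention, rank p(n + 1) with linear interpolation between neighboring order statistics. Whether this matches the exact procedure of reference (6) is an assumption, and the function names are hypothetical.

```python
def rank_based_percentile(values, p):
    """Estimate the p-th quantile (0 < p < 1) at rank p * (n + 1)."""
    ordered = sorted(values)
    n = len(ordered)
    rank = p * (n + 1)
    if rank < 1 or rank > n:
        raise ValueError("sample too small for this percentile")
    lower = int(rank)          # integer part of the 1-based rank
    fraction = rank - lower
    if lower == n:
        return ordered[-1]
    # Interpolate between the two order statistics around the rank.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def reference_limits(values, side="both"):
    """Return (lower, upper); None marks a limit that is not re-estimated.

    For one-sided cases the text keeps the current limit on the other
    side, so the caller would fill in None from the current interval.
    """
    if side == "both":
        return rank_based_percentile(values, 0.025), rank_based_percentile(values, 0.975)
    if side == "upper":
        return None, rank_based_percentile(values, 0.95)
    if side == "lower":
        return rank_based_percentile(values, 0.05), None
    raise ValueError("side must be 'both', 'upper', or 'lower'")

# Made-up sample: the integers 1..119, so the ranks 3 and 117 are exact.
sample = list(range(1, 120))
low, high = reference_limits(sample)
```

With 119 values the two-sided limits fall on the 3rd and 117th ordered observations, which shows why small samples (such as folate in this study) give unstable nonparametric limits.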
The next step was the selection of the persons who had a single occasion of laboratory measurement for any of the 23 basic tests during the 3-year period. The number of persons satisfying this criterion was 93 649 (41 576 females and 52 073 males), and this group constituted our sample group 3.
The application of the described multistep selection procedure to the 23 basic tests led to the definition of a sample group (sample group 4; n = 61 246) for whom the available results for the 23 basic tests all fell within the newly defined reference limits, that is, were nondiseased according to these multiple criteria. The availability of measurement results of additional quantities in this sample group allowed reference intervals to be defined for 13 additional quantities (Table 2).
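The final selection step can be pictured as a simple filter over persons. The sketch assumes a hypothetical mapping from person to available results and treats a person as nondiseased when every available basic test result lies within the new limits; names and numbers are illustrative only.

```python
def nondiseased_persons(results_by_person, new_limits):
    """Return ids of persons whose available basic results are all in range.

    results_by_person: dict person_id -> dict test name -> value
        (hypothetical layout).
    new_limits: dict basic test name -> (low, high) newly computed limits.
    Tests absent from `new_limits` are ignored when judging a person.
    """
    selected = []
    for person, results in results_by_person.items():
        basic = {t: v for t, v in results.items() if t in new_limits}
        if basic and all(
            new_limits[t][0] <= v <= new_limits[t][1] for t, v in basic.items()
        ):
            selected.append(person)
    return selected

# Made-up limits and results, for illustration only.
new_limits = {"ALT": (5, 40), "GGT": (5, 50)}
results_by_person = {
    "p1": {"ALT": 22, "GGT": 30, "folate": 12},  # all basic results in range
    "p2": {"ALT": 75, "GGT": 20},                # ALT out of range
    "p3": {"folate": 9},                         # no basic test available
}
group4 = nondiseased_persons(results_by_person, new_limits)
```

Results for additional quantities (here the made-up folate value of p1) can then be pooled across the selected persons to derive further reference intervals.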
| Results |
|---|
|
|
|---|
The flow chart of the described multistep selection procedure is shown in Fig. 1. As shown, the stepwise application of different exclusion/inclusion criteria progressively restricted the initial sample group of 197 350 persons to a final sample group (sample group 4) of 61 246 nondiseased individuals (29 080 females and 32 166 males). Reduction in sample size was accompanied by narrowing of the reference interval, with particular lowering of the upper limit for those quantities characterized by prevailing increase in disease. A few examples of the effect of the algorithm in reducing reference intervals included ferritin in females (from 4.0–478 to 8.0–257 µg/L), glucose in females <48 years (from 4.44–9.82 to 4.05–5.88 mmol/L), C-reactive protein (CRP) in males (from 0–69 to 0–35 mg/L), and prostate-specific antigen (PSA) in males >45 years (from 0–8.9 to 0–3.7 µg/L).
The frequency distribution by age and sex of individuals in sample group 4 is shown in Fig. 2. The descriptive statistics of the 36 quantities measured in the individuals (females and males) belonging to this sample group are shown in Tables 3 and 4 of the online Data Supplement. The median number of tests per person in sample group 4 was 5 for both males and females and for sample group 1 was 5 for males and 6 for females.
New and current reference limits for females and males are compared in Tables 1 (23 basic tests) and 2 (13 additional quantities). For some quantities, age had a marked effect on values; we therefore computed the reference intervals for specified age groups for glucose (<48 years and >48 years), alkaline phosphatase (>16 years), creatinine (>11 years), PSA (>45 years), and transferrin (females >50 years and males >30 years).
For 23 tests, the new reference intervals were narrower than or similar to the current reference intervals, whereas for the remaining 13 tests, the intervals were wider. Among the latter, ferritin, iron, transferrin saturation, LDL and total cholesterol, triglycerides, glucose >48 years, GGT, CRP, PSA, amylase, and creatine kinase showed moderate to marked shifts of the upper limit.
Total cholesterol and LDL-cholesterol had widened reference intervals because of the remarkable increase of the upper limit for both men and women; triglycerides had a lower upper limit only in women. Concerning the lipid quantities, however, variations in the upper reference limits reflect the comparison of the true reference values (new limits) with desirable values (current limits).
| Discussion |
|---|
|
|
|---|
Large databases stored in laboratory medicine centers offer opportunities for producing internal reference intervals. Because a consistent proportion of symptom-free well individuals present to our center for health assessment, we developed a method to use the large pool of data generated in our center to estimate reference intervals, based on rules designed to progressively exclude nonhealthy individuals, according to the a posteriori approach (3).
The exclusion rule of more than one access per test to the laboratory during the data collection period has been applied in previous studies (15)(16), and its rationale has been explained in the Participants and Methods section. The rationale for the next exclusion rule (any abnormality in tests previously shown to be significantly correlated to the test under scrutiny) has also been briefly outlined: it is based on the observation that multiple test abnormalities are more likely to be significant than single test abnormalities (21). Application of this rule excluded approximately one-third of the individuals who satisfied the previous inclusion/exclusion criteria. It is possible that persons actually free of disease were excluded on the basis of weak correlations among many quantities because of significant correlation attributable to the large sample size. However, this event is preferable to the opposite error: the inclusion of diseased persons, potentially causing a shift in the reference limits. The number of tests performed for most individuals was between 4 and 8: in the latter case, the chance of having one or more abnormal test results increases to 34% (21). A similar but less sophisticated approach was followed in the definition of reference values for 5 serum enzymes measured by reference procedures (17).
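The 34% figure follows from simple arithmetic if each test is assumed to be independent and each reference interval covers the central 95% of healthy values. Independence is a simplifying assumption here, and the function name is hypothetical.

```python
def prob_any_abnormal(n_tests, central_fraction=0.95):
    """Chance that a healthy person has at least one result outside the
    reference interval, assuming independent tests (a simplification)."""
    return 1 - central_fraction ** n_tests

p4 = prob_any_abnormal(4)   # about 0.19
p8 = prob_any_abnormal(8)   # about 0.34
```

For 8 tests the result is 1 − 0.95^8 ≈ 0.34, consistent with the value quoted in the text; for 4 tests it is about 0.19.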
By applying this selection cascade, we eventually reduced the 197 350 individuals in the original sample to 61 246 individuals, a large sample group of individuals closely matching the typical person presenting to our center for testing. Thus, the reference limits for many quantities were calculated on sample sizes of several thousand test results.
Inclusion/exclusion of individuals in our study was based on laboratory data only. This limitation was balanced by the large amount of available data, which allowed the production of sex-specific reference intervals. Comparison of the whole reference group with the subgroup of individuals from occupational and preventive medicine programs at our center (data not shown) did not reveal substantial differences, thereby confirming that the multivariate algorithm was of adequate power to identify nonhealthy individuals. The small differences sometimes observed were likely related to the fact that these observations came from a smaller sample with a younger mean age and higher employment rate. The so-called healthy worker effect (the tendency for actively employed people to have a more favorable mortality expectancy than the general population) (22) may have influenced the comparison between the overall group and the persons referred for initial or periodic health assessment.
As recommended (6), the reference intervals were taken as the 95% central nonparametric portion of the reference distribution. For some specific quantities, exceptions to this rule have been applied and explained. Use of the 99.8% central portion, particularly when the aim of testing is the identification of wellness, was suggested recently (23). However, a reference interval for the absence of disease would require "absolutely normal" reference subjects (23), with criteria including genetic normality as determined by assessment. With the main exception of folate, in our study the sample size for each quantity was large enough to allow safe application of the usual rank-based technique for percentile estimation (6)(24)(25)(26).
At the CDI laboratory, the analytical methods were carefully chosen and monitored to guarantee state-of-the-art reliability of results. Nevertheless, it must be stressed that in principle our reference values are valid only for the stated analytical procedures/systems. The transferability of such reference values to other laboratories would imply the absence of any significant between-laboratory analytical bias (27), which is a prerequisite for the production and use of common reference intervals (28)(29).
The large number of reference individuals selected with our procedure, of both sexes and spanning a large interval of age (Fig. 2), permitted their separation into subgroups according to sex and age. In the present study we report sex-specific intervals for all quantities and age-related reference intervals for only a few quantities for which the patient's age has a major impact on the computed interval and on medical interpretation. However, the reference values for most quantities appeared to be influenced by age, confirming the medical utility of age-related reference intervals (30), which will be considered in a future study. Criteria for partitioning both gaussian-distributed (31) and nongaussian-distributed (32) reference values into subgroups have been suggested.
In conclusion, we have developed a new procedure for producing reference values by retrospective analysis of large samples of data collected over a period of 3 years in our laboratory. This study highlights the possibility of formulating internal reference values for a given institution based on the population it serves. These reference values are particularly suitable for evaluating the results of patients presenting to that institution. The only requisites are access to a sufficient amount of data in the LIS and the support of a biostatistician.
| Footnotes |
|---|
GGT, γ-glutamyltransferase; AST, aspartate aminotransferase; CRP, C-reactive protein; and PSA, prostate-specific antigen.
| References |
|---|
|
|
|---|