Clinical Chemistry
Clinical Chemistry 46: 867-869, 2000;
(Clinical Chemistry. 2000;46:867-869.)
© 2000 American Association for Clinical Chemistry, Inc.


Technical Briefs

Nonparametric Estimation of Reference Intervals by Simple and Bootstrap-based Procedures

Kristian Linnet1

1 Laboratory of Clinical Biochemistry, Psychiatric University Hospital, Skovagervej 2, DK-8240 Risskov, Denmark

In recent years, increasing interest has arisen in nonparametric estimation of reference intervals. The IFCC recommendation focuses on the nonparametric procedure, and the NCCLS guideline on reference interval estimation deals exclusively with the nonparametric approach (1)(2). The mentioned reports are based on the simple nonparametric approach, taking as a basis the sorted sample values. In addition to this basic approach, modern computer-based procedures have been introduced, which have made it possible to attain slightly increased precision for the nonparametric approach by applying resampling methods, weighted percentile estimation, or smoothing techniques (3)(4). In the present report, both the simple nonparametric reference interval estimation procedure and the resampling (bootstrap) principle were studied using simulations based on distribution types that should be relevant for clinical chemistry, i.e., gaussian and skewed distributions.

According to the procedure recommended by the IFCC and NCCLS, the observations are ranked according to size, and the 2.5 and 97.5 percentiles are obtained as the ordered observations with ranks 0.025 × (n + 1) and 0.975 × (n + 1) (1)(2). If the estimated rank values are not integers, then linear interpolation is carried out between the two neighboring ordered observations. In the statistical literature, various modifications of the computation procedure have been considered (5)(6)(7). Here the traditional one used in clinical chemistry as outlined above (called method I) is compared with an alternative (called method II), in which the rank is (p/100) × n + 0.5, where p indicates the percentile (6). For the 2.5 and 97.5 percentiles, method II yields the ordered values with ranks 0.025n + 0.5 and 0.975n + 0.5, respectively. In the following, the above-mentioned calculation principles are referred to as "simple" procedures (IS or IIS) as opposed to the "bootstrap" modifications described below (IB or IIB).
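The two rank formulas can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and clamping the rank to the range 1 to n (needed when a rank falls outside the sample at small n) is an assumption that the text does not address.

```python
def rank_position(n, p, method="II"):
    """Fractional 1-based rank of the p-th percentile among n sorted values."""
    if method == "I":
        return p / 100 * (n + 1)      # method I: (p/100)(n + 1)
    if method == "II":
        return p / 100 * n + 0.5      # method II: (p/100) n + 0.5
    raise ValueError("method must be 'I' or 'II'")


def percentile(values, p, method="II"):
    """Percentile estimate with linear interpolation between ordered values."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: ranks outside [1, n] are clamped to the nearest extreme value.
    rank = min(max(rank_position(n, p, method), 1.0), float(n))
    lower = int(rank)                 # 1-based index of the lower neighbor
    if lower >= n:
        return ordered[-1]
    frac = rank - lower               # interpolation weight toward the upper neighbor
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])
```

For n = 40, method I places the 97.5 percentile at rank 39.975, i.e., almost at the largest observation, whereas method II places it at rank 39.5, midway between the two largest observations.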

The bootstrap principle consists of repeated random resampling of the original observations with replacement, which is performed by a computer (8)(9). Each of the original observations is assigned the same probability of being resampled, i.e., 1/n. For each set of n resampled values, percentile estimates are computed as usual. After repetition of the procedure a large number of times, e.g., 50–500 times, the bootstrap estimates are obtained as the means of these percentile estimates. In the present study, 100 replications were carried out.
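The resampling steps described above can be sketched as follows. This is an illustrative version under stated assumptions: the function names are hypothetical, the percentile rule shown is method II (so the result corresponds to procedure IIB), and the clamping of ranks to the range 1 to n is an assumption.

```python
import random
from statistics import mean


def percentile_ii(values, p):
    """Method II percentile: rank (p/100) * n + 0.5 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    rank = min(max(p / 100 * n + 0.5, 1.0), float(n))  # clamping is an assumption
    lower = int(rank)
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + (rank - lower) * (ordered[lower] - ordered[lower - 1])


def bootstrap_percentile(values, p, replications=100, seed=None):
    """Mean of percentile estimates over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    estimates = [
        # rng.choices draws n values with replacement, each with probability 1/n.
        percentile_ii(rng.choices(values, k=n), p)
        for _ in range(replications)
    ]
    return mean(estimates), estimates
```

The individual replicate estimates are returned alongside their mean because their spread can later be used to judge the uncertainty of the bootstrap estimate.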

A gaussian population distribution was considered as a basis (Fig. 1, top left panel). A mean (µ) of zero and a standard deviation (σ) of 1 were selected as parameters, i.e., corresponding to the standard gaussian distribution. The true 2.5 and 97.5 percentile values for a gaussian distribution are µ - 1.96 × σ and µ + 1.96 × σ, respectively, i.e., -1.96 and +1.96 for the standard gaussian distribution.



Figure 1. Gaussian distribution (top left panel) and skewed distribution with coefficient of skewness of 1.5 (top right panel), with the 2.5 and 97.5 percentile limits demarcated, and RMSEs of the estimated 97.5 percentile as a function of sample size for both distributions (bottom panels).

(Bottom left panel), RMSEs for the gaussian distribution; (bottom right panel), RMSEs for the skewed distribution. Based on 5000–10 000 simulation runs.

The skewed distribution was generated on the basis of standard gaussian distributed values subjected to the inverse of the Manly exponential transformation {y = [log (1 + ky)]/k} with a selected parameter producing a coefficient of skewness of 1.5 (Fig. 1, top right panel) (10)(11). This degree of skewness may, for example, be observed for reference value distributions of serum concentrations of enzymes (11).

The root mean squared error (RMSE) of percentile estimates represents an overall measure of bias and imprecision for a given procedure and allows a ranking of the studied procedures. The bottom left panel of Fig. 1 displays a comparison of the RMSE values of simple and bootstrap modifications of methods I and II with regard to the upper percentile for the gaussian distribution (same as for the lower percentile). The IIS procedure clearly outperforms the IS method, and the bootstrap version of method II has the lowest RMSE of all procedures for all sample sizes. The same ranking of the procedures is valid for estimation of the upper percentile of the skewed distribution (Fig. 1, bottom right panel). In general, the differences between the procedures are most pronounced at low to moderate sample sizes. At a sample size of 40, the RMSE of the bootstrap version of method II is 30% (gaussian distribution) or 42% (skewed distribution) lower than that of the IS procedure. The differences are 8% and 7% at a sample size of 500.
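A small simulation along these lines shows how an RMSE comparison of the two simple procedures can be set up for the gaussian case. It is a sketch only: the function names, the number of runs, and the seed are illustrative choices, and it does not attempt to reproduce the figures reported here.

```python
import math
import random
from statistics import NormalDist


def percentile(values, p, method):
    """Method I or II percentile with linear interpolation (ranks clamped to 1..n)."""
    ordered = sorted(values)
    n = len(ordered)
    rank = p / 100 * (n + 1) if method == "I" else p / 100 * n + 0.5
    rank = min(max(rank, 1.0), float(n))  # clamping is an assumption
    lower = int(rank)
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + (rank - lower) * (ordered[lower] - ordered[lower - 1])


def simulate_rmse(n, method, runs=2000, p=97.5, seed=1):
    """RMSE of the estimated p-th percentile for standard gaussian samples of size n."""
    rng = random.Random(seed)
    true_value = NormalDist().inv_cdf(p / 100)  # about 1.96 for p = 97.5
    squared_errors = []
    for _ in range(runs):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        squared_errors.append((percentile(sample, p, method) - true_value) ** 2)
    return math.sqrt(sum(squared_errors) / runs)


# Same seed for both calls, so both methods are evaluated on identical samples.
rmse_is = simulate_rmse(40, "I")
rmse_iis = simulate_rmse(40, "II")
```

Because the RMSE combines squared bias and variance, a procedure can rank lower either by being systematically off target or by being more variable; at n = 40, method I sits almost on the largest observation and suffers on both counts.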

A valid statistical estimation method should provide a realistic estimate of the uncertainty associated with the procedure, e.g., expressed as a 90% confidence interval. For the simple estimation procedures (IS and IIS), a 90% confidence interval for the reference interval limits may be derived from the sorted sample values when n is at least 120 (1). Simulations showed that for both the gaussian and the skewed distributions, the actual coverage was 91–93%, i.e., in reasonable agreement with the expected value of 90%. For the bootstrap modifications, 90% confidence intervals can also be estimated at lower sample sizes on the basis of the distribution of individual bootstrap percentile estimates, from which the SE is estimated. At the low sample size of 40, the coverage was too low (~70%), but at higher sample sizes, the coverage was in fairly good agreement with the expected value of 90%. The low coverage at n = 40 reflects a tendency of the bootstrap procedure to underestimate the real uncertainty at small sample sizes.
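A bootstrap-based confidence interval of this kind might be computed as sketched below. Two assumptions are made that the text does not spell out: the SE is taken as the sample standard deviation of the individual bootstrap percentile estimates, and the interval is formed as the estimate ± 1.65 SE. The function names are hypothetical.

```python
import random
from statistics import mean, stdev


def percentile_ii(values, p):
    """Method II percentile: rank (p/100) * n + 0.5 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    rank = min(max(p / 100 * n + 0.5, 1.0), float(n))  # clamping is an assumption
    lower = int(rank)
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + (rank - lower) * (ordered[lower] - ordered[lower - 1])


def bootstrap_confidence_interval(values, p, replications=100, z=1.65, seed=None):
    """Return (lower, estimate, upper) for an approximate 90% interval."""
    rng = random.Random(seed)
    n = len(values)
    estimates = [
        percentile_ii(rng.choices(values, k=n), p) for _ in range(replications)
    ]
    estimate = mean(estimates)
    se = stdev(estimates)  # assumption: SE = SD of the bootstrap estimates
    return estimate - z * se, estimate, estimate + z * se
```

As noted above, an interval obtained this way is likely to be too narrow at small sample sizes such as n = 40.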

The general, approximate relationship between SE and sample size is a square root relationship: to halve the SE, a fourfold increase in sample size is required. Table 1 displays the relationship between the expected SE and sample size for the IIS and IIB procedures with regard to percentile estimation for gaussian (columns 2 and 3) and skewed distributions (columns 4 and 5), in the latter case with regard to the upper limit. The SE of the percentile estimate is presented here as a percentage of the width of the reference interval to make the relationship generally applicable. The approximate 90% confidence interval is obtained as ± 1.65 SE around the percentile estimate. For a sample size of 120, the 90% confidence interval corresponds to approximately ± 10% around the percentile in the case of a gaussian distribution, and approximately ± 25% for the skewed distribution (expressed as a percentage of the width of the reference interval). The bootstrap version provides SEs that are 4–12.5% lower than those of the simple version, corresponding to sample size savings of 8–24%. In relation to parametric estimation (x̄ ± 1.96 SD), procedure IIB has efficiencies of 48.7% and 57.5% for the gaussian and the skewed distributions, respectively (evaluated at n = 100).
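The link between an SE reduction and the equivalent sample size saving follows directly from the square root relationship. The short sketch below assumes that SE is exactly proportional to 1/√n, which the text describes only as approximate; the function names are hypothetical.

```python
def sample_size_factor(se_ratio):
    """Factor by which n must change to scale the SE by se_ratio (SE ~ 1/sqrt(n))."""
    return 1.0 / se_ratio ** 2


def sample_size_saving(se_reduction):
    """Fraction of the sample size saved when the SE is lower by se_reduction."""
    return 1.0 - (1.0 - se_reduction) ** 2


halving = sample_size_factor(0.5)          # fourfold increase to halve the SE
saving_low = sample_size_saving(0.04)      # SE lower by 4%
saving_high = sample_size_saving(0.125)    # SE lower by 12.5%
```

Under this assumption, SE reductions of 4% and 12.5% correspond to savings of about 8% and 23%, close to the range quoted above.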


Table 1. Relationship between sample size and expected SE of percentile estimates obtained by simple (IIS) and bootstrap (IIB) procedures.

A theoretical treatment of nonparametric percentile estimation shows that the detailed percentile computation formula depends on the type of distribution being considered (7). For a gaussian distribution, the expression (p/100) (n + 0.2) + 0.4 has been recommended for the percentile p (7). This corresponds to the (0.025n + 0.4) and (0.975n + 0.6) ordered values for the 2.5 and 97.5 percentile, respectively. These expressions are very close to the IIS formula, which in some preliminary simulations turned out to perform slightly better and thus was brought into focus in the present study. The above-mentioned expressions have actually been considered for use in clinical chemistry, but they have not received much attention (12)(13).
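Expanding the recommended expression for the two percentile limits shows how close it is to the method II ranks. This is a worked check of the arithmetic only, writing r_p(n) as shorthand (not used in the original) for the rank of percentile p in a sample of size n:

```latex
\begin{align*}
r_{2.5}(n)  &= 0.025\,(n + 0.2) + 0.4 = 0.025\,n + 0.405 \approx 0.025\,n + 0.4,\\
r_{97.5}(n) &= 0.975\,(n + 0.2) + 0.4 = 0.975\,n + 0.595 \approx 0.975\,n + 0.6.
\end{align*}
```

Compared with the method II ranks, 0.025n + 0.5 and 0.975n + 0.5, both limits are shifted outward by 0.095 rank units, independent of n.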

Overall, procedure IIB provided the lowest RMSE of percentiles for both the gaussian and the skewed distributions at all studied sample sizes. This was also confirmed for other types of distributions that might be of relevance, e.g., the log-normal distribution, the skew model of Box and Cox, and symmetric distributions with kurtosis deviating from that of the gaussian distribution (11). However, at small sample sizes, the estimated confidence interval has a low coverage. Thus, it is advisable to apply the bootstrap procedure mainly at sample sizes exceeding 100. In addition, the high general degree of uncertainty also suggests that a sample size of at least 100 should be considered for nonparametric reference interval estimation.

The bootstrap procedure is related to the weighted percentile method suggested by Harrell and Davis (14). Percentiles are estimated as a weighted average of all possible percentiles, which may reduce the RMSE by ~10–15% for a sample size of 119 (3)(14).

The present study shows that irrespective of what type of nonparametric procedure is used, nonparametric reference interval estimation at small-to-moderate sample sizes is associated with a large degree of uncertainty. A minimum sample size of 120 for nonparametric reference interval estimation has been suggested previously with reference to the lower limit for specification of the 90% confidence intervals of the percentile limits on the basis of the sorted sample values. At this sample size, the width of the 90% confidence interval is likely to be <20% of the width of the reference interval, given a symmetric distribution, but for skewed distributions, the percentage is larger. For sample sizes exceeding 100, the bootstrap procedure, preferably in the IIB version, can be recommended, and the improvement in efficiency is likely to correspond to sample size savings of 10–15%. The bootstrap procedure for reference interval estimation is available in the RefVal program (IB version) distributed by Solberg (15), and in the CBstat program (IB and IIB versions), which is a Windows program distributed (free) by the author (16).


Footnotes

fax 45-89773549, e-mail linnet{at}post7.tele.dk


References

  1. International Federation of Clinical Chemistry. Approved recommendation (1987) on the theory of reference values. Part 5. Statistical treatment of collected reference values. Determination of reference limits. J Clin Chem Clin Biochem 1987;25:645-656.
  2. National Committee for Clinical Laboratory Standards. How to define and determine reference intervals in the clinical laboratory; approved guideline. NCCLS Document C28-A (ISBN 1-56238-269-1). Villanova, PA: NCCLS, 1995:1-59.
  3. Shultz EK, Willard KE, Rich SS, Connelly DP, Critchfield GC. Improved reference interval estimation. Clin Chem 1985;31:1974-1978.
  4. Harris EK, Boyd JC. Statistical bases of reference values in laboratory medicine. New York: Marcel Dekker, 1995:23-39.
  5. Snedecor GW, Cochran WG. Statistical methods, 6th ed. Ames, IA: Iowa State University Press, 1967:125.
  6. Lentner C, ed. Units of measurements, body fluids, composition of the body, nutrition, 8th ed. Geigy scientific tables, Vol. 1. Basel, Switzerland: Ciba-Geigy, 1981;55:99.
  7. David HA. Order statistics. New York: Wiley, 1981:80-82.
  8. Diaconis P, Efron B. Computer-intensive methods in statistics. Sci Am 1983;248:96-108.
  9. Efron B, Tibshirani R. Statistical data analysis in the computer age. Science 1991;253:390-395.
  10. Manly BFJ. Exponential data transformations. Statistician 1976;25:37-42.
  11. Linnet K. Two-stage transformation systems for normalization of reference distributions evaluated. Clin Chem 1987;33:381-386.
  12. Rossing RG, Hatcher WE. A computer program for estimation of reference percentile values in laboratory data. Comput Programs Biomed 1979;9:69-74.
  13. Solberg HE, Grasbeck R. Reference values. Adv Clin Chem 1989;27:1-79.
  14. Harrell FE, Davis CE. A new distribution-free quantile estimator. Biometrika 1982;69:635-640.
  15. Solberg HE. RefVal: a program implementing the recommendations of the International Federation of Clinical Chemistry on the statistical treatment of reference values. Comput Methods Programs Biomed 1995;48:247-256.
  16. Linnet K. CBstat: a program for statistical analysis in clinical biochemistry. Reference manual. Risskov, Denmark: K Linnet, 1999:1-53.


