|
|
||||||||
Laboratory Management |
Departments of
1
Mathematical Sciences and
2
Pathology and Laboratory Medicine, University of Cincinnati, Cincinnati, OH 45221.
a Address correspondence to this author at: Department of Mathematical Sciences, University of Cincinnati, PO Box 210025, Cincinnati, OH 45221-0025. Fax 513-556-3417; e-mail paul.horn{at}uc.edu.
| Abstract |
|---|
|
|
|---|
2 distributions of 1,
4, 7, and 10 df.
2 distributions were chosen
because they simulate the skewness of distributions often found in
clinical chemistry populations. We used the root mean square error as
the measure of performance and used computer simulation to calculate
this measure. The robust estimator showed the best performance for
small sample sizes. As the sample size increased, the performance
values converged. The robust method for calculating upper reference
interval values yields reasonable results. In two examples using real
data for haptoglobin and glucose, the robust estimator provides
slightly smaller upper reference limits than the other procedures.
Lastly, the robust estimator was compared with the other procedures in
a population where 5% of the values were multiplied by a factor of 5.
The reference intervals were calculated with and without outlier
detection. In this case, the robust approach consistently yielded upper
reference interval values that were closer to those of the true
underlying distributions. We propose that robust statistical analysis
can be of great use for determinations of reference intervals from
limited or possibly unreliable data. | Introduction |
|---|
|
|
|---|
| current approaches |
|---|
|
|
|---|
![]() |
![]() |
,ß) is the incomplete beta
function and x(1)
...
x(n) are the observed order statistics. The Harrel
and Davis estimator has been recommended as the nonparametric method of
choice for the derivation of reference intervals (3).
Therefore, it is this version of the nonparametric method that will be
examined in this study.
The second approach to deriving reference intervals is based on
transforming the data to achieve normality, computing the appropriate
quantile estimators using normal theory, and back-transforming to
the original scale. The transformation used in this study is described
by Harris and Boyd (3). Briefly, an initial transformation
removes skewness:
![]() |
,
, is computed from the original x-data. If
||
|| < 0.10, then the log(x c)
transformation is used, and
, the maximum likelihood
estimator of c, is then computed.
Once the initial transformation is fit, a second transformation is
derived to remove any remaining kurtosis. The y values in
the previous equation are standardized to have zero mean and unit
variance. Then a constant, K, is determined so that:
![]() |
![]() | (1) |
and sz are the sample
mean and SD of the z-data, and z(1 -
/2)
is the appropriate standard normal quantile. Therefore, for a 90%
reference interval,
= 0.10, and z(0.95) = 1.645 are
used. These two estimates are then back-transformed to the
y-data scale and, finally, the original x-data
scale. On the other hand, if normality is rejected, then the
nonparametric (Harrel and Davis) reference interval is used.
We end this section by noting that the reference interval can be viewed
as a prediction interval based on the random sample
X1,... ,Xn
for the next observation, Xn1. If
the underlying population is normal, then the random variable:
![]() |
) 100%
reference interval is equal to:
![]() | (2) |
/2)
is the appropriate quantile from a Student's t-distribution
with (n - 1) df.
Clearly, for large samples the reference intervals defined in Eqs. 1
, and 2
are approximately equal. The 9095% reference intervals defined
by Eq. 2
are ~8% wider than those defined by Eq. 1
for
n = 20 and only ~1% wider for n =
100. However, we will use the reference interval defined by Eq. 1
on
the transformed data, even for small samples, because it is more
prevalent in the clinical chemistry literature. We did examine the
reference interval based on Eq. 2
, as well as the interval based on the
uniform minimum variance unbiased estimators of the quantiles. Neither
of these performed well enough to replace the interval defined by Eq. 1
for this study.
| robust prediction intervals and quantile estimators |
|---|
|
|
|---|
) 100%
prediction interval for the next observation,
Xn 1 given an observed random
sample X1 = x1,...
,Xn = xn has the form given by
Eq. 2
![]() | (3) |
, the estimated center of the interval.
Horn (7) proposed a prediction interval replacing the
three estimates,
, s, and
s/n, by robust estimates of location
and scale. Specifically, the (1 -
) 100% biweight prediction
interval for symmetric populations is defined as follows:
![]() | (4) |
![]() | (5) |
![]() |
![]() |
![]() |
![]() |
(u) may be rewritten
as
(u) = u ·w(u), where w(·) is a weight
function. Making this substitution in Eq. 5
i[xi
w(ui)]/
w(ui).
Thus, Tbi is a weighted mean with weights that
decrease as ui goes from 0 to ±1; equivalently, as
an observation xi goes from the center
Tbi to Tbi ± c ·
s, its weight decreases. If an observation is more than
c · s from the center
Tbi, it gets weight zero. For example, if
c = 6 and s = s, the sample
SD, then observations >6 SD from the center get weight zero.
Equation 5
defines a class of different estimators; each estimator is
the solution based on a specific
function. For example, if
(u) = u, all observations get equal
weight, and the solution is T =
.
However, in the case of biweight
(·), the
solution Tbi is computed iteratively, starting with
the sample median. The iteration is necessary because
Tbi is a weighted mean with weights that depend on
(a previously computed) Tbi.
A popular class of estimators of spread is based on variance estimates
of robust estimators of location. For estimators based on Eq. 5
, the
asymptotic variance is simply
E(
)/[E(
')],
where E(·) denotes mathematical expectation. The variance
estimate,
S
2, simply replaces
mathematical expectation with empirical averaging. For the biweight
function, we use
ST2(c) to denote this
estimate of Var[Tbi(c)]
based on the tuning constant, c. Because this variance
estimate is essentially a standard error squared, it goes to 0 by order
n. Thus, a reasonable estimate of spread is
Sbi(c) =
times the square root of variance
estimate of Tbi(c).
The actual value used for sbi in the iteration of
Tbi is slightly different from that given above. We
follow the modified formula given by Kafadar (8).
Specifically, sbi is computed by using the biweight
function,
(·), but with the
sample median used for location and MAD/0.6745 (the median absolute
deviation about the median) used as an estimate for scale. (The factor
of 0.6745 is included so that MAD/0.6745 is consistent for
in the
gaussian case.) The sbi used in Eq. 4
is computed in
the same manner; the only difference is the value of the tuning
constant. For details, see Horn (7) and Kafadar
(8). Simple SAS code, which can be modified for most
languages, is included in the Appendix to this report.
The tuning constant c1 is set equal to 3.7,
which means that, for the purposes of location estimation, observations
are down-weighted (smoothly) the further they lie from the center
(i.e., the current value of Tbi in the iteration
procedure). Any observations that are more than ~3.7 SD from the
center get zero weight. The tuning constant c2,
on the other hand, is a function of the value of the prediction
interval (1 -
). Specifically, c2 =
[0.58173 - 0.607227(1 -
)]-1 for
0.05
0.5. Thus, for 90% reference intervals,
= 0.10
and c2 = 28.4, and for 95% reference intervals,
= 0.10 and c2 = 205.4 (7).
We intend to examine the performance of this robust prediction interval after the BoxCox transformation to symmetry. Because it was designed to accommodate possibly heavy-tailed distributions, the power transform to remove any residual kurtosis is not required.
Another candidate for a robust reference interval uses the robust
quantile estimator for skewed populations as its upper endpoint
(9). This quantile estimator is based on the robust
prediction described above. The idea is to examine only data points
greater than the sample median. Then a symmetric pseudo-sample is
created by including all data points greater than the sample median
and their pseudo-values that are equidistant less
than the median. For example, if n = 20, and the data
are ordered x1 < ... <
x20, then the median, M =
(x10 x11)/2 and the
symmetric pseudo-sample, is:
![]() |
![]() |
The analogous lower robust quantile is not used, because in most cases the underlying populations are positively skewed, and thus the median will be greater than the mode. Reflected pseudo-samples in these cases, although symmetric, will be indicative of underlying bimodal populations (9). Thus, for the lower endpoint, we will use the nonparametric estimator (Harrel and Davis) because it also does not require transformation of the data.
| simulation and assessment |
|---|
|
|
|---|
distributions with df 1, 4, 7, and 10, respectively. The
usual measure of performance is the root mean square error (RMSE) for each of the endpoints that constitute the
reference interval. Specifically, the RMSE of the upper endpoints of a
particular (1 -
) 100% reference interval, for example, is as
follows:
![]() |
-1(1 -
/2) is the
estimate of the true endpoint (quantile)
F-1(1 -
/2). This value is estimated
via simulation by:
![]() |
i-1 (1 -
/2) is
the endpoint of the reference range derived from the
ith random sample, and N is the
number of random samples in the simulation; here N =
1000.
The RMSEs of the lower and upper endpoints of 90% and 95% reference
intervals are given in Tables
1 and
2, respectively. For the upper endpoint of 90% reference
intervals, the robust quantile estimator (untransformed) achieves the
smallest RMSE, especially for smaller sample sizes and the more skewed
populations (fewer df). However, it is essentially equal in
performance to the transformed traditional procedure for
n
40. For the lower endpoint, the RMSEs of the two
transformed procedures (traditional and robust) are about equal and
slightly better than the RMSE of the nonparametric, which is also used
by the untransformed robust procedure. For 95% reference intervals,
however, the robust procedure is clearly best, especially for the
smaller sample sizes (n
40). For the larger
sample sizes, the transformed procedures again are about equal,
with the robust slightly better for the more skewed populations and the
traditional (normal theory) procedure slightly better for the more
symmetric populations (more df). For the lower endpoints,
the transformed procedures again are about equal and only slightly
(510%) better than the nonparametric procedure.
Traditionally, assessment of reference interval limits has focused on the RMSE of the interval endpoints as described above. This certainly makes sense if the interval endpoints are used as targets for treatment. For example, suppose the endpoint of the 95% reference interval for creatine kinase for middle-aged women is 192 U/L, as derived by a particular laboratory. Physicians who use this laboratory may evaluate their patients who have concentrations in excess of this value to determine the cause. In this case, clearly, the value of the endpoint of the reference interval itself is vital, and its accuracy (RMSE) is vital for assessment of a procedure.
On the other hand, the reference interval is designed to include (or
exclude) a specified percentage of the underlying population. It could
be argued that, in fact, it is this percentage that should be
evaluated. Specifically, we will now consider the RMSE of the
percentage as estimated by the lower and upper endpoints of the
reference interval. Here, the RMSE of the upper probabilities, for
example, is as follows:
![]() |
![]() |
i-1(1
-
/2)] is the actual (unknown, in practice) proportion of the
population less than the upper limit of the reference interval from the
ith simulated random sample.
The RMSE for the lower and upper probabilities of 90% and 95%
reference interval limits are not presented here because all procedures
achieved roughly the same RMSE, although that of the robust upper
probability limit was slightly smaller for n = 20. One
particularly interesting fact is that the transformed robust procedure,
which appeared to perform poorly (especially for small samples) as an
upper endpoint estimator, should do so well in terms of the RMSE of the
probability. This phenomenon may be explained by the first few terms of
the Taylor expansion of the mean square error (MSE) of the probability.
Specifically, if we expand MSE
{F[
-1(p)]}
(p = 1 -
/2 for brevity) about the true
quantile F-1, we get the following:
![]() |
If we examine only the first (nonzero) term of the Taylor expansion, we see that the MSE (and thus the RMSE) for the probability is proportional to that of its corresponding endpoint. However, the second term shows that the upper limit estimators, which are positively skewed with respect to the true quantile, will benefit in terms of MSE for the probability. This is because, in general, f'[F-1(p)] < 0 for the upper limits. (These results are not surprising because the probability contained between an upper quantile and a one-unit shift to the right is less than that of a one-unit shift to the left.)
Although rewarding upper limits that tend to be skewed toward larger
values may be the conservative thing to do statistically (e.g., if we
state "p% < x", we want at least p%), it
could be disastrous in the context of a medical reference interval,
where large values of the analyte in question are indicative of a
possibly adverse health condition. To equalize the MSE loss, we
introduce a factor to be multiplied by the difference
{F[
i-1(1
-
/2)] - (1 -
/2)} before squaring for those samples
where
[
i-1(1 -
/2)] > 1 -
/2. We will use as this factor the
ratio of probabilities to the left and right of the upper quantile.
Thus, for the upper limit, this factor on the true difference in
probabilities will be (1 -
/2)/(
/2), when the probability
contained by the upper limit of the reference interval exceeds the
nominal, target value. The same factor is used for lower limits when
their true probabilities are less than the target value.
The weighted RMSEs of the probabilities for 90% reference intervals,
where the above factors premultiply the differences before the squaring
operation, are presented in Table 3
. Essentially, all of the procedures are equivalent, with a
slight edge going to the robust approach for the most skewed population
and to the transformed traditional approach for the others. One fact to
note is that the transformed robust is worst for n =
20, as it was for the interval endpoints. The results for 95%
reference intervals are provided in Table 4
. In this situation, however, the robust procedure for the upper
limit is clearly superior in every case. Of particular interest is that
the robust method does very well compared with the other methods for
larger sample sizes. This indicates that the robust upper limit is a
reasonable procedure for large as well as small samples.
|
|
| examples |
|---|
|
|
|---|
)sx2/2N]1/2,
where sx is the sample SD of the transformed
data, N is the sample size,
defines the
quantiles of interest, and ß defines the confidence level
of the interval for each of the point estimators (10). In
our case,
= 0.025 and ß = 0.90. The confidence intervals for the other methods were derived by using the bootstrap methodology (2). Here, 200 samples were drawn with replacement (i.e., resampled from the observed data), yielding 200 reference intervals for each methodology. From these values, the observed 5th and 95th quantiles were used as a 90% confidence interval.
The results for the haptoglobin data are given in the top of Table 5
. All of the methods are reasonably consistent. The transformed
methods have a lower quantile estimator about two units larger than
that of the nonparametric. The confidence interval for the upper
endpoint based on the bootstrapped robust method is ~1% tighter than
that based on the transformation approach.
|
As a second example, we compute similar statistics for blood glucose
concentrations (mmol/L) in samples obtained in our laboratory from 46
men,
80 years of age. The data are as follows:
3.520 3.905 4.070 4.070 4.290 4.345 4.400 4.455 4.565
4.620 4.620 4.675 4.840 4.840 4.895 4.895 4.950 4.950
5.115 5.115 5.225 5.225 5.225 5.335 5.335 5.390 5.390
5.390 5.455 5.555 5.610 5.665 5.720 5.775 5.830 5.830
5.885 5.885 6.215 7.095 7.205 8.140 9.900 10.890 11.605
12.045
The results are given in the bottom of Table 5
. We note
that, in this case, no suitable transformation to normality was found;
therefore, only the nonparametric and robust procedures appear. From
Table 5
, we see again that the upper quantile estimator provided by the
robust procedure is tighter than that of the nonparametric. Note
that the confidence intervals of both upper quantile
estimators lie entirely in the range defined as diabetic (>7.7
mmol/L or 1.4 g/L) by the American Diabetic Association
(11).
| outlying observations |
|---|
|
|
|---|
60. All of the
methods "broke down", however, in the sense that their RMSEs were
at least an order of magnitude larger than those with uncontaminated
data (Tables 1
|
|
|
The use of outlier detection is not routinely recommended for reference
interval analysis because large values from a skewed population may be
mislabeled as outliers (11). However, for completeness,
Table 7
presents results based on the same data as Table 6
but with a
simple outlier detection method on the original data; any value >3.5
SD away from the mean is ignored. One thing is clear from Table 7
the
drastic improvement of all the methods. Nevertheless, the robust method
maintains its superiority in virtually every situation.
|
Although not presented here, results for the weighted RMSE of the upper probabilities do not contradict the above results. Without outlier detection, all methods are essentially the same, with the robust method having a slight advantage for the most skewed population. With outlier detection, all methods improve, but the robust method becomes clearly the best in every situation.
| Discussion |
|---|
|
|
|---|
n
60, clearly exists. We show by the simulation
study presented here that the RMSE calculated by the robust quantile
estimator was the smallest for upper endpoints calculated on small
sample sizes. However, when evaluating the upper and lower
probabilities for the 90% and 95% reference intervals, we showed that
the losses in over- vs underestimating should not be treated
symmetrically. To equalize the MSE loss, a weighting factor was
introduced. In this case, the robust statistic was superior for
estimating the upper limit of the 95% reference interval and about
equal to the transformed traditional interval for estimating the 90%
reference interval. When real serum haptoglobin data were examined in
this fashion, the robust estimator of the 97.5 percentile limit was
smaller than that of the nonparametric estimator and comparable with
the estimator based on transformation. A second example, using glucose
data from 46 elderly men, showed a similar result. Thus, it is
reasonable to propose that robust estimators can provide relevant
reference intervals when only small numbers of samples are available.
Furthermore, if it is suspected that outliers may exist, then the
robust method should do as well as, if not better than, other methods,
whether or not outlier detection is used. However, because none of the
procedures did particularly well when confronted with severe
contamination, we cannot overstate the importance of ensuring the
quality of the data and the data collection process when determining
reference intervals. In summation, we recommend that nonparametric, robust, and normal theory (on transformed data) reference intervals be computed in practice. If the methods are in agreement, then any one will do reasonably well for reporting purposes. However, if the methods disagree, then we believe that the tightest interval should be used. The reason we recommend this is that, given the choice between reasonable, though disparate, reference intervals, we would prefer to err on the side of more false positives, rather than false negatives, thus forcing the clinician to further evaluate the patient. Finally, if the sample size is so small that it precludes reasonable nonparametric confidence intervals for the limits, or if a suitable transformation to achieve normality is not possible, then the proposed robust method should be used, at least for the upper endpoint of the reference interval.
| References |
|---|
|
|
|---|
The following articles in journals at HighWire Press have cited this article:
![]() |
E. Schwedhelm, V. Xanthakis, R. Maas, L. M. Sullivan, F. Schulze, U. Riederer, R. A. Benndorf, R. H. Boger, and R. S. Vasan Asymmetric Dimethylarginine Reference Intervals Determined with Liquid Chromatography-Tandem Mass Spectrometry: Results from the Framingham Offspring Cohort Clin. Chem., August 1, 2009; 55(8): 1539 - 1545. [Abstract] [Full Text] [PDF] |
||||
![]() |
F. Ceriotti, R. Hinzmann, and M. Panteghini Reference intervals: the way forward Ann Clin Biochem, January 1, 2009; 46(1): 8 - 17. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. Jund, M. Rabilloud, M. Wallon, and R. Ecochard Methods to Estimate the Optimal Threshold for Normally or Log-Normally Distributed Biological Tests Med Decis Making, July 1, 2005; 25(4): 406 - 415. [Abstract] [PDF] |
||||
![]() |
H. M. Blanck, B. A. Bowman, G. R. Cooper, G. L. Myers, and D. T. Miller Laboratory Issues: Use of Nutritional Biomarkers J. Nutr., March 1, 2003; 133(3): 888S - 894. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. S. Horn and A. J. Pesce Effect of Ethnicity on Reference Intervals Clin. Chem., October 1, 2002; 48(10): 1802 - 1804. [Full Text] [PDF] |
||||
![]() |
P. S. Horn, L. Feng, Y. Li, and A. J. Pesce Effect of Outliers and Nonhealthy Individuals on Reference Interval Estimation Clin. Chem., December 1, 2001; 47(12): 2137 - 2145. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. S. Horn, A. J. Pesce, and B. E. Copeland Reference Interval Computation Using Robust vs Parametric and Nonparametric Analyses Clin. Chem., December 1, 1999; 45(12): 2284 - 2285. [Full Text] [PDF] |
||||
![]() |
E. M Wright and P. Royston Calculating reference intervals for laboratory measurements Statistical Methods in Medical Research, April 1, 1999; 8(2): 93 - 112. [Abstract] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |