Laboratory Management
1 Central Laboratory, University Central Hospital of Turku, Kiinamyllynk 48, FIN-20520 Turku, Finland.
2 Departments of Clinical Chemistry and 3 Statistics, University of Turku, FIN-20500 Turku, Finland.
a Author for correspondence. Fax 358-2-2613920;
| Introduction |
|---|
Regression analysis has been used by several investigators for the estimation of age-dependent reference limits when the reference groups are so small that the resulting subsets are not large enough for the calculation of valid reference limits. For example, during early childhood the physiological changes occur over short time intervals, and it may be practically impossible to get a sufficient number of reference values to establish reference limits for each narrow age group.
Harris and Boyd (3) considered statistical criteria for partitioning data sets in a suitable manner to obtain separate reference limits. However, even after partitioning there may still exist dependency between analyte and, for example, age in some subgroups. In those situations the use of regression analysis has been suggested for the calculation of percentiles or prediction intervals to be considered as reference limits. However, the theoretical basis of reference intervals and prediction intervals is different. If reference limits are required, they must be estimated, i.e., parameters of the reference distribution are estimated. Conversely, prediction intervals are calculated for random variables, not parameters.
Irjala et al. (4) estimated reference limits for IgA, IgG,
and IgM in serum in children ages 6 months to 14 years with first- or
second-order regression models. If the variance of residuals was
age-dependent, then they calculated age-specific standard deviations,
which were used when age-related reference intervals were determined.
If there was no age dependency in variance of residuals, conventional
regression analysis was applied. In the study of Anderson et al.
(5) a weighted regression analysis was applied for the
determination of age-specific reference intervals for serum
prostate-specific antigen (PSA) because the variability of serum PSA
values increased with age.
Burritt et al. (6) defined age- and
sex-specific reference intervals for 19 biologic variables in serum
samples from children ages 1 to 22 years. They used polynomial
regression for age-dependent analytes, and the 2.5% and 97.5%
percentiles constituted the reference interval. The table included
in their publication shows reference intervals for several age
subgroups. Estimated limits at the center of each age interval were
considered as age-group-specific reference limits. They also used a
slight modification in older children to have a smooth transition to
adult concentrations.
Vicente et al. (7) established reference intervals for serum ferritin depending on age and sex. Data were analyzed for each sex separately, and dependency on age was first examined by plotting ferritin values vs age. After visual inspection of the data, subgrouping according to age was carried out, regression analysis was performed as necessary, and 95% prediction intervals were used as reference intervals. Gallo et al. (8) presented a statistical method in which variables that affected reference intervals were considered. In this method variables such as sex, age, weight, and alcohol consumption, which may affect serum urea, were included in a regression model. The effects of age and sex and their interaction on cardiac enzymes were illustrated by regression analysis in the study of Kairisto et al. (9). The 95% age-specific prediction intervals were calculated, and the prediction limits were considered as reference limits. However, none of the above articles discussed the reliability of the derived age-specific reference limits.
Piecewise regression can be used when a plot of the data against, for example, age shows that the relationship can be modeled by linear segments. Gonchoroff et al. (10) used a piecewise linear regression model to estimate the reference interval for alkaline phosphatase in different age groups. Piecewise regression can also be applied in different cumulative age groups. From the different cumulative groups the best-fitting curves can be selected, and the point(s) where the values begin to change direction can thereby be found.
Especially during the first months of life, many laboratory analytes, including hemoglobin (Hgb) and red blood cell indices, change rapidly. Fig. 1 shows this clearly for Hgb. It is obviously difficult to find any one regression model for the whole age interval. Nonlinear regression (i.e., nonlinear in the parameters) would be required. A simpler method would probably be more useful for examining the various age dependencies of different analytes. The visual inspection of plots to judge the fit of data to the derived limits is an important step.
In this paper we use piecewise regression analysis to calculate 95%
age-specific reference intervals for Hgb in children ages 0–12 months.
Hgb concentrations are known to decrease quickly during the first
months of life, after which the concentrations start to increase again
(11). We decided to use the first year of life to describe
the use of linear piecewise regression in calculating age-specific
reference limits. The conventional reference limits are point estimates
from a reference sample group of the true limits, which could be
calculated if data were available from the whole reference population.
Hence the imprecision of the sample estimates must be considered by
determining the confidence intervals (CIs). In this paper we determine
the variance associated with reference limits and develop a table from
which the necessary values can be selected for the calculation of CIs
of reference limits produced by the regression method. An equation from
which approximate intervals can be calculated is also presented.
| Subjects and Methods |
|---|
The study protocol was officially accepted at the University Hospital of Turku and was in accordance with the Helsinki Declaration of 1975, as revised in 1983.
analytical methods
Blood specimens were obtained either by skin puncture or
venipuncture as part of routine examination of the children. The blood
was collected into microtubes containing EDTA as an anticoagulant.
Coulter Counter S-Series (S Plus VI and T-880, Coulter Electronic) or
Technicon H6000 (Technicon Instruments Corp.) analyzers were used for
the analysis. The Coulter Counter S Plus VI was used as a master
instrument and other analyzers were calibrated against it to produce
the equivalent results. For internal quality control, stable control
specimens and retained patient specimens were used.
statistical methods
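The data are modeled by linear regression:

$$Y = \beta_0 + \beta_1 x + \varepsilon \qquad (1)$$

where the errors $\varepsilon$ are assumed to be normally distributed with mean 0 and constant variance $\sigma^2$. The regression-based upper reference limit for a specified age point $x_0$ is a quantile:

$$Q_0 = \beta_0 + \beta_1 x_0 + z_\alpha \sigma \qquad (2)$$

where $z_\alpha$ is a fractile from the standard normal distribution, usually ±1.96. Even though the mean square error ($S^2$) is an unbiased estimator of $\sigma^2$ for the regression model, the square root $S$ of $S^2$ is not an unbiased estimator of $\sigma$. Because $(n - p)S^2/\sigma^2$ is distributed as $\chi^2$ with $n - p$ degrees of freedom, where $n$ is sample size and $p$ is number of parameters, the following is obtained (14):

$$E(S) = \frac{\sigma}{a_{n-p}}, \qquad a_{n-p} \approx \sqrt{\frac{n-p}{n-p-0.5}} \qquad (3)$$

The exact value can be calculated by using the gamma function, but here the approximation is used. Thus the estimator for Eq. 2 is:

$$\hat{Q}_0 = \hat{Y}_0 + z_\alpha a_{n-p} S \qquad (4)$$

where $\hat{Y}_0$ is the fitted value at age point $x_0$ and $\hat{\sigma} = a_{n-p}S$ is an unbiased estimator of $\sigma$. The variance of the estimator (Eq. 4) is:

$$\mathrm{Var}(\hat{Q}_0) = \mathrm{Var}(\hat{Y}_0) + z_\alpha^2 \mathrm{Var}(\hat{\sigma}) \qquad (5)$$

$$\mathrm{Var}(\hat{Y}_0) = \sigma^2 v \qquad (6)$$

where $\bar{X}$ is the mean value of age and $v = [1/n + (x_0 - \bar{X})^2/\sum_i (x_i - \bar{X})^2]$. The variance of $\hat{\sigma}$ in the latter part of Eq. 5 is:

$$\mathrm{Var}(\hat{\sigma}) = a_{n-p}^2 [E(S^2) - E(S)^2] = \sigma^2 (a_{n-p}^2 - 1) \qquad (7)$$

so that

$$\mathrm{Var}(\hat{Q}_0) = \sigma^2 [v + z_\alpha^2 (a_{n-p}^2 - 1)] \qquad (8)$$

where $z_\alpha^2$ is the square of the fractile of the normal distribution. The square root of the above equation is the standard error, which is needed when CIs are calculated. When appropriate values (q) from Table 1 are used, exact CIs are obtained:

![]() | (9) |

An approximate CI that is based on the asymptotic normal distribution of $\hat{Q}_0$ is:

$$\hat{Q}_0 \pm z'_\alpha a_{n-p} S \sqrt{v + z_\alpha^2 (a_{n-p}^2 - 1)} \qquad (10)$$

where $z'_\alpha$ is a fractile from a normal distribution corresponding to the appropriate confidence level. In the Appendix an example and the code of a SAS program to calculate reference limits and CIs in piecewise linear regression are given. Because our main goal was to evaluate reference limits and their CIs, the percentile intervals calculated here are suitable. For other purposes tolerance intervals and prediction intervals may be used (15).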
determination of the point of change
Because of the known change in age dependency of Hgb during the
first months of life, six cumulative age groups were formed
(0–1, 0–2, 0–3, 0–4, 0–5, and 0–6 months). The age interval, i.e., the step used, was
determined by calculating the optimal window width according to the formula
(16):
![]() |
The point of change can be estimated, for example, by modeling the data in different cumulative age groups and by finding the best-fitting model, i.e., the one with the greatest R² (coefficient of determination) and minimal residual mean square. If the rest of the data are also fitted well by a linear regression, then the intersection of the two regression lines is the point of change. If necessary, the extension to more than two piecewise regression lines is straightforward (13).
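The intersection step can be illustrated with a short Python sketch. It assumes that the early age group and the rest of the data have already been chosen; the function names and the synthetic, noise-free values are hypothetical and serve only to show the arithmetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, line):
    """Coefficient of determination of a fitted (intercept, slope) line."""
    b0, b1 = line
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def intersection(line_a, line_b):
    """Age at which two (intercept, slope) lines cross."""
    (a0, a1), (b0, b1) = line_a, line_b
    if a1 == b1:
        raise ValueError("parallel lines have no single intersection")
    return (b0 - a0) / (a1 - b1)


# Synthetic example: a steep early decline followed by a slow increase
early_ages = [0.0, 0.5, 1.0, 1.5]
late_ages = [3.0, 6.0, 9.0, 12.0]
early = fit_line(early_ages, [5.2 - 0.3 * x for x in early_ages])
late = fit_line(late_ages, [4.7 + 0.01 * x for x in late_ages])
print(round(intersection(early, late), 2))
```

In practice `r_squared` (together with the residual mean square) would be compared across the cumulative age groups before the two lines are intersected.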
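If the point where the slope of the regression model changes is known, one can define an indicator variable that takes this change point into account, and a piecewise regression model can be applied. The regression equation is:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \qquad (11)$$

where $x_1$ is age and the indicator variable $x_2$ equals $x_1 - c$ for ages up to the point of change $c$ and 0 otherwise.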
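A piecewise model of this kind can be fitted by ordinary least squares once the indicator variable has been constructed. The sketch below is a minimal pure-Python illustration that follows the coding used in the Appendix SAS program (the indicator term is age minus the change point for ages up to the change point, and zero afterwards). The function names, the synthetic ages, and the coefficient values are assumptions made for the example.

```python
def design_row(age, change_point):
    """Row [1, age, indicator term] of the piecewise design matrix."""
    flag = 1.0 if age <= change_point else 0.0
    return [1.0, age, (age - change_point) * flag]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_piecewise(ages, ys, change_point):
    """Least-squares estimates (b0, b1, b2) from the normal equations."""
    rows = [design_row(t, change_point) for t in ages]
    p = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic, noise-free check with hypothetical coefficients
TRUE_COEFS = (4.70, 0.0075, -0.35)
CHANGE_POINT = 1.571  # change point used in the Appendix program
AGES = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0, 12.0]
LOG_HGB = [sum(b * x for b, x in zip(TRUE_COEFS, design_row(t, CHANGE_POINT)))
           for t in AGES]
print([round(b, 4) for b in fit_piecewise(AGES, LOG_HGB, CHANGE_POINT)])
```

Because the indicator term is zero at the change point itself, the two line segments join there, so the fitted limits have no sudden jump. With this coding the slope is b1 + b2 before the change point and b1 after it.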
aptness of regression model
It is important that the statistical procedure includes a check of the aptness of the regression model. The regression models are based on the assumptions that the distribution of residuals is normal and their variance is constant (13). Hence it may be necessary to make a logarithmic (or other) transformation of the original measurements to fulfill these requirements. In this study the normality of residuals was checked by the Shapiro–Wilk statistic, and graphic analysis of residuals was done to provide information about constancy of the residual variance. Examination of residuals is important because residual plots allow deviations from linearity to be clearly seen. Formal testing for linearity can also be done, but graphic analysis of residuals is usually sufficient.
Diagnostic measures should be used for the detection of problematic observations, i.e., outliers and influential data points. Outliers and influential observations have no general definition, and their meaning varies from one author to another. In this study an observation was considered influential if it had a major influence on the fitted model, i.e., on the values of regression parameters or predicted values. We used Cook's influence statistic, high leverage points, dffits, and dfbetas statistics to detect influential observations (17)(18)(19). An observation was considered to be an outlier if its dependent variable was either much higher or lower than the dependent variable of other observations with similar independent variables. The studentized residuals were determined for that purpose (20).
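For a single predictor, leverage, studentized residuals, and Cook's influence statistic can be computed directly from textbook formulas. The following Python sketch is a simplified illustration and not the SAS procedure used in the study: it handles only simple linear regression, uses internally studentized residuals, and the function name and test data are hypothetical.

```python
import math


def regression_diagnostics(xs, ys):
    """Leverage, studentized residual and Cook's distance for each observation.

    Assumes simple linear regression (p = 2 parameters) and a fit that is
    not perfect, so that the residual mean square is greater than zero.
    """
    n = len(xs)
    p = 2
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - p)
    results = []
    for x, e in zip(xs, residuals):
        leverage = 1.0 / n + (x - mean_x) ** 2 / sxx
        studentized = e / math.sqrt(mse * (1.0 - leverage))
        cook = studentized ** 2 * leverage / (p * (1.0 - leverage))
        results.append({"leverage": leverage,
                        "studentized": studentized,
                        "cook": cook})
    return results


# Synthetic data with one deliberately shifted observation (index 5)
xs = [float(i) for i in range(10)]
ys = [2.0 + 0.5 * x + (0.05 if i % 2 else -0.05) for i, x in enumerate(xs)]
ys[5] += 3.0
for i, row in enumerate(regression_diagnostics(xs, ys)):
    print(i, round(row["leverage"], 3), round(row["studentized"], 2),
          round(row["cook"], 3))
```

The shifted observation has by far the largest studentized residual, which is the kind of value that would then be compared against a critical value such as those tabulated by Lund (20).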
All the statistical calculations were done with the SAS® for Windows 6.11 package (SAS Institute) and graphic presentation with Microcal Origin™ for Windows (Microcal Software). Values for Table 1 were calculated with Mathematica™ (Wolfram Research).
| Results |
|---|
The Shapiro–Wilk statistic was used to evaluate the normality of the distribution of residuals. Outliers and influential data points were identified. A linear regression with the age group from newborns to 2 months was found to fulfill all the above criteria best. A second regression model was fitted to the rest of the data and the aptness of this model was also investigated by the methods described above. The intersection of these two lines occurred at 1.6 months, and this was selected to be the point where the slope of the piecewise linear model changed.
Finally, the whole age interval was modeled by a linear piecewise regression method with a change point of 1.6 months. Residuals, outliers, and influential data points were also examined in this final model. Five outlying observations with studentized residuals greater than the critical value from the table (α = 0.10) derived by Lund (20) were detected. However, these values were not removed, as we did not have any specific clinical information that would justify their removal. In Fig. 1 the 2.5% and 97.5% reference limits and their CIs for Hgb values are fitted on the whole age interval by the described methods. In Fig. 2 the age interval is restricted to the first 3 months to better display the higher variability in newborns. Table 3 shows the numeric reference limits with corresponding CIs for different ages and age groups. The exact limits and intervals for a specified age point could be supplied by computer on the laboratory report.
Conventional reference limits with CIs are shown in Table 4. These were calculated by the Refval program (from H. E. Solberg, Department of Clinical Chemistry, Rikshospitalet, Oslo, Norway) after division of data into two age groups (0–1.6 months and 1.6 months–1 year). Reference limits as reviewed by Tietz (21) and limits calculated by our method are also shown in that table.
| Discussion |
|---|
Determination of the minimum sample size when using regression is not straightforward because there are no power functions available for choosing sample size when reference limits are estimated by this approach. In this case the sample size, as well as the estimated variance, affects the width of the reference interval and CIs. A prerequisite for reference interval estimation is that the reference interval should remain stable, i.e., the width of the reference interval should not be dependent on sample size. The method of linear models does not need a large sample size. Some suggestions about its application for reference interval estimation are presented by Royston (24). CIs become narrower as sample size increases. How precise CIs need to be (and, thus, the optimal sample size) must be estimated by the user. In reference interval estimation, the IFCC proposed that at least 120 observations are needed for reliable estimates (26). If nonparametric CIs are produced, this is indeed the minimum sample size.
In Fig. 3, approximate and exact values have been fitted at different degrees of freedom as a function of v. Here, as in Table 1, v has been chosen so that unreliable calculations are prevented, i.e., the maximum value of v is 0.5. One can see that approximate values are near the exact values when the sample size is as small as 20. The difference between the values ranges from 0.06 to 0.08. Thus, if CIs for regression-based reference limits are estimated, approximate values can be used even for relatively small sample sizes. The width of the CIs naturally depends on sample size. In our Hgb data the sample size is 310 and the estimated 97.5% reference limit with corresponding 95% CI at 2 months of age is 136 (132–140). If we assume that we have 40 degrees of freedom (i.e., sample size is 43), the limit and intervals would be 136 (129–144). At this sample size (43) the exact and approximate CIs are precisely the same. In our Hgb data when the sample size is decreased from 310 to 43, the width of the CI would increase about 7 g/L.
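A related and simpler check concerns the constant a_{n-p} that makes a_{n-p}S unbiased for the residual standard deviation. This is not the comparison of the tabulated q values shown in Fig. 3. The Python sketch below compares the exact constant, written with the gamma function under the assumption that (n - p)S²/σ² follows a chi-square distribution, with the approximation sqrt(df/(df - 0.5)). The function names are hypothetical.

```python
import math


def a_exact(df):
    """Exact constant: sqrt(df/2) * Gamma(df/2) / Gamma((df + 1)/2)."""
    log_ratio = math.lgamma(df / 2.0) - math.lgamma((df + 1) / 2.0)
    return math.sqrt(df / 2.0) * math.exp(log_ratio)


def a_approx(df):
    """Approximation used in the approximate CI: sqrt(df / (df - 0.5))."""
    return math.sqrt(df / (df - 0.5))


for df in (20, 40, 307):
    print(df, round(a_exact(df), 5), round(a_approx(df), 5))
```

The logarithm of the gamma function is used so that the ratio stays numerically stable at large degrees of freedom. For the degrees of freedom shown, the two versions of the constant differ by well under 0.001.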
Thus it seems that a relatively small sample size is sufficient for the method. This is of great importance, e.g., when considering pediatric reference samples because of the difficulties in the collection of large reference sample groups. A distinct advantage in the use of piecewise regression analysis is that partitioning of the data into several subgroups becomes unnecessary and the sudden changes in reference limits at certain age limits can be avoided.
The estimate of the mean of Y is less precise when age is located farther away from its mean value. Thus CIs around the reference limits are wider the farther age is from its mean (13). However, this effect is only marginal when the estimation of limits and intervals is restricted to the same age interval from which the data originated. Because logarithmic transformation was applied, the upper and lower CIs are not equal. This occurs also in conventional reference interval determination upon retransformation of CIs to original scale.
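The asymmetry after retransformation can be seen with a small Python sketch. It assumes a CI that is symmetric on the natural-logarithm scale, using the rounded upper reference limit (4.913) and an approximate half-width of 0.0276 from the Appendix example. The function name is hypothetical.

```python
import math


def back_transform(limit_log, half_width_log):
    """Return (limit, ci_low, ci_high) in original units from log-scale values."""
    limit = math.exp(limit_log)
    ci_low = math.exp(limit_log - half_width_log)
    ci_high = math.exp(limit_log + half_width_log)
    return limit, ci_low, ci_high


limit, ci_low, ci_high = back_transform(4.913, 0.0276)
print(round(limit), round(ci_low), round(ci_high))
print(round(limit - ci_low, 2), round(ci_high - limit, 2))
```

Because the exponential function is convex, the upper part of the retransformed interval is slightly longer than the lower part even though the interval is symmetric on the log scale.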
Table 4 shows limits and intervals calculated by our method and by the Refval program. The limits derived by both methods are close to each other, especially when comparing children approaching the age of 1 year. The greatest differences in reference limits by regression and conventional methods can be seen at ages at which Hgb values show steep changes, i.e., in the age group 0–48 days. Further subgrouping with the conventional method is not possible because too few reference subjects would remain in each age group. Without considering the age effect, the lower reference limit with CI would be 98 (87–108) and the upper limit 224 (213–234).
There are several ways to use regression analysis for the calculation of age-specific reference limits. Nonlinear regression models may be flexible but impractical to use especially over age periods such as newborns, puberty, or menopause, where some underlying physiological change may affect the age dependency. Separate regression models could be applied to specified age intervals, but this would introduce the problem of sudden changes in limits between different age groups (5)(27). The method used in this study prevents such difficulties. We still strongly recommend visual inspection of data in addition to statistical calculations and measures to check the appropriateness of the model.
The proposed method is practical and could be used for a variety of laboratory analytes that show dependency on age or other known characteristics. We recommend it be used in clinical laboratories to improve the quality of age-dependent reference limits. The necessary calculations and display of data can be done with microcomputers and basic statistical software.
| Appendix |
|---|
![]() |
0 and making rearrangements gives
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
example
In our Hgb data after a logarithmic transformation was applied,
$S^2 = 0.0100$. If the 97.5% reference limit and its 95%
CI are evaluated at 2 months of age, the following calculations are
needed:
![]() |
![]() |
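$\hat{Y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$, because the term containing $\hat{\beta}_2$ vanishes: the indicator variable has a value of zero at this age. Thus $\hat{Y}_0 = 4.703 + 0.007x_0 = 4.717$, which is the mean value of log-transformed Hgb at the given age point.

$\hat{Q}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0 + z_\alpha a_{n-p} S = 4.913$, which is the upper (97.5%) reference limit at age 2 months.

$CI_u = (\hat{\beta}_0 + \hat{\beta}_1 x_0 + z_\alpha a_{n-p} S) + z'_\alpha a_{n-p} \sqrt{v + z_\alpha^2 (a_{n-p}^2 - 1)}\, S = 4.940$, which is the upper 95% confidence limit for the upper reference limit. The value of $v$ is 0.0135.

$CI_l = (\hat{\beta}_0 + \hat{\beta}_1 x_0 + z_\alpha a_{n-p} S) - z'_\alpha a_{n-p} \sqrt{v + z_\alpha^2 (a_{n-p}^2 - 1)}\, S = 4.885$, which is the lower 95% confidence limit for the upper reference limit.

Because our sample size is >300, an approximate value can be used, and it is 1.96(1.0008)(0.1406) = 0.2758. Hence, after retransformation the 97.5% reference limit at 2 months of age with 95% CI is 136 (132–140).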
sas program
This program calculates 2.5% and 97.5% reference limits and 95%
CIs in piecewise linear regression.
options linesize=75 pagesize=56 nodate;
data limits;
n=310; /* sample size */
nn=(1/n);
df=n-3; /* degrees of freedom */
x1m=5.3384357048; /* mean value of age in the sample */
x2m=-0.260042289; /* mean value of indicator variable */
x11=0.0004267694; /* value of x11 in inverse of X'X */
x12=-0.002225561; /* value of x12 in inverse of X'X */
x21=-0.002225561; /* value of x21 in inverse of X'X */
x22=0.024588304; /* value of x22 in inverse of X'X */
s=0.10013; /* root mean square error */
z=1.96; /* fractile needed to calculate 95% */
z2=z**2; /* reference and confidence intervals */
b0=4.703196; /* intercept term */
b1=0.007488; /* point estimate of ß1 (age) */
b2=-0.347497; /* point estimate of ß2 (indicator variable) */
do age=0.00 to 12.000 by 0.001; /* age interval */
if age gt 1.571 then x1=0; /* point of change in */
else x1=1; /* our data is 1.571 */
iv=(age-1.571)*x1; /* indicator variable */
diff1=age-x1m;
diff2=iv-x2m;
m1=(diff1*x11)+(diff2*x21);
m2=(diff1*x12)+(diff2*x22);
part1=m1*diff1;
part2=m2*diff2;
all=part1+part2;
v=nn+all;
an=sqrt(df/(df-0.5));
an2=an*an;
vza=v+(z2*(an2-1));
svza=sqrt(vza);
zas=z*an*s;
pred=exp(b0+(b1*age)+(b2*iv));
Q2=(b0+(b1*age)+(b2*iv)+(zas)); /* upper ref. limit ln */
Q2a=exp(Q2); /* upper ref. limit in original unit */
Q1=(b0+(b1*age)+(b2*iv)-(zas)); /* lower ref. limit ln */
Q1a=exp(Q1); /* lower ref. limit in original unit */
former=zas*svza;
cilrl=exp(Q1-former); /* lower confidence interval and */
cilru=exp(Q1+former); /* upper for lower reference limit */
ciurl=exp(Q2-former); /* lower confidence interval and */
ciuru=exp(Q2+former); /* upper for upper reference limit */
output;
end;
run;
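For readers without access to SAS, the same calculation for a single age point can be sketched in Python. This is a hypothetical port for illustration, not part of the original program: the constants are copied from the SAS listing, missing arithmetic operators are inferred from the worked example (v = 0.0135 at 2 months), and the function and dictionary key names are invented for this sketch.

```python
import math

N = 310                      # sample size
DF = N - 3                   # residual degrees of freedom
AGE_MEAN = 5.3384357048      # mean age in the sample
IV_MEAN = -0.260042289       # mean of the indicator variable
INV = ((0.0004267694, -0.002225561),
       (-0.002225561, 0.024588304))   # inverse of X'X as given in the listing
S = 0.10013                  # root mean square error (log scale)
Z = 1.96                     # fractile for 95% reference and confidence intervals
B0, B1, B2 = 4.703196, 0.007488, -0.347497
CHANGE_POINT = 1.571         # point of change in months


def limits_at(age):
    """Reference limits and approximate 95% CIs (original units) at one age."""
    flag = 0.0 if age > CHANGE_POINT else 1.0
    iv = (age - CHANGE_POINT) * flag
    d1 = age - AGE_MEAN
    d2 = iv - IV_MEAN
    # Quadratic form d' inv(X'X) d, written out for the 2 x 2 case
    quad = (d1 * (d1 * INV[0][0] + d2 * INV[1][0])
            + d2 * (d1 * INV[0][1] + d2 * INV[1][1]))
    v = 1.0 / N + quad
    an = math.sqrt(DF / (DF - 0.5))
    zas = Z * an * S
    half = zas * math.sqrt(v + Z ** 2 * (an ** 2 - 1.0))
    center = B0 + B1 * age + B2 * iv
    upper, lower = center + zas, center - zas
    return {
        "v": v,
        "upper": tuple(math.exp(t) for t in (upper, upper - half, upper + half)),
        "lower": tuple(math.exp(t) for t in (lower, lower - half, lower + half)),
    }


result = limits_at(2.0)
print(round(result["v"], 4), [round(t, 1) for t in result["upper"]])
```

At 2 months this gives v close to 0.0135 and an upper limit near 136 with a CI of roughly 132 to 140. Small differences from the worked example are expected because the example uses rounded intermediate values.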