General Clinical Chemistry
1 Department of Medical Biochemistry, Rikshospitalet-Radiumhospitalet HF, Oslo, Norway.
2 Department of General Psychiatry, University Hospital of Northern Norway, Tromsø, Norway.
a Address correspondence to this author at: Bergersletta 28, N-1349 Rykkinn, Norway. E-mail heesolbe{at}online.no.
| Abstract |
|---|
Methods: We studied the specificity of Horn's test algorithm (probability of false detection of outliers), using Monte Carlo computer simulations performed on 13 types of probability distributions covering a wide range of positive and negative skewness. Distributions with 3% of the original observations replaced by random outliers were used to also examine the sensitivity of the test (probability of detection of true outliers). Three data transformations were used: the Box and Cox function (used in the original Horn's test), the Manly exponential function, and the John and Draper modulus function.
Results: For many of the probability distributions, the specificity of Horn's algorithm was rather poor compared with the theoretical expectation. The cause of this poor performance was at least partially related to remaining nongaussian kurtosis (peakedness). The sensitivity showed great variation, dependent on both the type of underlying distribution and the location of the outliers (upper and/or lower tail).
Conclusion: Although Horn's algorithm undoubtedly is an improvement compared with older methods for outlier detection, reliable statistical identification of outliers in reference data remains a challenge.
| Introduction |
|---|
The simple Dixon range test (5), i.e., identify the extreme value as an outlier if the difference between the 2 highest (or lowest) values in the distribution exceeds one third of the range of all values, was proposed by the IFCC in the recommendation for statistical treatment of reference values (1) and included in previous versions of the RefVal program (6)(7). This method is reasonably insensitive to distribution type, but it has the major drawback of being unable to successfully handle clusters of 2 or more outliers.
A more promising method was proposed by Horn et al. (8). This method (hereafter referred to as Horn's algorithm) is based on 2 general assumptions: (a) that the central part of the distribution contains most of the information of the genuine reference values; and (b) that outliers may be detected as values lying outside limits based on the properties of this central part. The algorithm operates in 2 steps: In the first step, the original data are transformed to approximate a gaussian distribution, to the extent this is possible in the presence of outliers. For this purpose, Horn et al. (8) used the Box-Cox function (9). In the second step, 2 detection limits (fences) are established based on the middle 50% of the transformed distribution, as suggested earlier by Tukey (10). Possible outliers are identified as the values located outside of these fences.
Because we wanted to include Horn's algorithm in the RefVal program (6)(7), which implements the IFCC-recommended methods for statistical treatment of reference values (1), we used Monte Carlo computer simulations in a study of the specificity and sensitivity of the algorithm:
Specificity.
Sensitivity.
| Materials and Methods |
|---|
$\chi^2$ distributions were produced (see below).
Asymmetric distributions with various degrees of skewness were produced from the computer-generated gaussian distributions. To cover the spectrum of asymmetric distributions that is typical for clinical biochemical reference data, the following nongaussian distributions were examined in this study: the square root gaussian distribution, the logarithmic gaussian distribution, and $\chi^2$ distributions with df = 3, 4, 8, and 16. The square root gaussian and the logarithmic gaussian distributions were produced by use of the transformations $x = g \cdot g$ and $x = \exp(g)$, respectively, where $x$ is a value of a transformed distribution and $g$ a gaussian-distributed random value with mean = 100 and SD = 25.51. The $\chi^2$ distributions with df degrees of freedom were produced by summing the squares of standard gaussian values ($g_i$): $x = \sum_{i=1}^{df} g_i^2$ (df = 3, 4, 8, and 16). All of these asymmetric distributions were positively skewed. To obtain corresponding distributions with negative skewness, values of the generated distributions were transformed into those of mirror distributions: $x'_i = w - x_i$, where $w = x_{\min} + x_{\max}$.
The details of these distributions are shown in Table 1.
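The generation scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`chi_square`, `mirror`, `skewness`), the seed, and the use of the moment coefficient of skewness are hypothetical choices and are not taken from the RefVal program.

```python
import math
import random

def chi_square(df, rng):
    """One chi-square value: the sum of df squared standard gaussian values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def mirror(values):
    """Mirror distribution: x' = w - x with w = xmin + xmax (reverses the skewness)."""
    w = min(values) + max(values)
    return [w - x for x in values]

def skewness(values):
    """Moment coefficient of skewness (an assumed, common definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(1)  # arbitrary seed, for reproducibility only
g = [rng.gauss(100.0, 25.51) for _ in range(1000)]   # gaussian base values
square_root_gaussian = [v * v for v in g]            # x = g * g
log_gaussian = [math.exp(v) for v in g]              # x = exp(g)
chi2_4 = [chi_square(4, rng) for _ in range(1000)]   # positively skewed
chi2_4_mirrored = mirror(chi2_4)                     # negatively skewed
```

Mirroring leaves the range of the data unchanged and only changes the sign of the coefficient of skewness, which is why each positively skewed distribution has an exact negatively skewed counterpart.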
transformations
Mathematical functions can transform data of nongaussian distributions to approximate the theoretical gaussian distribution. Three functions of this kind were used in our study. One of these, the Box-Cox transformation function (9), which was used in Horn's original algorithm for outlier detection (8), is as follows: $y = (x^{\lambda} - 1)/\lambda$ if $\lambda \neq 0$; $y = \ln(x)$ if $\lambda = 0$. The parameter $\lambda$ of this transformation was determined by maximizing likelihood [formula 8 in Ref. (9)]. The 2 other transformations considered in the present study are those of the 2-stage normalization procedure recommended by the IFCC for parametric estimation of reference limits (1). Manly's exponential function (11) corrects for nongaussian skewness: $y = \{\exp(\gamma \cdot x) - 1\}/\gamma$ if $\gamma \neq 0$; $y = x$ if $\gamma = 0$. The John and Draper modulus function (12) rectifies remaining nongaussian kurtosis: $z = \mathrm{sign}[\{(|y| + 1)^{\kappa} - 1\}/\kappa]$ if $\kappa \neq 0$; $z = \mathrm{sign}[\ln(|y| + 1)]$ if $\kappa = 0$. [Here the sign (+ or −) is that associated with the value $y$, previously transformed by Manly's exponential function.] The 2 latter functions have always been part of the RefVal program (6). The function parameters ($\gamma$ and $\kappa$, respectively) were determined by use of an iterative "brute force" method, guided by monitoring the coefficient of skewness (the exponential function) or the coefficient of kurtosis (the modulus function).
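As a minimal sketch, the 3 transformation functions can be written as follows. The parameter names `lam`, `gamma`, and `kappa` are hypothetical labels chosen for this illustration, and the estimation of the parameters (maximum likelihood or iterative search) is deliberately left out.

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation of a positive value x."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def manly_exponential(x, gamma):
    """Manly's exponential transformation (targets skewness)."""
    if gamma == 0:
        return x
    return (math.exp(gamma * x) - 1.0) / gamma

def modulus(y, kappa):
    """John and Draper modulus transformation (targets kurtosis).

    The sign of the result is the sign of the input value y.
    """
    sign = -1.0 if y < 0 else 1.0
    if kappa == 0:
        return sign * math.log(abs(y) + 1.0)
    return sign * ((abs(y) + 1.0) ** kappa - 1.0) / kappa

def two_stage(x, gamma, kappa):
    """Exponential transformation followed by the modulus transformation."""
    return modulus(manly_exponential(x, gamma), kappa)
```

With `gamma = 0` and `kappa = 1` the 2-stage transformation returns the data unchanged, which is a convenient check that the functions are wired together correctly.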
horn's algorithm and its modifications
The algorithm described by Horn et al. (8) has the following consecutive steps:
We also studied 2 modifications of Horn's algorithm. In the first step of the algorithm, we replaced the Box-Cox transformation either with the exponential transformation or with a 2-stage transformation consisting of the exponential transformation followed by the modulus transformation.
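The 2-step structure of the algorithm (transformation, then Tukey fences at 1.5 · IQR outside the quartiles of the transformed values) can be sketched as below. Several details are assumptions made for this illustration only: the coarse grid search over the Box-Cox parameter stands in for a proper likelihood maximization, the quartiles use the default method of Python's `statistics.quantiles`, and all function names are hypothetical. The input values must be positive.

```python
import math
import statistics

def box_cox(x, lam):
    """Box-Cox transformation of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def box_cox_loglik(values, lam):
    """Profile log-likelihood of the Box-Cox parameter (up to a constant)."""
    n = len(values)
    transformed = [box_cox(x, lam) for x in values]
    variance = statistics.pvariance(transformed)
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(x) for x in values)

def estimate_lambda(values):
    """Assumed estimator: grid search from -2 to 2 in steps of 0.05."""
    grid = [i / 20 for i in range(-40, 41)]
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))

def tukey_fences(values):
    """Lower and upper fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def horn_outliers(values, lam=None):
    """Return the original values whose transforms fall outside the fences."""
    if lam is None:
        lam = estimate_lambda(values)
    transformed = [box_cox(x, lam) for x in values]
    lower, upper = tukey_fences(transformed)
    return [x for x, t in zip(values, transformed) if t < lower or t > upper]
```

Passing a fixed `lam` makes it possible to separate the effect of the fences from the effect of the parameter estimation, which itself is influenced by any outliers present in the data.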
simulation experiments
Pseudo-random data were generated for the distributions described above (Table 1). All experiments presented here were based on distributions with n = 1000 values each. When studying the sensitivity of the outlier algorithm, we replaced 3% of the random values in the distributions by outliers, keeping the sample size fixed at 1000 values. Outliers were generated as uniform random values in the interval from 2.7 to 3.9 SD below and/or above the mean of the original gaussian distribution and then transformed (see section on generation of random data above).
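The replacement of 3% of the values by outliers can be sketched as follows. The function name `contaminate`, the `tail` argument, and the random choice of which positions to overwrite are assumptions of this illustration; the sketch operates on the gaussian scale, so any transformation to a skewed distribution would be applied afterwards.

```python
import random

def contaminate(values, mean, sd, fraction=0.03, tail="both", rng=random):
    """Replace a fraction of the values by uniform outliers 2.7 to 3.9 SD from the mean.

    tail is "lower", "upper", or "both" (outliers split evenly between the tails).
    The sample size is kept unchanged.
    """
    result = list(values)
    n_outliers = round(fraction * len(result))
    positions = rng.sample(range(len(result)), n_outliers)
    for k, pos in enumerate(positions):
        distance = rng.uniform(2.7, 3.9) * sd
        if tail == "upper":
            sign = 1.0
        elif tail == "lower":
            sign = -1.0
        else:
            sign = 1.0 if k % 2 == 0 else -1.0  # alternate between the tails
        result[pos] = mean + sign * distance
    return result

rng = random.Random(7)  # arbitrary seed
gaussian = [rng.gauss(100.0, 25.51) for _ in range(1000)]
with_upper_outliers = contaminate(gaussian, 100.0, 25.51, tail="upper", rng=rng)
```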
We analyzed each data set by applying 2 (sensitivity study) or 3 (specificity study) of the versions of Horn's algorithm for outlier detection that were presented above: the original one (using the Box-Cox transformation function) and the 2 modifications (using the exponential transformation and the 2-stage transformation, respectively). The output monitored was the estimated transformation parameters, the coefficients of skewness and kurtosis of the transformed distributions, and the number of transformed values located below and above the lower and upper Tukey fences, respectively. The simulations were always iterated 6000 times for each distribution type.
statistical analysis
We computed the coefficients of skewness and kurtosis by applying established routines of the RefVal program (13). The output from simulation experiments was analyzed with Microsoft Excel.
| Results |
|---|
The observed percentages in Fig. 1 should be compared with the theoretical expected probability of false detection. For a gaussian distribution, the quartiles (Q1 and Q3) are located at a distance of 0.674 · SD on both sides of the mean, giving IQR = 2 · 0.674 · SD = 1.349 · SD. According to the formulas given above, the Tukey fences are thus (0.674 + 1.5 · 1.349) · SD = 2.698 · SD below and above the mean. The cumulated gaussian probability at −2.698 · SD is 0.0035. The expected frequency of false detection is thus 0.70%. This is shown as a horizontal line in each panel of Fig. 1.
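The arithmetic behind the expected 0.70% can be checked with the standard gaussian distribution from Python's standard library. This is a small verification sketch; the variable names are arbitrary.

```python
from statistics import NormalDist

std_gauss = NormalDist()            # mean 0, SD 1

q3 = std_gauss.inv_cdf(0.75)        # upper quartile, about 0.674 SD
iqr = 2.0 * q3                      # interquartile range, about 1.349 SD
fence = q3 + 1.5 * iqr              # upper Tukey fence, about 2.698 SD

one_tail = 1.0 - std_gauss.cdf(fence)   # probability beyond one fence
expected_false = 2.0 * one_tail         # both tails together

print(f"fence = {fence:.3f} SD, expected false detection = {100 * expected_false:.2f}%")
```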
For the same simulation experiment, the remaining kurtosis after transformation by the Box-Cox and exponential functions is shown in Fig. 2. (The results for the 2-stage transformation are not shown because the coefficient of kurtosis is necessarily always zero.) The Box-Cox transformation failed to make negatively skewed distributions symmetric (see the coefficients of skewness in the top panel of Fig. 2).
sensitivity study
We studied the sensitivity of outlier detection in simulation experiments using probability distributions 4–10 described in Table 1, each having 3% of the values replaced by random outliers. Three types of experiments, with different locations of the outliers, were performed: (a) all outliers placed in the interval (−3.9 · SD to −2.7 · SD) of the lower tail of the distribution; (b) all outliers placed in the interval (2.7 · SD to 3.9 · SD) of the upper tail of the distribution; and (c) one half of the outliers, i.e., 1.5% of the observations, placed in each of these 2 intervals. The inner limits of these intervals, ±2.7 · SD, were set to coincide with the Tukey fences (see above). The cumulative gaussian probabilities at the limits −3.9 · SD and −2.7 · SD are 0.00005 and 0.0035, respectively, which shows that by setting the outer limits at ±3.9 · SD, the suggested intervals for outliers will cover practically all probability outside the inner limits. The expected total percentage of observations identified as outliers will now be 0.70% (false outliers) + 3.0% (true outliers) = 3.7%. Of this total percentage, 3.35% (in absolute terms) should originate from the tail in which the generated random outliers were placed and 0.35% from the other tail, as far as experiment types (a) and (b) are concerned, whereas for experiment type (c), the expectation is 1.85% from each tail.
The mean percentages of data values located outside the 2 Tukey fences, i.e., values identified as outliers, true or false, are shown in Fig. 3. The filled columns show results for Horn's original algorithm (using the Box-Cox transformation); the open columns show the corresponding results obtained for a modified algorithm (exponential transformation).
| Discussion |
|---|
specificity study
A low and predictable probability of false detection is a basic requirement of statistical tests for outliers. In our specificity study, we tested Horn's algorithm and 2 modifications of it, using 13 different types of computer-generated probability distributions without generated outliers. These distributions were all unimodal, and they had a coefficient of skewness varying between −1.6 and 1.6 (Table 1). The distribution types with zero or moderately large positive skewness (distributions 3–7) are typical of distributions found in clinical chemistry. Empirical distributions with negative skewness (distributions 8–13) are admittedly very rare in laboratory medicine, but because they may potentially occur, they were included in the study to make it comprehensive.
None of the outlier tests studied, neither Horn's original algorithm that uses the Box-Cox transformation nor the 2 modifications of this algorithm based on other transformations, fulfilled the basic requirement for outlier tests stated above, as is shown in Fig. 1. The theoretical expectation of 0.70% of values falsely identified as outliers was obtained only when the distribution was gaussian (distribution 7) and, for Horn's original algorithm, with the $\chi^2$ distribution with df = 8 (distribution 3). With Horn's original algorithm (Fig. 1, top panel), the probability of false identification was too high for distributions 4–6, which have the moderate positive skewness frequently found in medical data. It was particularly high for the logarithmic gaussian distribution (distribution 4), a very typical distribution in laboratory medicine. This test was very conservative for extreme positive skewness (distributions 1 and 2) and for all negatively skewed distributions (distributions 8–13). In contrast to this asymmetric behavior, the modified test, using the exponential transformation (middle panel of Fig. 1), handled positively and negatively skewed distributions in the same, conservative way. When the modulus transformation was added for correction of remaining kurtosis (bottom panel of Fig. 1), the percentage of false outliers was only slightly increased (0.77%–0.96%) from the expected value of 0.7%, provided that the skewness was moderate (positive or negative; distributions 3–6 and 8–11).
The main cause of the varying performance of the outlier tests based on the Box-Cox and exponential functions was the remaining kurtosis after the transformation of data (Fig. 2). A symmetric distribution with negative kurtosis has a flat, central peak and fewer values in the tails than does the gaussian distribution. Therefore, the percentage of values outside the Tukey fences will be lower than expected when the distribution after transformation has negative kurtosis. Positive kurtosis has the opposite effect. Comparison of the 2 upper panels of Fig. 1 with the respective panels of Fig. 2 illustrates this kurtosis effect.
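The kurtosis effect can be illustrated with 3 symmetric theoretical distributions, for which the fraction of values outside the Tukey fences can be computed exactly. The uniform and logistic distributions are used here purely as examples of negative and positive kurtosis, respectively; they were not part of the simulation study, and the helper name is hypothetical.

```python
import math
from statistics import NormalDist

def fraction_outside_fences(quantile, cdf):
    """Theoretical probability mass outside Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return cdf(lower) + (1.0 - cdf(upper))

std_gauss = NormalDist()

# Uniform on [0, 1]: negative kurtosis, the fences lie outside the support.
uniform = fraction_outside_fences(lambda p: p, lambda x: min(1.0, max(0.0, x)))

# Gaussian: the reference case, about 0.70%.
gaussian = fraction_outside_fences(std_gauss.inv_cdf, std_gauss.cdf)

# Logistic: positive kurtosis, heavier tails than the gaussian.
logistic = fraction_outside_fences(
    lambda p: math.log(p / (1.0 - p)),
    lambda x: 1.0 / (1.0 + math.exp(-x)),
)

print(f"uniform {100 * uniform:.2f}%, gaussian {100 * gaussian:.2f}%, logistic {100 * logistic:.2f}%")
```

The flat-topped uniform distribution yields no values outside the fences, whereas the heavier-tailed logistic distribution yields about 2.4%, more than 3 times the gaussian expectation.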
Another problem with Horn's original algorithm was that the Box-Cox transformation failed to produce a symmetric distribution when the original distribution was negatively skewed (Fig. 2, top panel; distributions 8–13). In such cases the algorithm will not handle values equally in the 2 tails of the distribution.
Horn et al. (8) correctly pointed out that automatic elimination of 0.70% of the reference values in a gaussian distribution may cause biased reference limits. They accordingly suggested estimating a nominal 95% reference interval as a 95.67% interval. However, this recommendation is valid only if the transformation step of Horn's algorithm gives a truly gaussian distribution. Our results show that this is not the case for the majority of the distributions studied here.
In summary, the specificity of neither Horn's original outlier test nor its modified versions will be predictable when analyzing empirical data because the performance of these tests is dependent on the underlying distribution type, which usually is unknown a priori.
sensitivity study
To get a manageable study of sensitivity for the outlier tests, we restricted it to the symmetric and moderately skewed distributions (distributions 4–10 in Table 1). In addition, we omitted the 2-stage transformation, which uses the exponential and modulus functions in sequence. This might seem surprising because the specificity study showed that it had relatively stable performance for symmetric and moderately skewed distributions (Fig. 1, bottom panel); however, the modulus transformation will necessarily corrupt the test in the presence of real outliers. Extra values in one or both tails of a distribution will increase the coefficient of kurtosis, but this is precisely what the modulus transformation attempts to correct. Test simulations (results not shown) confirmed that this was in fact the case.
When the underlying distribution was positively skewed or gaussian (distributions 4–7), Horn's original outlier test (Fig. 3, filled columns) identified only slightly fewer values outside the Tukey fences than the expected total percentage (leftmost panels of Fig. 3) if the outliers were located in the upper tail or both tails of the distribution (top and middle rows). However, in the case of the upper tail, approximately one third of these values were low false outliers (top row, middle panel), whereas a corresponding percentage of the true outliers located in the upper tail remained unidentified (top row, rightmost panel). The outlier test based on the Box-Cox transformation showed a rather poor performance when the outliers were located in the lower tail of the distribution (Fig. 3, bottom row), and this was true for negatively skewed distributions (distributions 8–10) in particular.
We did not observe this kind of asymmetric behavior when we used the exponential transformation in a modified Horn's algorithm (Fig. 3, open columns). It underestimated somewhat the percentage of outliers when they were located in both tails (middle row of Fig. 3). The sensitivity was unacceptably low when the outliers were located in 1 tail only (Fig. 3, top and bottom rows).
conclusions
The results of our Monte Carlo simulation experiments concerning outlier detection based on Horn's original algorithm and 2 modifications of it were rather disappointing. In the specificity study, none of the outlier tests fulfilled the basic requirement of low and predictable probability of false detection. The sensitivity study suggested that the sensitivity tends generally to be too low. The main underlying problem seems to be that the calculation of Tukey fences in Horn's algorithm assumes that the transformed distributions are close to gaussian in shape. Our results indicate that this is most often not the case, as judged from the coefficients of skewness and kurtosis after transformation, not even when outliers were absent (see, for example, the specificity study). The presence of true outliers only increases the problems with the transformations.
We assumed that the following modifications of Horn's algorithm could possibly help to eliminate some of the negative effects of outliers on the transformation: (a) truncate the distribution to eliminate possible outliers by temporarily excluding, e.g., 5% of the extreme values at each tail; (b) then estimate the transformation parameter on the truncated distribution; and (c) finally transform all data, including the outliers, with this parameter and continue with steps 2–4 of Horn's algorithm (see the Materials and Methods). However, test runs using this modification showed that the performance of Horn's algorithm still was not acceptable (results not documented).
Horn's algorithm for outlier detection is based on a promising idea (8): to determine outliers using criteria that are calculated from the central part of a hopefully close-to-gaussian distribution. However, our simulation experiments suggest that the normalization of distributions achieved by use of the transformation functions involved in the present study is not good enough to allow Horn's algorithm to work as it is expected to do. Although Horn's algorithm undoubtedly is an improvement compared with older methods for outlier detection, reliable statistical identification of outliers in reference data remains a challenge.