Clinical Chemistry Email Content Delivery

Electronic Letters to:

Letters to the Editor:
Douglas G. Altman and J. Martin Bland
Commentary on Quantifying Agreement between Two Methods of Measurement
Clin Chem 2002; 48: 801-802

Electronic letters published:

Quantifying Full Agreement
Bruce E. Siskowski (3 July 2004)
Testing agreement of two methods over a range
Bruce Siskowski (3 July 2004)

Quantifying Full Agreement 3 July 2004
Bruce E. Siskowski,
Director of Engineering
Reichert Inc.

bsiskowski{at}reichert.com Bruce E. Siskowski

Bland and Altman acknowledge the importance of including repeated measurements when comparing performance, but they make no recommendations on how to do this. Accordingly, this response discusses some helpful options.

Background: A paired t test has widely been recommended for comparing mean performance with paired data (one point for each of two devices for each treatment 1, 2, ..., n), but these recommendations do not address comparing within-treatment repeatability, or a changing bias that can go undetected by a paired t test. Also, little exists in texts for cases with more than one replicate or repeat data point (1, 2, ..., r) per device, per treatment. Then along came the Bland-Altman plot, which did a great job of subjectively showing changing bias in addition to DC (constant) bias. Two problems exist, however. First, the (A-B) values plotted on the Y axis give an indication of variability, but no comparison between the devices is made. Is each device contributing 50% of the (A-B) dispersion, or is one device contributing 99% and the other only 1%? Second, as the number of replicate runs (1, 2, ..., r) increases, the Bland-Altman plot requires averaging of the points, which reduces the very variation in question; that is acceptable only if bias is all you care about.

Now that the weaknesses of the prevalent methods have been described and a need has been shown, some solutions are provided:

Solutions for Repeatability testing (for full agreement):

1.) C.J. Maloney & S.C. Rastogi (1970) demonstrate how to perform a quantitative t test (or F test) on (A-B) vs (A+B), and this can easily be extended to the Bland-Altman (A-B) vs (A+B)/2 case.

2.) The replicate variances can be pooled across all patients for device A (PooledVarA) and device B (PooledVarB) and the resultant ratio can be tested against the F distribution.

It should be noted that, for normally distributed errors, each within-treatment variance (suitably scaled) follows a chi-squared distribution with r-1 degrees of freedom (dof), and the pooled quantities follow a chi-squared distribution with n*(r-1) dof; so, if the two devices are equally repeatable, the ratio follows an F distribution with n*(r-1) and n*(r-1) dof.

By pooling as described above, the treatment effect has been taken out of the analysis, leaving variances that can be thought of as coming from independent samples under some assumptions. It is then possible to create an even more powerful F test by subtracting each treatment/device cell mean from all values, allowing an F test with dof equal to n*p-1 and n*p-1 respectively.

This method can be modified if the device variation grows as a function of nominal value, and it assumes that the treatment (patient) variance is minimal compared to the device repeatability variance.

3.) For more advanced cases, much has been published as shown below:

E.J.G. Pitman (1939)

Frank E. Grubbs (1948, 1973, 1982)

John L. Jaech (1971, 1973, 1979, 1981, 1985)

George W. Snedecor & William G Cochran (1989)

J.H. Hahn & W. Nelson (1970)

K. Krippendorff (1970)

L.G. Blackwood and E.L. Bradley (1991)

Frank Krummenauer and Gerhard Doll (2000)

ISO 5725

ASTM E691-87

James R Smithg (1990)

G. Dunn (1992)

R. Christenson & L.G. Blackwood (1993)
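
The pooled-variance comparison described in solution 2 above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the author's implementation: the function names `pooled_variance` and `variance_ratio` and the data are hypothetical, every treatment is assumed to have at least two replicates per device, and the errors are assumed to be roughly normal.

```python
from statistics import variance

def pooled_variance(replicates):
    """Pool within-treatment variances.

    `replicates` is a list with one inner list of repeat readings per
    treatment (patient). Returns (pooled variance, degrees of freedom).
    """
    # Each treatment contributes (number of replicates - 1) dof.
    dof = sum(len(reps) - 1 for reps in replicates)
    # Weighted sum of the per-treatment sample variances.
    ss = sum((len(reps) - 1) * variance(reps) for reps in replicates)
    return ss / dof, dof

def variance_ratio(device_a, device_b):
    """Ratio PooledVarA / PooledVarB with its two dof values."""
    var_a, dof_a = pooled_variance(device_a)
    var_b, dof_b = pooled_variance(device_b)
    return var_a / var_b, dof_a, dof_b

# Hypothetical data: n = 3 patients, r = 3 replicates per device.
device_a = [[10.1, 10.3, 9.9], [20.2, 19.8, 20.0], [30.4, 30.0, 29.6]]
device_b = [[10.0, 10.1, 9.9], [20.1, 19.9, 20.0], [30.2, 30.0, 29.8]]

f_ratio, dof_a, dof_b = variance_ratio(device_a, device_b)
print(f_ratio, dof_a, dof_b)
```

With these made-up readings, device A's pooled variance is about four times device B's, and each pooled variance carries n*(r-1) = 6 dof. The ratio is then referred to an F distribution with those dof, from tables or, if SciPy is available, with `scipy.stats.f.sf(f_ratio, dof_a, dof_b)` for a one-sided p-value.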

Testing agreement of two methods over a range 3 July 2004
Bruce Siskowski,
Chief Engineer
Reichert Inc.

bsiskowski{at}reichert.com Bruce Siskowski

Often clinical testing is required to show agreement of two methods over a range of treatments or patients. This type of test involves dependent or correlated samples, so the standard t test of two independent means does not apply. Nor does the F test of two independent variances. Consequently, many biostatisticians or clinical chemists performing the analysis plot the results of one device (A) against the other (B) and determine the squared Pearson correlation (R^2), the slope, and sometimes the standard deviation of the differences (sdiff).
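
A small illustration of why R^2 on its own is a weak summary of agreement. The data are hypothetical: device B is constructed to read 20% high plus a fixed offset, so the two devices are perfectly correlated yet clearly not interchangeable. The helper name `pearson_r` is likewise just for this sketch.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings: B = 1.2 * A + 3 exactly (no random error).
a = [10.0, 20.0, 30.0, 40.0, 50.0]
b = [1.2 * v + 3.0 for v in a]

r_squared = pearson_r(a, b) ** 2

# Differences A - B expose what R^2 hides: a bias that grows with level.
diffs = [x - y for x, y in zip(a, b)]
mean_diff = mean(diffs)
sdiff = stdev(diffs)

print(r_squared, mean_diff, sdiff)
```

Here R^2 is essentially 1, while the mean difference is about -9 and the differences run from -5 to -13 across the range, which is the kind of DC and changing bias that a plot of A against B tends to conceal.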

Bland and Altman correctly suggest that plotting the differences (A-B) against the averages (A+B)/2 is better, and this method has become quite widespread. Although it is better than calculating R^2, slope and sdiff, the Bland-Altman method has several weaknesses. It can subjectively show DC bias and changing bias for any number of repeat runs (k) for each treatment/method combination, but it yields only a weak view of within-treatment variation for the case of k=1 run per combination. When the differences (A-B) are plotted on the Y axis, it is not known how much of the variation is due to A and how much is due to B. Of course, no easy estimate can be made with only one replicate per condition (k=1); but when more than one replicate exists (k>1), there are many methods available that are better than the Bland-Altman plot. There are also quantitative methods, such as Maloney-Rastogi (1970), that can replace Bland-Altman plots when k>1.
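
The quantities behind a Bland-Altman plot, together with a quantitative test on differences versus sums, can be sketched as follows. This is a hedged illustration with hypothetical data and function names (`bland_altman`, `sum_diff_correlation_t`). The second function is a standard correlation t test between (A-B) and (A+B); since Cov(A-B, A+B) = Var(A) - Var(B), a nonzero correlation points to unequal variances. Whether this matches the exact procedure of Maloney and Rastogi (1970) is an assumption here.

```python
import math
from statistics import mean, stdev

def bland_altman(a, b):
    """Mean difference (bias), SD of differences, and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)

def sum_diff_correlation_t(a, b):
    """Correlation of (A-B) with (A+B) and its t statistic on n-2 dof.

    Undefined (division by zero) if the correlation is exactly +1 or -1.
    """
    d = [x - y for x, y in zip(a, b)]
    s = [x + y for x, y in zip(a, b)]
    n = len(d)
    md, ms = mean(d), mean(s)
    sds = sum((di - md) * (si - ms) for di, si in zip(d, s))
    sdd = sum((di - md) ** 2 for di in d)
    sss = sum((si - ms) ** 2 for si in s)
    r = sds / math.sqrt(sdd * sss)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t, n - 2

# Hypothetical paired readings, one run per treatment (k = 1).
a = [10.2, 19.7, 30.5, 39.6, 50.4]
b = [10.0, 20.0, 30.0, 40.0, 50.0]

bias, sd, limits = bland_altman(a, b)
r, t, dof = sum_diff_correlation_t(a, b)
print(bias, sd, limits)
print(r, t, dof)
```

The t statistic is compared with a t distribution on the returned dof. Using (A+B)/2 instead of (A+B) only rescales one variable, so the correlation and the t statistic are unchanged.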

When replicates exist (k>1), an F test of pooled variances can be done, or more detailed methods can be used as shown by Pitman (1939), Grubbs (1948), Hahn & Nelson (1970), Jaech (1985), ISO 5725, Dunn (1992), etc.

It is important to realize that repeatability comparisons must be made in addition to bias comparisons to truly measure agreement or interchangeability of two methods or devices.

Note: This discussion assumes that a perfect low-error master does not exist as a gold standard. Otherwise, A and B can each be tested against the master using well-known calibration techniques.


Copyright © 2009 by the American Association for Clinical Chemistry.