Clinical Chemistry 49: 1959-1962, 2003; 10.1373/clinchem.2003.020891
(Clinical Chemistry. 2003;49:1959-1962.)
© 2003 American Association for Clinical Chemistry, Inc.


Book, Software, and Web Site Reviews

A Triptych of Statistics

A. Ralph Henderson

Emeritus Professor of Biochemistry, University of Western Ontario, London, Ontario N6A 5C1, Canada

Statistical Computing, An Introduction to Data Analysis Using S-Plus. Michael J. Crawley. Chichester, UK: Wiley UK, 2002 (reprinted, with corrections, March 2003), 772 pp., $85.00, hardcover. ISBN 0-471-56040-5.

The Statistical Evaluation of Medical Tests for Classification and Prediction. Margaret S. Pepe. New York: Oxford University Press, 2003, 320 pp., $115.00, hardcover. ISBN 0-19-850984-7.

Introductory Biostatistics. Chap T. Le. Hoboken, NJ: Wiley-Interscience, A John Wiley & Sons Publication, 2003, 572 pp., $94.95, hardcover. ISBN 0-471-41816-1.

You always need to know ten times as much as you use

—Quoted by E.A. Murphy in Biostatistics in Medicine (1982)

Teaching data analysis is not easy, and the time allowed is always far from sufficient

—J.W. Tukey (1962)

As junior medical students in Glasgow in the 1950s, we learned statistics from a poor lecturer and an excellent book (M.J. Moroney, Facts from Figures, Penguin, 1951). Unfortunately this was before the availability of calculators, and therefore, no effort was made to use these techniques to examine the results we were currently producing in the practical classes of physiology and biochemistry. We therefore benefited little from that early exposure except to remember the t-distribution because of its association with beer.

I thought of Moroney’s book when I opened Crawley’s Statistical Computing. Both volumes are written in lively style; are lucid, comprehensive, and rigorous; and are illuminated throughout with flashes of dry humor. Crawley is a distinguished ecologist and a member of the Department of Biological Sciences at Imperial College in London, England, and has been involved in the teaching and research applications of statistical techniques for many years. One difference between Facts from Figures and Statistical Computing is now the ready availability of the computer, which allows a reader of the latter text to examine data and assess their statistical significance while using the book.

Crawley writes "This book is intended as both an introduction to and a reference manual for statistics and computing. It assumes nothing by way of background in either subject, and starts from absolute basics. All it takes for granted is an enthusiasm to learn. It covers everything from the simplest non-parametric techniques (e.g., the runs test) up to the most advanced modern methods (e.g., mixed effects modelling)". Later he insists "This is a statistics book for non-statisticians". He adds "The comments of successive generations of students on the annual statistical computing course have greatly improved the clarity of the presentation, and have helped me to understand which bits of statistical modelling and computing are particularly daunting for beginners".

The subtitle mentions that the data analyses used in the book require S-Plus, which is an interactive programming environment for data analysis and graphics. It originated from the S language (a system for programming with data) developed at AT&T’s (now Lucent Technologies) Bell Laboratories and is commercially available (www.insightful.com). An independent implementation of the S language is also available as open source (and free of charge) software called R (www.r-project.org). If you do not have access to S-Plus, you can use R, although there are some differences between these languages, which Crawley addresses on his web page (www.bio.ic.ac.uk/research/mjcraw/statcomp/).

While reading Crawley’s text I wondered what statistical programs contributors to Clinical Chemistry most often used. A text word search of the journal’s archives (from September 1965 onward) revealed that various SPSS statistical programs (www.spss.com) were used in 166 papers, whereas SAS (www.sas.com) was used in 133 contributions and Stata (www.stata.com) in 16 articles; fewer than 10 articles used S-Plus.

Crawley suggests that learning S-Plus will change the way one does statistics, but it will not be easy. Of relevance is a quote by Dr. Terry Therneau (Mayo Clinic): "Something that will take me 2–4 h in S-Plus, will take me 2–4 days in SAS" (1). My experience of Crawley’s technique is that his clarity, his profound statistical insights, his facility with S-Plus, and his stepwise approach to even the most complex statistical procedure are illuminating in a fashion that other S-Plus books often fail to achieve; they demonstrate how valuable student comments are, as noted earlier, when directed to a committed teacher.

The book has 36 chapters, nearly all of which are accompanied by a list for further reading, and the book concludes with an extensive bibliography and an extremely comprehensive, very useful, 27-page index containing all of the S-Plus commands used in the text. Essentially there are four parts to the book: the elements of statistics (central tendency, probability, variance, normal distribution, classical tests, regression, and ANOVA); an extension of these elements (ANCOVA and the main classes of generalized linear models); more advanced topics [further examination of the generalized linear models (GLM) and more advanced aspects of ANOVA]; and finally, a host of advanced techniques, such as bootstrap and jackknife, tree models, nonparametric smoothing, survival analysis, time series analysis, mixed-effects models, and spatial statistics. Several chapters amount to a minicourse on sound statistical techniques: statistical methods; experimental design; power calculations; statistical models in S-Plus; understanding data (graphical and tabular analysis); model criticism and simplification; and graphs, functions, and transformations. As concepts develop, page references are provided for more advanced treatment later in the text that in turn are usefully back-referenced, making it convenient for review.

Crawley, as noted previously, has a web page that includes all downloadable data files and programs (script files) used in his book, making it easy to perform the examples without having to type (S-Plus has a very useful function, the up-arrow key, that reproduces all previous commands, thus saving much time). He also intends to have corrections, exercises, and additional chapters available on his book page. This is a book with a real future.

What topics were not covered? S-Plus provides several functions for performing quality control. Disappointingly, Crawley did not address these routines. Moroney, incidentally, examined problems of quality control at length. Another omission, in my view, was ROC, which is an important part of practical statistical activity. Crawley explains Bayes’ theorem in terms of conditional probability but omits the easier likelihood ratio approach, which is the form most used in medical diagnostics. I have previously mentioned the comprehensive, and excellent, subject index, so I was disappointed to see that literature references were not indexed to the chapters in which they were cited. This type of indexing is extremely valuable when following up on a citation (or, alternatively, the provision of an author index). An example of this useful practice can be viewed in Barnett & Lewis’s Outliers in Statistical Data, 3rd edition. Although Crawley deals with outliers, he did not cite this seminal text. I experienced errors and omissions in some of the scripts that accordingly did not produce the anticipated results; these, however, forced effective problem solving. Finally, any user of S-Plus soon runs into cryptic error messages when one’s enthusiasm outruns one’s expertise. It would have been valuable to have Crawley’s advice on how to cope with errors.

I believe that this book is an outstanding and masterful contribution to statistical thought. It will be of immense value for trainees and established workers alike in the field of clinical chemistry.

Pepe’s book is an addition to the literature on ROC analysis, until now represented in volumes by J.P. Egan (Signal Detection Theory and ROC Analysis, 1975), J.A. Swets and R.M. Pickett (Evaluation of Diagnostic Systems: Methods from Signal Detection Theory, 1982), and H.C. Kraemer (Evaluating Medical Tests: Objective and Quantitative Guidelines, 1992). Her book contains nine chapters, end-of-chapter exercises (but no solutions), a bibliography, and a subject index, but it sadly lacks an author index. She is donating book royalties to the charity Doctors Without Borders. Pepe is a Professor of Biostatistics at the University of Washington and the Fred Hutchinson Cancer Research Center. The data sets, and the Stata programs used in the book, can be accessed on-line (www.fhcrc.org/labs/pepe/book). Pepe sets out "to provide a systematic framework for the statistical theory and practice of research studies that seek to evaluate clinical tests used in the practice of medicine". She hopes that it will be found useful for "practising biostatisticians and more academic research biostatisticians". Clinical chemists? Laboratory physicians?

The opening chapter outlines the criteria for a useful diagnostic/screening test and the elements of study design. The seminal paper by Ransohoff and Feinstein (2), although the basis for much of this section, was not cited. The chapter closes with a description of the seven valuable data sets used to illustrate the book’s methodologies. The following chapter deals with measures of accuracy for binary tests, i.e., diseased/nondiseased and tested positive/tested negative. The three measures of diagnostic accuracy (disease-specific classification probabilities, i.e., the true and false positive fractions; predictive values, both positive and negative; and likelihood ratios) are illustrated with one of the available data sets containing a cohort study of 1465 individuals. The relative merits of each measure are discussed and illustrated in an extremely useful tabulation. Likelihood ratios have received increasing attention because they quantify the increase of knowledge about the presence (or absence) of disease through the diagnostic testing process. Pepe warns, however, of a growing realization that for a variety of reasons, discussed more fully later, there is no basis for the assumption regarding the constancy of test sensitivity and specificity, an assumption that Kraemer (cited earlier) calls "the myth".
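The three families of accuracy measures for a binary test can be computed from a single 2 x 2 table. The following is a minimal Python sketch; the class name `TwoByTwo` and the counts are hypothetical illustrations and are not taken from Pepe's book or its data sets.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a binary test cross-classified against true disease status."""
    tp: int  # diseased, test positive
    fn: int  # diseased, test negative
    fp: int  # nondiseased, test positive
    tn: int  # nondiseased, test negative

    def classification_probabilities(self):
        """Return (true positive fraction, false positive fraction)."""
        tpf = self.tp / (self.tp + self.fn)  # sensitivity
        fpf = self.fp / (self.fp + self.tn)  # 1 - specificity
        return tpf, fpf

    def predictive_values(self):
        """Return (positive predictive value, negative predictive value)."""
        ppv = self.tp / (self.tp + self.fp)
        npv = self.tn / (self.tn + self.fn)
        return ppv, npv

    def likelihood_ratios(self):
        """Return (positive likelihood ratio, negative likelihood ratio)."""
        tpf, fpf = self.classification_probabilities()
        return tpf / fpf, (1 - tpf) / (1 - fpf)


# Hypothetical counts chosen only for illustration.
table = TwoByTwo(tp=80, fn=20, fp=90, tn=810)
tpf, fpf = table.classification_probabilities()
ppv, npv = table.predictive_values()
lr_pos, lr_neg = table.likelihood_ratios()

print(f"TPF = {tpf:.3f}, FPF = {fpf:.3f}")
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

Note that the predictive values depend on the prevalence in the sample (here 100 of 1000), whereas the classification probabilities and likelihood ratios are computed within the diseased and nondiseased groups separately.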

Chapters three and six address the regression modeling framework. Pepe identifies a range of factors that may, potentially, affect test performance, such as the age and gender of the tested individual, the conditions under which tests are administered and run, and of course, the disease manifestations and the nondisease state [Ref. (2) again, but not cited]. Regression analysis may be used to assess the importance of such factors as well as to compare different paired (preferred) or unpaired test results. The advantage of regression modeling is that the analysis can control for concomitant factors. To appreciate the power of such techniques it is necessary to become familiar with the concepts of GLM and generalized estimating equations (GEE). Unfortunately, these techniques are not described in the standard introductory medical statistics texts, and more advanced texts (B. Everitt, Modern Medical Statistics: A Practical Guide, 2003) have to be consulted (Crawley’s Statistical Computing, reviewed above, also provides considerable advice on modeling techniques). These approaches can be applied to all three of the diagnostic accuracy measures described earlier.

The next two chapters deal with the ROC curve and will be familiar to most laboratory workers. This technique—described by Pepe as "the best-developed statistical tool for describing the performance of [continuous or ordinal scaled] tests"—has been in use for many decades. Pepe insists that "ROC curves have nothing to do with the particular distributions of the tests results but rather quantify the relationships between distributions". The most common summary index for ROC curves is the area under the curve, but there are other frequently used indices: a specific ROC point, partial area under the curve, symmetry point (where sensitivity = specificity), and the Kolmogorov–Smirnov index (the maximum vertical distance between the ROC curve and the 45° line).
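Two of these summary indices, the area under the curve and the Kolmogorov–Smirnov index, can be illustrated with an empirical ROC curve. The Python sketch below assumes that larger test values indicate disease and uses made-up values; the function names are hypothetical and the code is not drawn from Pepe's Stata programs.

```python
def roc_points(diseased, nondiseased):
    """Empirical ROC points (FPF, TPF), calling a result positive when value >= threshold."""
    thresholds = sorted(set(diseased) | set(nondiseased), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every observed value
    for c in thresholds:
        tpf = sum(y >= c for y in diseased) / len(diseased)
        fpf = sum(y >= c for y in nondiseased) / len(nondiseased)
        points.append((fpf, tpf))
    return points


def auc_trapezoid(points):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def ks_index(points):
    """Maximum vertical distance between the ROC curve and the 45-degree line."""
    return max(tpf - fpf for fpf, tpf in points)


# Made-up test results, for illustration only.
diseased = [3, 5, 6, 8]
nondiseased = [1, 2, 4, 5]

points = roc_points(diseased, nondiseased)
auc = auc_trapezoid(points)
ks = ks_index(points)
print(f"AUC = {auc}")  # 0.84375
print(f"KS  = {ks}")   # 0.5
```

The trapezoidal area equals the proportion of diseased/nondiseased pairs in which the diseased value is the larger, counting ties as one half (13.5 of 16 pairs here), which is why the index depends only on the relationship between the two distributions and not on their particular shapes.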

Pepe devotes a whole chapter to the problem of incomplete data and imperfect reference tests, a recurring practical and theoretical irritation during test assessments. She considers three scenarios: verification-biased sampling, verification restricted to screen positives, and imperfect reference tests. When screened-positive tested cases are verified for disease (but not screened-negative cases), this verification bias always produces an increase in sensitivity and a decrease in specificity compared with their true values. This bias may be corrected by application of Bayes’ theorem [the Begg and Greenes’ estimates (3)(4)]. Extreme verification bias results when the gold standard test is applied only to cases at high risk of disease. The approach to this problem is theoretically more complex and probably not entirely satisfactory. Imperfect reference tests are well known. Two examples are culturing the organism causing an infection and obtaining a cancerous biopsy specimen. In each case, improvements in molecular biological and imaging techniques are making these sources of error less common, but they do still occur. Two suggested approaches are use of Bayesian methodology and latent class analysis.
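The Bayes' theorem correction for verification bias can be sketched as follows. This Python example assumes that the decision to verify depends only on the test result, so that the disease rates observed among verified positives and verified negatives estimate P(D|T+) and P(D|T-). The function name and all counts are hypothetical.

```python
def begg_greenes(n_pos, n_neg,
                 verified_pos, verified_pos_diseased,
                 verified_neg, verified_neg_diseased):
    """Verification-bias-corrected (sensitivity, specificity).

    n_pos, n_neg: all tested subjects by test result (verified or not).
    verified_*: subjects who also received the reference test.
    Assumes verification depends only on the test result.
    """
    p_pos = n_pos / (n_pos + n_neg)  # P(T+)
    p_neg = 1 - p_pos                # P(T-)
    p_d_pos = verified_pos_diseased / verified_pos  # P(D | T+)
    p_d_neg = verified_neg_diseased / verified_neg  # P(D | T-)

    # Bayes' theorem: P(T+ | D) and P(T- | not D).
    sens = p_d_pos * p_pos / (p_d_pos * p_pos + p_d_neg * p_neg)
    spec = ((1 - p_d_neg) * p_neg
            / ((1 - p_d_neg) * p_neg + (1 - p_d_pos) * p_pos))
    return sens, spec


# Hypothetical study: 1000 screened, positives verified far more often.
n_pos, n_neg = 200, 800
verified_pos, verified_pos_diseased = 160, 80
verified_neg, verified_neg_diseased = 80, 4

# Naive estimates use verified subjects only.
naive_sens = verified_pos_diseased / (verified_pos_diseased + verified_neg_diseased)
naive_spec = ((verified_neg - verified_neg_diseased)
              / ((verified_neg - verified_neg_diseased)
                 + (verified_pos - verified_pos_diseased)))

sens, spec = begg_greenes(n_pos, n_neg,
                          verified_pos, verified_pos_diseased,
                          verified_neg, verified_neg_diseased)
print(f"naive:     sensitivity {naive_sens:.3f}, specificity {naive_spec:.3f}")
print(f"corrected: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

With these made-up counts the naive sensitivity is higher, and the naive specificity lower, than the corrected values, which is the direction of bias described above.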

The penultimate chapter examines the five phases of the research required for the development of a test and the importance of suitable sample sizes. Pepe concludes the volume with consideration of metaanalysis [using the Moses algorithm—the summary ROC, sROC, curve (5)], incorporating the time dimension using the disease state as a time-dependent variable, and a discussion of combining information from multiple tests.

Although there were several mentions of the Zweig and Campbell review (6), no reference was made to Shultz’s important multiROC approach (7). One annoying feature of this text was the extremely poor indexing. For example, although terms such as sensitivity and specificity (or their synonyms) occur throughout the text, they are not readily reached by use of the index, and although the term "bootstrap" is mentioned many times in the text, it is not indexed at all. The term "bronze standard" (a less reliable gold standard) was incorrectly indexed.

In summary, Pepe’s book is a useful but demanding introduction to the present status of the field. Potential users of this volume might like to consider Statistical Methods in Diagnostic Medicine (2002) by Zhou, Obuchowski, and McClish as an alternative entry to the current literature.

Le’s book, Introductory Biostatistics, incorporates much of the content of his earlier text (Health and Numbers: A Problem-Based Introduction to Biostatistics) with the aim of providing an introductory text for students of the human health disciplines. Le is Distinguished Professor of Biostatistics and Director of Biostatistics, Comprehensive Cancer Center, University of Minnesota. There are 12 chapters, each with a set of very extensive and useful exercises (many are provided with comprehensive answers at the end of the book). Le opens, unconventionally, with a chapter on methods of categorical data, introducing concepts of proportions and rates with many illustrative examples. The terms "test sensitivity" and "specificity" are defined, but the use of a diagram to illustrate their interrelationship would have been more useful as a teaching tool than the rather unilluminating text. The chapter also includes a fairly comprehensive outline of Microsoft’s Excel program and a brief mention of the SAS program (is this appropriate in an introductory text?). The last exercise in this chapter contains a very large data file with the annotation that an electronic copy is available from the author, although no author contact address is provided. The publisher has a web page for the book and lists a download site for the book’s data sets, but when I accessed that site (May 27, 2003), I obtained the message "page not found".

The second chapter returns to a conventional approach to data analysis: graphs and simple statistical descriptions. Le uses the standard, five-mark, tally system despite its known susceptibility to error. He could usefully have introduced his audience to the more reliable Tukey 10-item square tally (4 points, 4 sides, and 2 diagonals). The next chapter deals with probability models, including the use of diagnostic test results. Here the author needs student feedback. The explanations are less helpful than they could be (for example, the cells of the 2 x 2 table could be labeled true and false positives, and so on, instead of these terms being defined in the text). Bayes’ theorem is described in terms of conditional probabilities, and the likelihood ratio approach is ignored, although he had previously introduced the concept of odds. The targeted audience is going to use this latter approach to diagnostic tests and should have been introduced to that usage. The description of the kappa statistic is marred by a nomenclature change during the explanation. The remainder of the text covers the standard elementary materials, but includes ROC curves, logistic regression, analysis of survival data (Le’s interest), and study design.
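The two routes to a post-test probability can be compared in a few lines. This Python sketch uses hypothetical values for prevalence, sensitivity, and specificity; the function names are illustrative and nothing here is taken from Le's text. It shows that the odds form (post-test odds = pre-test odds x likelihood ratio) gives the same answer as the conditional-probability form of Bayes' theorem.

```python
def post_test_probability_odds(pre_test_probability, likelihood_ratio):
    """Odds form: post-test odds = pre-test odds * likelihood ratio."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def post_test_probability_bayes(prevalence, sensitivity, specificity):
    """Conditional-probability form of Bayes' theorem for a positive result."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics and pre-test probability.
prevalence = 0.10
sensitivity = 0.80
specificity = 0.90

lr_positive = sensitivity / (1 - specificity)  # positive likelihood ratio
via_odds = post_test_probability_odds(prevalence, lr_positive)
via_bayes = post_test_probability_bayes(prevalence, sensitivity, specificity)

print(f"LR+ = {lr_positive:.1f}")
print(f"post-test probability (odds form):  {via_odds:.4f}")
print(f"post-test probability (Bayes form): {via_bayes:.4f}")
```

The odds form has the practical advantage that a single number, the likelihood ratio, carries the test's contribution and can be applied to any pre-test probability.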

The subject index is adequate, but there are many typographic errors in both figure legends and labels. At the Arabian Gulf University in Bahrain, we taught biostatistics to our first-year medical students using Essentials of Medical Statistics by Kirkwood and Basic and Clinical Biostatistics by Dawson-Saunders and Trapp. Would I recommend a change to Le’s Introductory Biostatistics? Not until there is convincing evidence of active student feedback.



Figure 1.



Figure 2.



Figure 3.


References

  1. Everitt B, Rabe-Hesketh S. Analyzing medical data using S-PLUS. New York: Springer, 2001:v-vi.
  2. Ransohoff DF, Feinstein AR. Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med 1978;299:926-930.
  3. Begg CB, Greenes RA. Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 1983;39:207-215.
  4. Punglia RS, D’Amico AV, Catalona WJ, Roehl KA, Kuntz KM. Effect of verification bias on screening for prostatic cancer by measurement of prostate-specific antigen. N Engl J Med 2003;349:335-342.
  5. Moses LE, Shapiro D, Littenberg B. Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med 1993;12:1293-1316.
  6. Zweig MH, Campbell G. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine [Review]. Clin Chem 1993;39:561-577.
  7. Shultz EK. Multivariate receiver-operating characteristic curve analysis: prostate cancer screening as an example. Clin Chem 1995;41:1248-1255.



