Departments of ¹Urology and ²Medical Biometry, University Hospital Charité, Humboldt University, D-10098 Berlin, Germany.
Address correspondence to this author at: Department of Urology, University Hospital Charité, Humboldt University Berlin, Schumannstrasse 20/21, D-10098 Berlin, Germany. Fax 49-30-450-515904; e-mail Klaus.jung{at}charite.de.
| Abstract |
|---|
Methods: Eight available programs running under Windows (AccuROC, Analyse-It, CMDT, GraphROC, MedCalc, mROC, ROCKIT, and SPSS) were evaluated. ROC analyses of prostate-specific antigen and related values were performed from a dataset of 928 men with prostate cancer and benign prostatic hyperplasia and corresponding subsets. Criteria such as data input, data output, and correctness and completeness of results were used to evaluate the practicability of the programs.
Results: Although the programs produced equivalent results (areas under the curves and their characteristics), we observed deficiencies concerning input of data, processing of the output data, and completeness of the results. Analyse-It, AccuROC, and MedCalc exhibited good performance, but each program had different shortcomings. Only GraphROC could compare curves at a certain sensitivity or specificity cutoff.
Conclusions: Adequate ROC analysis and ROC plotting cannot be performed with a single program. Analyse-It, AccuROC, and MedCalc can be recommended with certain limitations. Further improvements of the programs are necessary.
| Introduction |
|---|
300 studies in the 1980s to >5000 studies since 1990. Several computer programs have been developed to generate ROC curves, and some of the early programs were briefly described in 1993 (2). However, all of these early programs had limitations for easy and accessible practical use. Within the last several years, commercial and public domain programs have become available for complex ROC analysis and ROC plotting. To our knowledge, an overview and comparison of these newly available ROC programs has not been performed. The aims of this study were (a) to survey currently available ROC programs, (b) to compare these ROC programs for their ease of use, and (c) to evaluate their relative utility in ROC analysis.
| Material and Methods |
|---|
Datasets for ROC analysis
To compare the programs, we used a previously described dataset of 928 men with prostate cancer (n = 606) and benign prostatic hyperplasia (n = 322) and subgroups of this population (3). ROC analyses of total prostate-specific antigen (tPSA), free PSA (fPSA), the ratio of fPSA to tPSA (fPSA/tPSA), and other values calculated by an artificial neural network approach with the mentioned dataset (3) were carried out to estimate the advantages and disadvantages of each program.
Evaluation criteria
To evaluate the programs, five simple criteria were chosen to cover the ease of learning program operations, use of the software, and data handling, and to characterize the usefulness of each program (Table 2). A maximum percentage value was assigned to each criterion. The sum of all percentage values gives the final score. The criteria are described briefly below:
Data input.
It is important to import or copy data into the program easily without any intermediate storage or special format, to be able to edit the data in the program (e.g., in a spreadsheet), and to save more than one dataset. The tendency of each program to crash was also taken into consideration.
Data output.
Presentation of the results and processing of the exported data were assessed. The program should be able to structure the results comprehensively. Processing of data characterizes the capability of the program to export and save the results, including the calculated graphs, as well as to draw more than one curve in one graph. This facility is very important for comparing several tests with each other.
Analysis results.
This criterion was the most important one and included correctness and completeness of the results. Correctness of results is mandatory: incorrect results were treated as an exclusion criterion for recommending the respective software for ROC analysis.
There are several approaches to calculate the area under the ROC curve (AUC) for the comparison of ROC curves. Table 3 lists the main characteristics and limitations of three commonly used methods. It is crucial to know whether the curves result from independent or dependent (correlated) data. In laboratory diagnostics, the values of interest are in most cases measured on the same patients. We therefore considered only methods for correlated data. A second distinction can be made between nonparametric and parametric methods. Parametric methods are efficient under certain assumptions. These assumptions are often not fulfilled in practice, in which case the results are biased. Nonparametric methods should be used if the variables follow an ordinal or skewed distribution or if there are small sample sizes. A parametric approach should be preferred in case of a large sample size and continuous measurements.
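To make the nonparametric approach concrete, the following minimal sketch computes the empirical AUC as the proportion of diseased/non-diseased pairs in which the diseased value is higher, counting ties as one half. This is the Mann-Whitney (Wilcoxon) formulation, which is equivalent to the trapezoidal area under the empirical ROC curve. The function name and the marker values are hypothetical and serve only as an illustration; they are not taken from the study dataset.

```python
def auc_mann_whitney(diseased, healthy):
    """Empirical AUC: probability that a diseased value exceeds a healthy one.

    Ties count as one half, which makes the result equal to the trapezoidal
    area under the empirical ROC curve.
    """
    if not diseased or not healthy:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


# Hypothetical marker values, for illustration only.
cancer = [4.1, 6.3, 8.0, 9.5]
benign = [2.2, 3.9, 4.1, 5.0]
auc = auc_mann_whitney(cancer, benign)
print(f"empirical AUC = {auc:.5f}")
```

The pairwise loop is quadratic in the sample size; for large datasets a rank-based computation gives the same value more efficiently.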
The subcriterion completeness assessed the capability of a program to calculate all necessary ROC data for a reasonable decision regarding a diagnostic test. This included the AUC with its confidence intervals (CIs), the sensitivities and specificities at certain cutoffs with their CIs, the presentation of the graph, and the ability to compare the AUCs showing the respective statistical significance values.
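As an illustration of the cutoff-based quantities named in this criterion, the sketch below computes sensitivity and specificity at a single cutoff together with approximate 95% CIs. The Wilson score interval is used here as one reasonable choice; the article does not state which interval the evaluated programs use, so that choice, the function names, and the example values are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (default about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sens_spec_at_cutoff(diseased, healthy, cutoff):
    """Treat values >= cutoff as test-positive.

    Returns ((sensitivity, ci), (specificity, ci)).
    """
    tp = sum(1 for v in diseased if v >= cutoff)   # true positives
    tn = sum(1 for v in healthy if v < cutoff)     # true negatives
    sens = tp / len(diseased)
    spec = tn / len(healthy)
    return ((sens, wilson_interval(tp, len(diseased))),
            (spec, wilson_interval(tn, len(healthy))))


# Hypothetical marker values, for illustration only.
result = sens_spec_at_cutoff([4.1, 6.3, 8.0, 9.5], [2.2, 3.9, 4.1, 5.0], 4.0)
print(result)
```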
Program comfort.
This point of the comparison dealt with the compatibility of the program with standard calculation, text, and presentation programs, e.g., Microsoft Excel, Word, or PowerPoint. Programs were also evaluated based on the availability of help functions, tutorials, and demonstration versions and ease of obtaining information regarding program updates.
User manual.
This criterion assessed the structure and comprehensibility of the user manual and whether the manufacturer provides an online manual, a homepage, or an e-mail address to solve current problems.
| Results |
|---|
AccuROC
AccuROC uses the method of DeLong et al. (4). To our knowledge, at this stage it is the only program that uses this method. The layout of the program is very well structured, and because of the comprehensive manual and the up-to-date homepage, the program is easy to learn. Up to three curves can be drawn into one graph, and the coordinates of each curve can be saved, which makes it possible to put more than three curves in one graph with use of a spreadsheet program such as Excel. Furthermore, AccuROC can calculate the CIs and SD with a bootstrap method.
A serious drawback of this program is that except for the graph and its coordinates, none of the other results can be saved or exported; they can only be printed. If a diagnostic marker shows that lower values are associated with a higher risk of disease, all the test values have to be transformed by rendering them negative, manually or using a spreadsheet. This procedure makes the data input quite complicated.
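The sign change described above can be checked numerically. The sketch below, using hypothetical values of a marker that is lower in diseased subjects, shows that negating every value turns an empirical AUC of A into 1 - A, so that a program expecting "higher means disease" reports the intended area. The function name and data are illustrative assumptions.

```python
def empirical_auc(diseased, healthy):
    """Proportion of (diseased, healthy) pairs with the diseased value higher.

    Ties count as one half.
    """
    pairs = [(d, h) for d in diseased for h in healthy]
    score = sum(1.0 if d > h else 0.5 if d == h else 0.0 for d, h in pairs)
    return score / len(pairs)


# Hypothetical ratios that tend to be LOWER in the diseased group.
cancer_ratio = [0.08, 0.10, 0.12, 0.20]
benign_ratio = [0.15, 0.22, 0.25, 0.30]

auc_raw = empirical_auc(cancer_ratio, benign_ratio)
auc_negated = empirical_auc([-v for v in cancer_ratio],
                            [-v for v in benign_ratio])
print(auc_raw, auc_negated)
```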
Analyse-It
This software was published in 2001. The ROC analysis is performed according to the method of Hanley and McNeil (5)(6). According to the information of the software developers, an update was planned for the end of 2002. This update should use the method of DeLong et al. (4). It is an add-in program for Microsoft Excel. Like the software MedCalc, it is a program that implements several statistical procedures, including ROC analysis. It is simple to use and provides a very good online manual, help function, and tutorial. An advantage of its integration into Excel is that the interplay with other programs is excellent. Data input is easy, and the layout is clearly arranged. All necessary results are calculated in one step, and up to three curves can be displayed in one graph.
Unfortunately, AUCs cannot be compared if any AUC is <0.7, but this will also be changed in the updated version. Another drawback of this program is that it does not calculate CIs for the sensitivities and specificities.
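For orientation, the standard error approximation for a single AUC that is commonly attributed to Hanley and McNeil can be sketched as follows. Whether any of the evaluated programs implements exactly this formula is an assumption, and the comparison of two correlated curves additionally requires a correlation term that is not shown here. The group sizes are those of the study dataset; the AUC of 0.75 is a hypothetical input.

```python
import math


def hanley_mcneil_se(auc, n_diseased, n_healthy):
    """Approximate standard error of an AUC (single curve)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_diseased - 1) * (q1 - auc * auc)
                + (n_healthy - 1) * (q2 - auc * auc)) / (n_diseased * n_healthy)
    return math.sqrt(variance)


# 606 men with prostate cancer, 322 with benign prostatic hyperplasia;
# the AUC value itself is hypothetical.
se = hanley_mcneil_se(0.75, 606, 322)
ci_95 = (0.75 - 1.96 * se, 0.75 + 1.96 * se)
print(f"SE = {se:.4f}, approximate 95% CI = ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
```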
CMDT
CMDT is a freeware program and can be downloaded from the internet (Table 1). An estimate of the AUC is given by the Wilcoxon rank-sum statistic. For comparison of ROC curves, it uses a permutation test suggested by Venkatraman and Begg (7).
The drawbacks of this program are that it is prone to crashing, the graph can barely be edited in the program, and only one curve can be displayed. This makes it impossible to compare curves visually. Furthermore, the graph is not of publication quality and has to be saved as an extended metafile to be processed in another graphics program.
The advantages of the program are that it uses a bootstrap method to calculate the CIs and that the data can be edited in the program.
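A bootstrap CI for the AUC can be sketched as below. This is a simple percentile bootstrap that resamples each group separately; the article does not say which bootstrap variant CMDT or AccuROC uses, so the method, the function names, and the example values are assumptions for illustration.

```python
import random


def empirical_auc(diseased, healthy):
    """Empirical AUC with ties counted as one half."""
    pairs = [(d, h) for d in diseased for h in healthy]
    score = sum(1.0 if d > h else 0.5 if d == h else 0.0 for d, h in pairs)
    return score / len(pairs)


def bootstrap_auc_ci(diseased, healthy, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling within each group with replacement."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    stats = []
    for _ in range(n_boot):
        d = [rng.choice(diseased) for _ in diseased]
        h = [rng.choice(healthy) for _ in healthy]
        stats.append(empirical_auc(d, h))
    stats.sort()
    lo_idx = max(0, int(round((alpha / 2.0) * n_boot)))
    hi_idx = min(n_boot - 1, int(round((1.0 - alpha / 2.0) * n_boot)) - 1)
    return stats[lo_idx], stats[hi_idx]


# Hypothetical marker values, for illustration only.
cancer = [4.1, 6.3, 8.0, 9.5, 7.2, 3.8]
benign = [2.2, 3.9, 4.1, 5.0, 1.7, 6.5]
ci_low, ci_high = bootstrap_auc_ci(cancer, benign, n_boot=500)
print(ci_low, ci_high)
```

With samples this small the interval is wide; the sketch only shows the mechanics, not a recommended sample size.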
GraphROC
The program GraphROC uses the method of Hanley and McNeil (5)(6) to calculate the ROC curve. It is one of the first commercially available programs on the Windows platform and is still in use (8). GraphROC is cumbersome to use. Creating an input file is complicated, and it is not possible to edit the data after loading them into the program. Every result has to be copied via clipboard to save it. To edit the graph, it has to be copied via clipboard into another graphics program. In addition, the program is susceptible to crashing.
The advantages of GraphROC are the ability to draw several curves in one graph and the opportunity to compare paired and unpaired datasets. It is also possible to compare curves at a certain sensitivity or specificity cutoff, which is, as far as we know, a feature that only GraphROC provides. A demonstration version of GraphROC can be downloaded.
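What a comparison at a fixed specificity involves can be sketched with point estimates only: for each marker, find the best sensitivity among all cutoffs that reach the target specificity, then compare the two values. This does not reproduce the statistical test that GraphROC applies, which the article does not describe; the function name and both sets of marker values are hypothetical.

```python
def sensitivity_at_specificity(diseased, healthy, target_spec):
    """Highest sensitivity among cutoffs whose specificity is >= target_spec.

    Values >= cutoff are treated as test-positive.
    """
    candidates = sorted(set(diseased) | set(healthy))
    candidates.append(float("inf"))  # cutoff above every observed value
    best = 0.0
    for cutoff in candidates:
        spec = sum(1 for v in healthy if v < cutoff) / len(healthy)
        if spec >= target_spec:
            sens = sum(1 for v in diseased if v >= cutoff) / len(diseased)
            best = max(best, sens)
    return best


# Two hypothetical markers measured in the same two groups.
sens_a = sensitivity_at_specificity([4.1, 6.3, 8.0, 9.5], [2.2, 3.9, 4.1, 5.0], 0.75)
sens_b = sensitivity_at_specificity([3.0, 4.5, 5.5, 7.0], [2.0, 3.5, 5.0, 6.0], 0.75)
print(sens_a, sens_b, sens_a - sens_b)
```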
MedCalc
MedCalc also works with the method of Hanley and McNeil (5)(6). This program is very interesting for those users who wish to do more than just ROC analysis because it provides a wide range of other special biomedical statistics, e.g., Bland-Altman plots, Passing-Bablok regression, and logistic regression. The data import is very easy and is possible from Excel, SPSS, dbase, Lotus, and as a text file. The layout is clearly arranged, it is possible to export data, and the graph can be edited in the program. MedCalc provides an online manual, and a 30-day demonstration version can be downloaded from the company homepage.
A clear disadvantage of this program is that only two curves can be presented in one graph.
mROC
mROC is a computer program that implements an approach of combining the ROC curves of several tumor markers or test values by the best linear combination, which maximizes the AUC under the hypothesis of a multivariate gaussian distribution (9). Methods for estimating CIs for the AUC are also provided (10). Furthermore, conventional ROC analysis is possible. Learning to work with the program is easy, the layout is well structured, and the provided manual is intelligible. However, the data input is quite complicated, and the data cannot be edited in the program. Numerical and graphic results can be exported. Unfortunately, only one curve can be displayed in a graph, and a comparison of different ROC curves is not possible.
By combining several markers or tests into one ROC curve, thus creating a "virtual marker", this program brings interesting additional new aspects to ROC analysis. Nevertheless, it cannot be recommended for a convenient ROC analysis.
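The idea of a best linear combination can be illustrated with a closed-form result that is often cited for this setting (usually attributed to Su and Liu): if the markers are multivariate gaussian in both groups, the weights proportional to the inverse of the summed covariance matrices times the difference in means maximize the AUC. Whether mROC implements exactly this computation is an assumption. The sketch handles two markers only and takes population parameters as inputs; all names and numbers are hypothetical.

```python
import math


def best_linear_combination(mu_h, mu_d, cov_h, cov_d):
    """Two-marker case. Returns (weights, auc) under gaussian assumptions.

    mu_h, mu_d: mean vectors of the non-diseased and diseased groups.
    cov_h, cov_d: the corresponding 2x2 covariance matrices.
    """
    # Sum of the two covariance matrices.
    s = [[cov_h[i][j] + cov_d[i][j] for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("summed covariance matrix is singular")
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    delta = [mu_d[0] - mu_h[0], mu_d[1] - mu_h[1]]
    weights = [inv[i][0] * delta[0] + inv[i][1] * delta[1] for i in range(2)]
    # Squared distance between the group means in the combined metric.
    dist_sq = delta[0] * weights[0] + delta[1] * weights[1]
    # Standard normal CDF evaluated at sqrt(dist_sq).
    auc = 0.5 * (1.0 + math.erf(math.sqrt(dist_sq) / math.sqrt(2.0)))
    return weights, auc


identity = [[1.0, 0.0], [0.0, 1.0]]
weights, combined_auc = best_linear_combination([0.0, 0.0], [1.0, 1.0],
                                                identity, identity)
print(weights, round(combined_auc, 4))
```

In this example each marker alone would give a lower gaussian AUC than the combination, which is the sense in which the combined "virtual marker" can outperform its components.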
ROCKIT
ROCKIT is a free program developed by C.E. Metz et al. (11)(12)(13). Although it is mathematically a very well thought-out program, we would not recommend this program unless the user has a statistical background. Creating an input file is cumbersome, the layout is somewhat confusing, the interplay with other programs is not optimized, it does not have a help function, and it frequently crashed when we used it.
Apart from these disadvantages, it calculates all necessary results, and with the included software PLOTROC (a program in Excel), several curves can be displayed in one graph.
SPSS
Although SPSS is a widely used statistical program, the ROC analysis within this package is not yet fully developed. In SPSS it is not possible to compare ROC curves. More than one curve in a graph can be displayed only if either higher or lower values of a marker are associated with a higher risk of disease. Despite the advantage of this program to show a wide range of other statistics, a valid ROC analysis cannot be performed with this software.
As can be seen in Table 2, we did not find any software that fulfilled all our expectations perfectly. Every program had advantages and disadvantages. More detailed characteristics of each program are summarized in Table 5.
| Discussion |
|---|
The results of the comparison show that three of the eight programs can make ROC analysis easier and more economical. The leading program is Analyse-It with a final score of 91%. Although this program received maximum scores for the criteria data input, software comfort, and user manual, it is not acceptable that only three curves can be displayed and that the CIs for the sensitivities and specificities are not calculated. However, add-in software for a program, such as Excel, that is already widely used is potentially valuable, and if the drawbacks can be removed in a future version, this software could make ROC analysis much easier. Except for SPSS, none of the other programs provides as good a help function and tutorial. Questions concerning the program are answered quickly via e-mail. Therefore, the price is acceptable considering such good service. Additionally, a full demonstration version can be downloaded at www.analyse-it.com.
In second place is AccuROC with a total score of 85%. Its use of the totally nonparametric method of DeLong et al. (4) and bootstrap methods (17) and its well-structured layout are the strong points of this program. On the other hand, complicated data input and the fact that data output (except the graph) can only be printed and not be saved or copied are disadvantages. Another drawback is the limited license for 2 years and the limited use of this program for only one computer. If one attaches great importance to highly accurate results and accepts the mentioned drawbacks, we can recommend AccuROC.
The third program that we recommend is MedCalc, with a total score of 84%. Although the ROC analysis is only one tool of this program, all necessary parameters are calculated. Data and results are clearly arranged, and the general handling is easy. Unfortunately, only two curves can be presented in one graph, which limits the relevant use of this program. If it were not for this drawback, MedCalc would fulfill most of our expectations of efficient ROC analysis. Even the price is reasonable, considering the additional statistical methods included. For those who do not need a multicurve presentation and are interested in a wide range of other statistics, MedCalc is a reasonable choice.
GraphROC achieved a score of 78%. The completeness of the results cannot be criticized. All the main parameters can be calculated with this software. It even has a feature that shows every possible cutoff point with its sensitivity and specificity in a separate diagram with automatic updating of clinical sensitivity and specificity values, by use of simple mouse clicks. The main drawbacks are the user-unfriendly data input and the cumbersome processing of results and graphs. The user-friendliness of the program would be improved if there were a way to export the points of the ROC curve to either a text file or spreadsheet. This would give the user more flexibility in terms of graphic capability. GraphROC can still compete with the other programs for ROC analysis, although the software has not been further developed since 1996.
The shortcomings of the other four programs outlined above make it difficult to recommend these programs for regular ROC analysis.
In summary, it is surprising that valid ROC analysis with all necessary data and a good plotting function is not offered in a single program. It should not be necessary to use more than one program to perform a valid ROC analysis. Therefore, the programs Analyse-It, AccuROC, or MedCalc should be enhanced as described above to provide all necessary functions.
| Acknowledgments |
|---|
| Footnotes |
|---|
2 Both authors contributed equally to this article.
| References |
|---|