Oak Ridge Conference
1 Johns Hopkins Hospital, Department of Urology, 600 North Wolfe St., Baltimore, MD 21287.
2 8032 Covered Bridge Drive, Quakertown, PA 18951.
3 UroCor, Inc., Division of Dianon Systems, Oklahoma City, OK 73104.
a Author for correspondence. Fax 410-614-3695; e-mail rveltri1{at}jhmi.edu.
Abstract
Background: Prostate cancer (PCa) pathologic staging remains a challenge for the physician using individual pretreatment variables. We have previously reported that UroScore™, a logistic regression (LR)-derived algorithm, can correctly predict organ-confined (OC) disease state with >90% accuracy. This study compares statistical and neural network (NN) approaches to predict PCa stage.
Methods: A subset (756 of 817) of radical prostatectomy patients was assessed: 434 with OC disease, 173 with capsular penetration (NOC-CP), and 149 with metastases (NOC-AD) in the training sample. Additionally, an OC + NOC-CP (n = 607) vs NOC-AD (n = 149) two-outcome model was prepared. Validation sets included 120 or 397 cases not used for modeling. Input variables included clinical and several quantitative biopsy pathology variables. The classification accuracies achieved with a NN with an error back-propagation architecture were compared with those of LR statistical modeling.
Results: We demonstrated >95% detection of OC PCa in three-outcome models, using both computational approaches. For training patient samples that were equally distributed for the three-outcome models, NNs gave a significantly higher overall classification accuracy than the LR approach (96% vs 40%, respectively). In the two-outcome models using either unequal or equal case distribution, the NNs had only a marginal advantage in classification accuracy over LR.
Conclusions: The strength of a mathematics-based disease-outcome model depends on the quality of the input variables, quantity of cases, case sample input distribution, and computational methods of data processing of inputs and outputs. We identified specific advantages for NNs, especially in the prediction of multiple-outcome models, related to the ability to pre- and postprocess inputs and outputs.
Prostate cancer (PCa)¹ is the most common malignancy among men in the US, with an estimated 200 000 new cases and 30 200 deaths in 2002 (1). Approximately 30% of men who are treated for localized disease will have recurrences, and a subset of these men will develop progressive disease (2)(3)(4)(5). The clinical staging of PCa continues to utilize serum prostate-specific antigen (PSA) and the digital rectal examination. Most patients diagnosed early with organ-confined (OC) tumors are curable ~90–95% of the time with radical prostatectomy (3)(5) and ~80–95% of the time with radiation therapy (6). Today, a significant proportion (~25–40%) of patients diagnosed with clinical stage T1c disease (PSA >4.0 µg/L and nonpalpable tumor) are found to have non-OC pathology (higher grade and/or stage) at radical prostatectomy (7)(8)(9)(10). Thus, there are still variable but significant numbers of patients with clinically localized disease who will have pathologically non-OC PCa at the time of treatment (10)(11)(12)(13). The goal of PCa clinical staging is to estimate the anatomic extent of the disease at the time of diagnosis (i.e., pretreatment), and several pretreatment multiparameter prediction models have been developed (4)(5)(13)(14)(15)(16)(17)(18)(19). Currently, one of the most widely used pretreatment staging prediction algorithms is based on the SPORE nomograms, also known as "Partin tables" (16)(17). These tables were developed with use of the clinical stage, total PSA (tPSA), and Gleason sum results from 4133 men with clinically localized PCa treated with radical prostatectomy between 1982 and 1996. The Partin tables provide the population-derived likelihood of having an OC cancer, isolated capsular penetration (CP), seminal vesicle involvement, and pelvic lymph node involvement for an individual falling into a particular tPSA range, Gleason sum, and clinical stage category (17). Statistical tools, such as logistic regression (LR), have also been applied to analyze data and create patient-specific stage prediction models for use at the pretreatment decision step as well as in posttreatment prognosis for PCa disease management (13)(14)(15)(16)(17)(18)(19)(20)(21)(22).
More recently, the application of neural networks (NNs) for predicting outcomes for PCa has attracted a great deal of attention, and computational solutions have been produced using various software configurations (23)(24)(25)(26)(27). A NN is a software machine that uses electronic components designed as parallel distributed processors with a propensity for storing experiential knowledge and making it available for use. NNs resemble the brain in at least two respects: (a) knowledge is acquired by the network through a learning process, and interneuron connection strengths (known as synaptic weights) are used to store knowledge (28)(29)(30); and (b) the NN is adaptive, fault tolerant, capable of very large-scale integration of information using neurobiological simulation principles, and produces a highly structured uniformity of analysis and architecture when finalized. A NN consists of an interconnection of simple computing cells, referred to as "neurons" or "processing units". The "learning algorithm" uses nonlinear mathematical transfer functions ranging from gaussian, sinusoid, and sigmoid to hybrids of these, and sometimes combines these functions with statistical preprocessing or with genetic-algorithm pre- and postprocessing to create optimally performing NNs (28)(29)(30). The objective of such mathematical manipulations is to modify the synaptic weights of a network's processing units in an orderly fashion to attain the desired outcome prediction, provided that sufficient quantitative inputs (training data sets) are available. There are numerous types of NN designs capable of processing complex data to make outcome predictions, including genetic, free form, error back-propagation, radial basis function, probabilistic, generalized regression, and self-organizing feature maps (23)(24)(25)(26)(27)(28)(29)(30).
Depending on the type of problem, e.g., monitoring complex machine functions, medical outcomes, stock market forecasting, credit assignments, or pattern recognition, the NN design can be engineered to optimize the outcome prediction. The present study compares LR and a single type of NN computational architecture to predict PCa stage, using a well-defined cohort of PCa patients to assess the robustness of each model under different training conditions.
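The error back-propagation design named above can be illustrated with a minimal sketch in Python, using only the standard library. It shows the two ideas from the description: processing units with a sigmoid transfer function, and a learning rule that adjusts the synaptic weights to reduce prediction error. The class `TinyBackpropNet`, the layer sizes, the learning rate, and the toy AND data are illustrative assumptions. This is not the iUnderstand implementation, which also applies statistical and genetic pre- and postprocessing.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) transfer function of a processing unit."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyBackpropNet:
    """Hypothetical example: one hidden layer of sigmoid units trained by back-propagation."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        # Synaptic weights; the extra last weight in each row acts on a constant bias input.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        out = sigmoid(sum(w * v for w, v in zip(self.w2, hidden + [1.0])))
        return hidden, out

    def train_step(self, x, target: float, lr: float = 0.5) -> None:
        hidden, out = self.forward(x)
        xb, hb = list(x) + [1.0], hidden + [1.0]
        # Error term at the output unit (squared-error loss times sigmoid derivative).
        delta_out = (out - target) * out * (1.0 - out)
        # Propagate the error back to each hidden unit through the current output weights.
        delta_hidden = [delta_out * self.w2[h] * hidden[h] * (1.0 - hidden[h])
                        for h in range(len(hidden))]
        # Gradient-descent updates of the synaptic weights.
        for j, v in enumerate(hb):
            self.w2[j] -= lr * delta_out * v
        for h, row in enumerate(self.w1):
            for i, v in enumerate(xb):
                row[i] -= lr * delta_hidden[h] * v


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


# Toy data (logical AND), chosen only because it trains reliably.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]

net = TinyBackpropNet(n_in=2, n_hidden=4, seed=1)
loss_before = mean_squared_error(net, DATA)
for _ in range(5000):          # training cycles (epochs)
    for x, t in DATA:
        net.train_step(x, t)
loss_after = mean_squared_error(net, DATA)
```

The gradients are computed before any weight is changed, so each update uses the error signal from the same forward pass.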
Materials and Methods
We obtained specimens from 2400 prostate sextant biopsy cases diagnosed between 1991 and 1997 from both academic and private-practice urologists. Reliable pathologic staging and/or clinical information was obtained for 988 patients: 191 from five academic collaborators and 797 from numerous community-based private-practice urologists. Any patient with a radical prostatectomy pathology report indicating neoadjuvant therapy or noting specific histopathologic evidence of such therapy was excluded from the study. The original UroScore™ patient sample included 817 cases after removal of cases with missing pretreatment tPSA values (23). None of these 817 patients had received neoadjuvant therapy before the biopsy. Using the 1997 TNM staging guidelines (31), we excluded 61 cases [47 OC, 12 non-OC with CP (NOC-CP), and 2 non-OC with metastasis (NOC-AD)], yielding 756 usable cases for the current analysis. The final distribution of cases in the training sample was 434 OC, 173 NOC-CP, and 149 NOC-AD patients. Removing these 61 cases did not significantly alter the prediction of patient outcomes when the revised (n = 756) models were compared with the original (n = 817) models (data not shown).
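The case accounting above can be re-checked with a few lines of Python. The counts are copied from this paragraph, and the variable names are only illustrative.

```python
# Counts copied from the Materials and Methods.
sources = {"academic collaborators": 191, "community urologists": 797}
original_sample = 817
excluded = {"OC": 47, "NOC-CP": 12, "NOC-AD": 2}
training = {"OC": 434, "NOC-CP": 173, "NOC-AD": 149}

with_staging = sum(sources.values())                 # patients with reliable staging data
usable = original_sample - sum(excluded.values())    # cases left after the 61 exclusions

print(with_staging, usable, sum(training.values()))  # 988 756 756
```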
The second grouping used for this study compared OC + NOC-CP (n = 607) with NOC-AD (n = 149) patients to assess a two-outcome model. The validation patient sample set was based on the radical prostatectomy reports from an additional 120 PCa biopsy cases obtained from two sites: 98 from Dr. Rube Hundley (Urology Associates of Dothan, P.A., Dothan, AL) and 22 from Johns Hopkins Hospital. These cases were audited and classified in the same manner. The validation group included 60 OC, 30 NOC-CP, and 30 NOC-AD cases. For the two-outcome model, the validation distribution was 90 OC + NOC-CP cases and 30 NOC-AD cases. In addition, when we constructed equal case distribution models, all remaining cases not used in the training set were included in the validation runs.
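One way to build the equal case distribution training sets described above is to draw the same number of cases at random from each outcome class and send every undrawn case to validation. The sketch below assumes that the per-class count equals the size of the smallest class (149 NOC-AD cases). The text does not state the exact count or the sampling procedure, and `equal_case_split` is a hypothetical helper.

```python
import random
from collections import Counter


def equal_case_split(labels, per_class=None, seed=0):
    """Hypothetical helper: equal-sized random draw per class; the rest go to validation."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    # Assumption: default to the size of the smallest outcome class.
    n = per_class if per_class is not None else min(len(v) for v in by_class.values())
    train, leftover = [], []
    for idxs in by_class.values():
        chosen = set(rng.sample(idxs, n))
        train.extend(i for i in idxs if i in chosen)
        leftover.extend(i for i in idxs if i not in chosen)
    return sorted(train), sorted(leftover)


# Class sizes from the text: three-outcome and two-outcome groupings.
labels = ["OC"] * 434 + ["NOC-CP"] * 173 + ["NOC-AD"] * 149
train_idx, unused_idx = equal_case_split(labels)
train_counts = Counter(labels[i] for i in train_idx)

labels2 = ["OC+NOC-CP"] * 607 + ["NOC-AD"] * 149
train2, unused2 = equal_case_split(labels2, seed=1)
```

Under this assumption, the three-outcome model trains on 447 cases and leaves 309 for validation. The two-outcome model trains on 298 cases and leaves 458.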
All of the 756 and 98 prostate biopsy specimens (Dothan Urology) were processed and evaluated at a large national pathology reference laboratory (Oklahoma City, OK), whereas the 22 Johns Hopkins Hospital biopsies were processed at Johns Hopkins Hospital. The sextant biopsy pathology variables measured included the following: number of positive cores, highest Gleason sum, presence of Gleason grade 4 and/or 5, total percentage of tumor involvement, average percentage of tumor involvement per core (the total percentage of tumor involvement divided by the total number of cores), average percentage of tumor involvement per positive core (the total percentage of tumor involvement divided by the number of positive cores), and tumor location (i.e., ≥5% tumor involvement in base, mid, and/or apex cores). The tPSA assay used the Food and Drug Administration-approved equimolar TOSOH (UroCor Labs) or Hybritech (Johns Hopkins Hospital) methods, and comparison of both tests yielded a correlation coefficient >98%. For the purposes of our analyses, the tPSA results were categorized in increments of 2 µg/L, based on a previously described method (23). Table 1 provides the patient demographics for the training (n = 756) and validation (n = 120) sets. As shown in Table 1, the age distribution was similar, with no statistically significant differences among the outcome groups for the training and validation sets. Table 1 also demonstrates that with increasing disease severity (OC → NOC-CP → NOC-AD), similar trends were observed for the input variables of the training and validation sets.
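The derived biopsy variables defined above are simple arithmetic on per-core measurements. The sketch below computes them for one sextant biopsy. The following are assumptions for illustration:

- The `Core` record.
- The ≥5% location threshold, reconstructed from the text.
- The tPSA category boundaries: [0, 2) µg/L is category 0, [2, 4) µg/L is category 1, and so on.

The published categorization method is described in reference (23).

```python
from dataclasses import dataclass


@dataclass
class Core:
    """One sextant biopsy core (hypothetical record layout)."""
    site: str         # "base", "mid", or "apex"
    pct_tumor: float  # percentage of the core involved by tumor, 0-100


def biopsy_features(cores, tpsa_ug_per_l, location_threshold=5.0):
    """Derive quantitative biopsy inputs; threshold and tPSA bins are assumptions."""
    if not cores:
        raise ValueError("at least one core is required")
    total_pct = sum(c.pct_tumor for c in cores)
    n_positive = sum(1 for c in cores if c.pct_tumor > 0)
    features = {
        "n_positive_cores": n_positive,
        "total_pct_tumor": total_pct,
        # Total percentage of tumor involvement / total number of cores.
        "avg_pct_per_core": total_pct / len(cores),
        # Total percentage of tumor involvement / number of positive cores.
        "avg_pct_per_positive_core": total_pct / n_positive if n_positive else 0.0,
        # Assumed 2 ug/L increments: [0, 2) -> 0, [2, 4) -> 1, ...
        "tpsa_category": int(tpsa_ug_per_l // 2),
    }
    for site in ("base", "mid", "apex"):
        features[f"tumor_in_{site}"] = any(
            c.site == site and c.pct_tumor >= location_threshold for c in cores
        )
    return features


# Made-up sextant biopsy, for illustration only.
sextant = [Core("base", 0.0), Core("base", 10.0), Core("mid", 20.0),
           Core("mid", 0.0), Core("apex", 3.0), Core("apex", 0.0)]
f = biopsy_features(sextant, tpsa_ug_per_l=6.3)
```

In this example, the apex core with 3% involvement counts as a positive core. It does not mark the apex as a tumor location, because it falls below the assumed 5% threshold.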
We used the Stata v7.0 statistical software program for all LR methods, as detailed previously (23). Briefly, all of the biopsy pathology and clinical variables were examined multivariately. We applied a stringency of P = 0.20 for all independent variables and used backward stepwise LR selection to predict either the three-outcome dependent variable (OC vs NOC-CP vs NOC-AD) or the binary two-outcome dependent variable (OC + NOC-CP vs NOC-AD). The formulae and functions used to calculate patient-specific outcome predictive probabilities from the results of the ordered logistic regression (OLOGIT) analysis have been published (23).
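The published UroScore formulae are in reference (23) and are not reproduced here. As a generic illustration only, the sketch below shows two things:

- How an ordered logit model of the kind fitted by Stata's `ologit` turns a linear predictor into probabilities for three ordered outcomes.
- How a binary logit gives Pr(NOC-AD) for the two-outcome model.

All coefficients, cutpoints, and input values are hypothetical.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(x, beta, cutpoints):
    """Outcome probabilities from an ordered logit (proportional-odds) model.

    Stata-style parameterization: Pr(y = j) = F(k_j - xb) - F(k_{j-1} - xb),
    with k_0 = -inf and k_J = +inf; outcomes ordered OC < NOC-CP < NOC-AD.
    """
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")
    xb = sum(b * v for b, v in zip(beta, x))
    cumulative = [logistic(k - xb) for k in cutpoints] + [1.0]
    probs, prev = [], 0.0
    for c in cumulative:
        probs.append(c - prev)  # difference of adjacent cumulative probabilities
        prev = c
    return probs


def binary_logit_prob(x, beta, intercept):
    """Pr(NOC-AD) for a two-outcome (OC + NOC-CP vs NOC-AD) logistic model."""
    return logistic(intercept + sum(b * v for b, v in zip(beta, x)))


# Hypothetical coefficients for two inputs (e.g., positive cores, tPSA category).
BETA = [0.8, 0.3]
CUTPOINTS = [2.0, 3.5]
p_low = ordered_logit_probs([1, 1], BETA, CUTPOINTS)   # mostly OC
p_high = ordered_logit_probs([5, 6], BETA, CUTPOINTS)  # mostly NOC-AD
p_ad = binary_logit_prob([5, 6], BETA, intercept=-4.0)
```

A larger linear predictor shifts probability toward the more advanced stages, which is the ordering assumption built into an ordered logit.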
The iUnderstand v1.4 (BioComp Systems, Inc.) software program was used to construct an error back-propagation NN that included statistical and genetic pre- and postprocessing algorithms to optimize NN performance. We constructed optimized three- and two-outcome models using unequal or equal case input distributions and the fixed variables summarized in Table 2. The NN used either input variables preselected by LR or nonpreselected input variables, the parameters in Table 2, and 200 cycles (iterations). When LR preselection was applied, we used backward stepwise LR at a stringency of P = 0.20. The NN computed relationships between input variables and outcomes, identified the 10 "fittest" solutions, and ultimately "evolved" a single optimized network. For optimization, the 756 patients were randomly partitioned in a 60:30:10 ratio into training, testing, and robustness-determination subsets. Validation used only PCa cases that had not been used to train the NN and LR models.
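The 60:30:10 partition described above can be sketched as a random shuffle followed by slicing. This is an assumption about the mechanics, not a description of how iUnderstand performs it. `partition_60_30_10` is a hypothetical helper, and floor rounding sends any remainder to the robustness subset.

```python
import random


def partition_60_30_10(case_ids, seed=0):
    """Hypothetical helper: random 60:30:10 training/testing/robustness split."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = n * 60 // 100   # integer arithmetic avoids floating-point rounding surprises
    n_test = n * 30 // 100
    return ids[:n_train], ids[n_train:n_train + n_test], ids[n_train + n_test:]


train_ids, test_ids, robust_ids = partition_60_30_10(range(756), seed=42)
print(len(train_ids), len(test_ids), len(robust_ids))  # 453 226 77
```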
Results
Table 3A compares the mathematical performance of the recapitulated UroScore model. It applies the same 35% cutoffs used previously, but with the n = 756 data set, as described in the Materials and Methods. The high percentages of correct OC classification by OLOGIT and NN, and the overall accuracy, were very similar to those described in the original report (23). LR preselection of the NN inputs had only a minor impact on NOC-CP identification. In the 120-case validation set, the OC prediction rate was 91.7% for OLOGIT and 100% for the NN, regardless of statistical preselection (Table 3B). OLOGIT identified 20% of the NOC-CP validation cases, whereas the NN identified only 3.3% of these cases. Overall validation accuracy was indistinguishable for the LR and NN models.
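The per-category percentages and overall accuracy reported in the Results can be computed from paired true and predicted stage labels, as in the sketch below. The predictions are made-up numbers for a validation mix of 60 OC, 30 NOC-CP, and 30 NOC-AD cases. They are not the study's results, and `classification_summary` is a hypothetical helper.

```python
def classification_summary(y_true, y_pred, classes):
    """Percent correctly classified per true category, plus overall accuracy."""
    per_class = {}
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        per_class[c] = 100.0 * correct / len(idx) if idx else float("nan")
    overall = 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return per_class, overall


CLASSES = ["OC", "NOC-CP", "NOC-AD"]
y_true = ["OC"] * 60 + ["NOC-CP"] * 30 + ["NOC-AD"] * 30
# Made-up predictions: 54/60 OC, 9/30 NOC-CP, and 21/30 NOC-AD classified correctly.
y_pred = (["OC"] * 54 + ["NOC-CP"] * 6
          + ["NOC-CP"] * 9 + ["OC"] * 21
          + ["NOC-AD"] * 21 + ["NOC-CP"] * 9)
per_class, overall = classification_summary(y_true, y_pred, CLASSES)
print(per_class, overall)  # {'OC': 90.0, 'NOC-CP': 30.0, 'NOC-AD': 70.0} 70.0
```

Overall accuracy is the case-weighted average of the per-category rates: here (60 × 90 + 30 × 30 + 30 × 70)/120 = 70%. It is therefore dominated by the largest category, which is one reason to inspect the per-category rates alongside it.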
When the OLOGIT and NN three-outcome computational models were built using an equal number of cases in each pathologic stage category, the impact on OC correct classification was significant (Table 4A). The NN approach for the three-outcome models achieved 96% correct classification of OC cancer, compared with 40% for OLOGIT. However, the LR approach classified the NOC-CP group much better (67.3% vs 14.7%; Table 4A). On validation, by contrast, the NN model was much more robust, with overall accuracies of 75.8–78.2% vs only 35.9% for LR.
We next tested a two-outcome computational model. Its rationale is that patients with clinically localized OC and NOC-CP disease will all be definitively treated with either radical prostatectomy or radiotherapy. Table 5 summarizes the unequal case input models and validation results. It shows marked improvement in the classification of OC and NOC-AD and in the overall percentage correctly classified. The NN marginally outperformed the LR model in the classification of NOC-AD in both training (89% vs 87%) and validation testing (83.3% vs 79.2%). Overall accuracy for these two-outcome models was markedly improved over the three-outcome models, and the NNs tended to perform better than LR.
When we used the equal case distribution for training, OLOGIT and the NN performed very similarly (Table 6A). These results also held up in validation testing with a highly skewed patient cohort. Thus, in a two-outcome equal case distribution model, the statistical and NN models validated with very similar overall and per-category percentages of patients correctly classified. On validation, there was a slight advantage of ~5% for the NN over LR (Table 6B). Overall accuracy for correct identification of the validation cases was similar for both two-outcome models (Tables 5B and 6B).
Discussion
Among contemporary PCa patients, there is limited clinical outcome predictive value when only the individual variables such as clinical stage, serum PSA, Gleason score, or grade of the biopsy are used to counsel the patient at pretreatment. The generation of additional quantitative pathology variables and the unreliability of this prior approach led to the development of several multimodal staging tools (13)(14)(15)(16)(17)(18)(19)(20)(21). These multiparameter computational tools incorporate several clinical and biopsy pathology variables, including DNA ploidy, to predict specific endpoints, such as pathologic stage or disease-free survival (7)(8)(13)(14)(15). The validity of some of these tools may become compromised when the training sets come from a demographically restricted cohort and the validation cases come from multiple demographically unrestricted sources (e.g., academic centers of excellence, community-based practices, or combinations thereof). Under such circumstances, it is not uncommon that the performance of some decision support tools may deteriorate when exposed to new cases (27)(28). Therefore, tools that do not withstand rigorous validation from the same or multiple institutions, including community-based practices, may not be as useful to aid in clinical decision-making (28).
As a result of increased PSA screening and public awareness, there is a predominance of clinically localized disease, including a predominance of T1c clinical stage cancers, in the US population (5)(7)(8). Additionally, the pathologic grades and stages of these contemporary cancers have decreased with a concomitant drop in mortality from PCa (2)(3)(5)(7)(8)(9)(20)(21). Likewise, PSA concentrations and standard biopsy pathologies have consolidated and become more similar (Gleason 6 and 7) among more contemporary patient cohorts (2)(3)(5)(13)(20)(21). It is therefore becoming more challenging for the urologist to predict the pathologic stage of cancers that present so uniformly with respect to the major pretreatment input variables often used: age, tPSA and/or its derivatives, Gleason score, and clinical stage. Additional information, derived from a more detailed evaluation of the prostate biopsy, is readily available and has been shown to provide critical information in the past, and today, it may even outperform the Gleason score on multivariate analysis (5)(13)(14)(15)(16)(17)(20)(21)(22)(27)(32)(33)(34). For example, Badalament et al. (14) and others (15)(22)(23) identified high Gleason score and grade and the number of biopsy cores positive for cancer as two very important variables for pathologic stage prediction. Other investigators noted that the percentage of cores positive for cancer and the surface area positive for cancer were the best predictors of pathologic stage and cancer volume in multivariate logistic regression analyses that also included PSA, age, clinical stage, and Gleason score (13)(14)(15)(18)(19)(20)(21)(22)(32)(33)(34).
Aside from the emerging evidence for more detailed quantitative biopsy assessment, other variables, such as DNA ploidy (14)(15), quantitative nuclear morphometric analysis (8)(14)(15)(25)(32)(33)(34), or other serum and tissue markers, may provide information to overcome the challenge of optimal PCa staging (15)(33).
Previously (23), we demonstrated high diagnostic accuracy for OC disease when a more detailed evaluation of prostate biopsies, termed quantitative prostate biopsy pathology, was used to develop and challenge a statistical (LR) or NN model to predict PCa stage based on preoperative variables. These models underwent additional validation with a subset of 116 new patients and correctly classified patients with OC cancers at the >90% level. The focus of the present report was to compare LR, a statistical modeling method that we have applied successfully in the past (23), with an updated version of the NN software program (BioComp Systems, Inc.). The present report uses a subset of the original 817 patients (n = 756). Its purpose was to assess how different training set case distributions (equal and unequal) affect patient outcome predictions for the original three-outcome model. In addition, we developed new, more clinically relevant two-outcome statistical and NN models, using the same patient distribution approaches we assessed for the three-outcome model. By doing so, we have confirmed the performance of the UroScore statistical and NN three-outcome models. We also determined that the NN dramatically outperforms the LR model only when an equal number of cases in each pathologic stage category is used for training (Tables 3B vs 4B). In addition, we developed and validated new, clinically appropriate two-outcome computational models and compared the statistical and NN approaches with equal and unequal patient training sets. Under these modeling conditions, NNs showed an overall correct classification advantage of ~5% over LR. The classification was between OC disease with or without CP (clinically localized curable disease) and advanced disease (positive seminal vesicles, lymph nodes, or bone disease).
In summary, when attempting to develop PCa staging pretreatment models, multiple quantitative biopsy pathology variables are strongly valued by either method of computation, NN or LR. However, the case distribution, whether unequal or equal, can severely impact training and validation of the LR three-outcome model (Table 3 vs Table 4). The two-outcome LR and NN models performed with near parity regardless of training case input distribution.
We can make several observations regarding the above computational differential performance of the three- vs two-outcome models: (a) It may be assumed that a large proportion of NOC-CP cancers are more biologically reminiscent of OC cancers as evidenced by the degree of overlap for the distribution of the input variables in each group, and therefore, the current input variables applied in the three-outcome model may not be as reliable to accurately classify these subtly different cancers. The biological outcome of surgically removed cancers that demonstrate CP on pathologic examination supports the assumption of a comparable behavior of at least a proportion of pT3 cancers. In a study of 721 men at Johns Hopkins, 58% of men with evident CP had no evidence of biochemical recurrence 10 years after surgery (35). (b) In a two-outcome model, the skewness of case distribution for model development is not as critical as in the three-outcome model, at least when LR is compared with a modified error back-propagation NN. (c) NNs tend to be more robust on validation, especially when the case distribution for training is equal. (d) This comparison of computational methods to produce patient-specific outcomes demonstrates some of the strengths and weaknesses of the two approaches.
Clearly for the future, these mathematical multiparameter approaches to predict disease outcomes will continue to be of value when they are properly architected, tested, and validated. There is little doubt that larger case numbers should be used for training, and we need to validate new pathologic and molecular biomarkers to further improve the performance of such computational algorithms (15)(33). Ultimately, our greatest challenge will be to provide quality and up-to-date disease management tools delivered to the urologist in forms such as the Partin tables (17), the Kattan nomogram (18), and UroScore (23).
Footnotes
1 Nonstandard abbreviations: PCa, prostate cancer; PSA, prostate-specific antigen; OC, organ-confined; tPSA, total PSA; CP, capsular penetration; LR, logistic regression; NN, neural network; NOC-CP, non-organ-confined with capsular penetration; and NOC-AD, non-organ-confined with metastasis.
References