Cancer Diagnostics
Genomic Health, Inc., Redwood City, CA.
Address correspondence to this author at: Genomic Health, Inc., 301 Penobscot Dr., Redwood City, CA 94063. Fax 650-556-1132; e-mail mcronin{at}genomichealth.com.
| Abstract |
|---|
Methods: Assays used a pooled RNA sample from fixed paraffin-embedded tissues to evaluate the analytical performance of a 21-gene panel with respect to amplification efficiency, precision, linearity, and dynamic range, as well as limits of detection and quantification. Performance variables were estimated from assays carried out with sample dilutions. In addition, individual patient samples were used to test the optimized assay for reproducibility and sources of imprecision.
Results: Assay results defined acceptable operational performance ranges, including an estimated maximum deviation from linearity of <1 cycle threshold (CT) unit over a >2000-fold range of RNA concentrations, with a mean quantification bias of 0.3% and CVs of 3.2% to 5.7%. An analysis of study design showed that assay imprecision contributed by instrument, operator, reagent, and day-to-day baseline variation was low, with SDs of <0.5 CT.
Conclusion: The analytical and operational performance specifications defined for the Oncotype DX assay allow the reporting of quantitative RS values for individual patients with an SD within 2 RS units on a 100-unit scale.
| Introduction |
|---|
Molecular-biomarker tests must be rigorously validated both analytically and clinically. Tests that simply dichotomize patient populations into broad categories do not translate easily into clinically useful tools. Clinically relevant tests provide precise, quantitative results that can be used to gauge the probabilities of success with available alternative courses of disease treatment (4). We have adapted methods commonly used to validate single-analyte laboratory tests to validate a complex multianalyte genomic diagnostic test, the Oncotype DX Breast Cancer Assay.
| Materials and Methods |
|---|
samples
We defined the histopathologic characteristics of samples acceptable for Oncotype DX assay analysis during 3 preliminary clinical-association studies. Guidelines for selecting fixed paraffin-embedded tissue (FPET) samples for the Oncotype DX assay are consistent with those for selecting optimal tissue blocks for standard immunohistochemistry assays. Essentially, the block containing the greatest amount of invasive breast carcinoma that is morphologically consistent with the submitting diagnosis and having the least amount of noninvasive mammary epithelium is selected. Samples with metabolically active nontumor elements constituting >50% of the tissue have those elements dissected out before sample extraction.
The goal of analytical validation was to provide an assay process with a fully optimized and documented standard operating procedure. Our studies focused on characterizing individual genes to define assay amplification efficiencies, linearity, dynamic range, reproducibility, and limits of detection and quantification. To measure the realistic limits of assay performance uncomplicated by biological variability contributed by individual samples, we created a single pooled test sample that represented a range of high and low expression values across the 21-gene panel (details in the online Data Supplement).
The final set of analytical validation studies tested the performance limits of the RS with individual patient samples to evaluate comprehensive assay performance in a way that realistically reflected patient sample testing.
fpet extraction
RNA was extracted from 3 sections (10 µm thick) of each FPET block. Paraffin was removed by xylene extraction followed by ethanol washing. RNA was isolated from deparaffinized tissue with the MasterPure™ Purification Kit (Epicentre Biotechnologies) with DNase I treatment, as previously described (13).
rna quantification
Extracted RNA was quantitated with the RiboGreen® fluorescence method (Molecular Probes/Invitrogen) as described previously (13).
residual genomic dna
Extracts were evaluated for residual genomic DNA with a TaqMan® quantitative PCR assay for β-actin DNA (13). Samples exceeding the DNA threshold were retreated with DNase I. Nine of the Oncotype DX gene primer sets (for BAG1, CCNB1, SCUBE2, GRB7, ERBB2, MKI67, MYBL2, PGR, and TFRC genes) span intron sequences and so do not detect genomic DNA; the 12 remaining Oncotype DX genes were tested for sensitivity to genomic DNA added into an FPET RNA pool. A threshold was set at 3 SDs below the lowest value of genomic DNA detected for this assay to function as a reverse transcription (RT) negative control.
rt
RT of the purified FPET RNA was carried out as previously described with the Omniscript™ RT Kit (Qiagen) and combined random hexamer and gene-specific priming (13).
pcr
Quantitative RT-PCR analysis was done in 384-well plates in a 10-µL volume with cDNA equivalent to 2 ng RNA. The only exception was the linearity study, in which sample input varied over RNA equivalents between $2^{-10}$ and $2^{3}$ ng/reaction. Plate cycling was carried out with ABI PRISM® 7900HT instruments (Applied Biosystems) according to the manufacturer's instructions, as described previously (details in the online Data Supplement) (13).
instrumentation and reagent calibration
We calibrated liquid-handling robots to an independent calibration system. The performance of the ABI PRISM 7900HT instruments was assessed by calculating a within-plate CV with a standardized ribonuclease P assay (Applied Biosystems).
data reference normalization and rs calculation
To make RT-PCR measurements comparable for clinical interpretation, the Oncotype DX assay normalizes gene expression measurements to the mean expression of 5 reference genes (ACTB, GAPDH, GUSB, RPLP0, and TFRC). We tested combinations of these genes for normalization performance before selecting the mean of all 5 genes for our standard reference normalization method. For each sample, normalized expression measurements are calculated as the mean cycle threshold (CT) for the 5 reference genes minus the mean CT of triplicate measurements for each individual gene. Normalized expression measurements are scaled from 0 to 15 units, where 1 unit reflects an approximately 2-fold change in RNA quantity.
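The reference normalization described above can be sketched in a few lines. This is a minimal illustration, not the laboratory's implementation: the function name, the input layout (a mapping from gene name to replicate CT values), and the example CT values are all assumptions, and the rescaling onto the 0 to 15 unit range is omitted because the text does not specify it.

```python
from statistics import mean

# The 5 reference genes named in the text.
REFERENCE_GENES = ("ACTB", "GAPDH", "GUSB", "RPLP0", "TFRC")

def reference_normalize(ct_wells):
    """Return gene -> (mean reference CT - mean gene CT).

    ct_wells maps gene name -> list of replicate CT values (hypothetical layout).
    The result is unscaled; the 0-15 rescaling is not described in the text.
    """
    gene_means = {gene: mean(values) for gene, values in ct_wells.items()}
    ref_mean = mean(gene_means[g] for g in REFERENCE_GENES)
    return {gene: ref_mean - m for gene, m in gene_means.items()}

# Made-up CT values for illustration only.
example = {
    "ACTB": [24.0, 24.0, 24.0],
    "GAPDH": [25.0, 25.0, 25.0],
    "GUSB": [28.0, 28.0, 28.0],
    "RPLP0": [24.0, 24.0, 24.0],
    "TFRC": [29.0, 29.0, 29.0],
    "ESR1": [27.0, 27.0, 27.0],
}
normalized = reference_normalize(example)
print(normalized["ESR1"])
```

Because a higher CT means less template, subtracting the gene CT from the reference mean gives a value that increases with expression, which matches the direction of the scale described in the text.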
After normalization, a single quantitative RS is calculated with expression values for 16 cancer-related genes, as has been described previously (14). The RS, scaled from 0 through 100, expresses the likelihood of distant breast cancer recurrence and is specific to the tumor of each individual patient.
measurement of assay amplification efficiencies
Amplification efficiency for each gene was approximated from RNA serial-dilution experiments by using the relationship: amplification efficiency = $(2^{-1/\text{slope}} - 1) \times 100\%$, where the slope is estimated from the simple linear regression of CT measurements vs log2 RNA concentration (15)(16). Each of 15 sample dilutions underwent RT-PCR analysis twice with triplicate assays. Mean CT scores from both runs were averaged to assign a final expression value for each RNA concentration measurement used in the linear regression analysis.
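A minimal sketch of the efficiency calculation follows, assuming an ordinary least-squares fit with NumPy. The function name and the dilution values are hypothetical. The sketch takes the regression slope to be negative (CT falls as RNA input rises), so the exponent is written as −1/slope; a slope of exactly −1 then corresponds to 100% efficiency.

```python
import numpy as np

def amplification_efficiency(rna_ng, ct):
    """Percent amplification efficiency from a regression of CT on log2(RNA input)."""
    log2_conc = np.log2(np.asarray(rna_ng, dtype=float))
    # polyfit returns coefficients highest degree first: [slope, intercept]
    slope, _intercept = np.polyfit(log2_conc, np.asarray(ct, dtype=float), 1)
    return (2.0 ** (-1.0 / slope) - 1.0) * 100.0

# Hypothetical 2-fold dilution series in which CT rises by exactly 1 per halving.
rna = [8, 4, 2, 1, 0.5]
ct_ideal = [25.0, 26.0, 27.0, 28.0, 29.0]
efficiency_ideal = amplification_efficiency(rna, ct_ideal)

# A steeper slope (1.1 CT per halving) corresponds to a lower efficiency.
ct_steep = [25.0 + 1.1 * k for k in range(5)]
efficiency_steep = amplification_efficiency(rna, ct_steep)
print(round(efficiency_ideal, 1), round(efficiency_steep, 1))
```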
measurement of assay linearity
For each gene in the assay panel, the range of linear assay response was assessed by comparing the proportionality of CT values relative to the input RNA concentrations calculated for each of the 15 pooled test sample dilutions used to measure amplification efficiency (see the online Data Supplement for sample dilution details). For this analysis, we applied the polynomial method originally proposed by Krouwer et al. (17) and recommended in Clinical and Laboratory Standards Institute (CLSI) guidelines (18). Specifically, orthogonal polynomial regression was used to obtain coefficients and associated tests of significance for the 1st (linear), 2nd (quadratic), and 3rd (cubic) order polynomials. Given that the log of the variance in CT measurement is proportional to the mean CT, error variance was modeled by use of a log-linear variance model. Model fits were obtained by use of the PROC MIXED procedure in SAS version 8.02.
In accordance with CLSI guidelines, any degree of nonlinearity in signal response was assessed by examining the SE of the regression and selecting the higher-order (nonlinear) polynomial model with the best fit. Specifically, at each input RNA concentration, the deviation from linearity (DL) was calculated as follows:
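$$\mathrm{DL}_i = p(x_i) - (b_0 + b_1 x_i)$$

where $x_i$ is the $i$th input RNA concentration on the log2 scale, $p(x_i)$ is the value of the best-fitting higher-order polynomial at $x_i$, and $b_0 + b_1 x_i$ is the value of the fitted 1st-order (linear) model at $x_i$.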
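As an illustration of the deviation-from-linearity idea, the sketch below compares a higher-order polynomial fit with a straight-line fit at each dilution. It is a simplification under stated assumptions: it uses unweighted least squares from NumPy rather than the orthogonal polynomial regression with a log-linear variance model fitted in SAS, the polynomial degree is chosen by the caller rather than by significance tests, and the function name and data are hypothetical.

```python
import numpy as np

def deviation_from_linearity(log2_conc, ct, degree=2):
    """Difference between a degree-`degree` polynomial fit and a linear fit at each x."""
    x = np.asarray(log2_conc, dtype=float)
    y = np.asarray(ct, dtype=float)
    linear_fit = np.polyval(np.polyfit(x, y, 1), x)
    poly_fit = np.polyval(np.polyfit(x, y, degree), x)
    return poly_fit - linear_fit

# Hypothetical dilution series from 2^-10 to 2^3 ng on the log2 scale.
x = np.arange(-10, 4, dtype=float)
ct_linear = 30.0 - x                      # perfectly linear response
ct_curved = 30.0 - x + 0.01 * x ** 2      # mild curvature at low input

dl_linear = deviation_from_linearity(x, ct_linear)
dl_curved = deviation_from_linearity(x, ct_curved)
print(np.max(np.abs(dl_linear)) < 1e-6, np.max(np.abs(dl_curved)) < 1.0)
```

In this sketch the acceptance check is simply whether the largest absolute deviation stays within 1 CT unit, mirroring the prespecified criterion mentioned in the Results.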
measurement of quantitative bias and precision
Assessments of analytical bias are typically obtained by determining how much observed gene expression measurements differ from expected expression values derived from standard reference RNAs and definitive methods of analysis. Universal standard reference RNAs are not available for the 21 genes, nor are there universally accepted definitive analysis methods. Consequently, the quantitative bias and imprecision of predicted RNA concentrations relative to the calculated input RNA amounts in the series of pooled sample dilutions were estimated for each of the 21 genes. Specifically, for every CT measurement, an inverse prediction of RNA concentration was derived from the best-fitting (nonlinear) polynomial calibration model derived during the linearity analysis (19). An estimate of the quantitative bias of the assay is given by the mean percent bias in prediction at each RNA concentration k (k = 1 ... 15); namely, by:

$$\%\mathrm{Bias}_k = \frac{100}{6}\sum_{i=1}^{2}\sum_{j=1}^{3}\frac{\hat{x}_{ijk} - x_k}{x_k}$$

where $\hat{x}_{ijk}$ is the predicted RNA amount obtained from inverse prediction for the ith plate (i = 1, 2), jth well (j = 1, 2, 3), and kth RNA concentration value, and $x_k$ is the calculated input RNA amount at the kth concentration.
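The bias estimate can be sketched as follows. For brevity the inverse prediction here inverts a straight-line calibration (CT = intercept + slope × log2 RNA), which is an assumption; the study inverted the best-fitting polynomial model. Function names and numbers are hypothetical.

```python
import numpy as np

def inverse_predict_linear(ct, intercept, slope):
    """Predicted RNA amount (ng) from a CT value under a linear log2 calibration."""
    return 2.0 ** ((ct - intercept) / slope)

def mean_percent_bias(predicted, expected):
    """Mean percent bias over all plate x well predictions at one dilution level."""
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((predicted - expected) / expected) * 100.0)

# Hypothetical calibration: CT of 28 at 1 ng, slope of -1 CT per doubling.
predicted_2ng = inverse_predict_linear(27.0, intercept=28.0, slope=-1.0)

# Hypothetical predictions for 2 plates x 3 wells at a calculated input of 2 ng.
predictions = [[2.2, 2.0, 2.0],
               [2.0, 2.0, 2.0]]
bias = mean_percent_bias(predictions, expected=2.0)
print(predicted_2ng, round(bias, 3))
```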
As a measure of the quantitative precision of the assay, for each RNA concentration level, ANOVA was used to separate the total variability in the difference between individual predicted and calculated RNA concentrations into components of variance due to plate and well within plate, treating plates and wells within plates as random. For this purpose, a random effects model of the following form was applied to the individual differences between predicted and calculated RNA concentrations:

$$Y_{\mathrm{diff},ij} = \mu + \beta_j + \varepsilon_{i(j)}$$

where $Y_{\mathrm{diff},ij}$ is the difference between the predicted and calculated RNA concentrations for the ith (i = 1, 2, 3) observation on the jth plate (j = 1, 2), $\mu$ is the overall mean difference, $\beta_j \sim \text{iid } N(0, \sigma_\beta^2)$ is the effect of the jth plate, and $\varepsilon_{i(j)} \sim \text{iid } N(0, \sigma_\varepsilon^2)$ is random error. We further assume that the random effects and error terms are distributed independently and that the distribution of the error terms varies as a function of calculated RNA concentration level (e.g., variance increases near the limit of quantification of the assay). This information was used to obtain an estimate of within-plate CV in predicted RNA concentration at the kth RNA concentration level, namely:

$$\mathrm{CV}_k = \frac{\sqrt{\hat{\sigma}_{\varepsilon,k}^{2}}}{x_k} \times 100\%$$

where $\hat{\sigma}_{\varepsilon,k}^{2}$ is the estimate of error variance derived from the ANOVA at the kth concentration level and $x_k$ is the calculated RNA concentration at that level. Limit of detection and limit of quantification values were calculated for each gene as well (details in the online Data Supplement).
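A rough sketch of the variance-component step is shown below for a balanced layout of plates by wells. It uses classical method-of-moments ANOVA estimators, which is an assumption about the estimation method, and it expresses the within-plate CV as the error SD divided by the calculated RNA concentration, which is also an assumption about the exact form used. Names and data are hypothetical.

```python
import numpy as np

def variance_components(diffs):
    """One-way random-effects ANOVA for a balanced plates x wells array.

    Returns (plate variance, error variance) from mean squares.
    """
    y = np.asarray(diffs, dtype=float)
    n_plates, n_wells = y.shape
    plate_means = y.mean(axis=1)
    grand_mean = y.mean()
    ms_between = n_wells * np.sum((plate_means - grand_mean) ** 2) / (n_plates - 1)
    ms_within = np.sum((y - plate_means[:, None]) ** 2) / (n_plates * (n_wells - 1))
    var_error = float(ms_within)
    # Truncate at zero: the moment estimator can go negative by chance.
    var_plate = max(0.0, float(ms_between - ms_within) / n_wells)
    return var_plate, var_error

def within_plate_cv(var_error, expected):
    """Within-plate CV (%) relative to the calculated RNA concentration."""
    return 100.0 * float(np.sqrt(var_error)) / expected

# Hypothetical differences (predicted - calculated, ng) for 2 plates x 3 wells.
diffs = [[0.1, 0.0, -0.1],
         [0.2, 0.1, 0.0]]
var_plate, var_error = variance_components(diffs)
cv = within_plate_cv(var_error, expected=2.0)
print(round(var_plate, 5), round(var_error, 5), round(cv, 2))
```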
measurement of assay reproducibility
Reproducibility in RS and in the measurement of the expression of individual genes ensures that results remain comparable for patients over time and for different submitting pathology laboratories. Individual gene and RS reproducibility were measured by performing repeat analyses across multiple days, operators, RT-PCR plates, 7900HT instruments, and liquid-handling robots. Two operators obtained replicate CT measurements on 2 aliquots of a single RNA sample over the course of 5 days with 3 7900HT instruments and 2 liquid-handling robots. The study design has a G-efficiency of >50% (20), allowing estimation of all main fixed effects, including 7900HT instrument, liquid-handling robot, and operator. All plates within days were processed in randomized order.
Mixed-effect ANOVA was used to divide calculated total variability in observed CT measurement and RS into components of variance due to days, plates within days, and within plates by treating operator, 7900HT instrument, and liquid-handling robot as fixed effects. For each of the 21 Oncotype DX genes and RS, restricted maximum likelihood estimates of the components of assay variance were obtained.
| Results |
|---|
75% to 112%, whereas mean efficiency was 88% for the reference genes (Table 2).
assay linearity and dynamic range
For each gene, assay linearity was assessed by comparing the proportionality of CT measurements with respect to calculated RNA concentrations over the series of sample dilutions used to measure amplification efficiency. The quadratic and cubic polynomial terms were nonsignificant (P >0.05) for 6 genes (ACTB, BAG1, CD68, ESR1, RPLP0, and TFRC), indicating linear performance over the entire RNA range tested ($2^{-10}$ to 8 ng). Another 6 genes (CCNB1, CTSL2, GSTM1, GUSB, MKI67, and PGR) were linear over more restricted RNA concentration ranges. For the remaining genes, the deviation from the linear model was estimated to be within the prespecified acceptance criterion of 1 CT unit. Based on the prespecified CLSI criterion, each of the 21 genes in the breast cancer recurrence gene panel had an estimated maximum deviation from linearity of <1 CT over at least an 11-log2 (>2000-fold) concentration range (Table 3).
At the highest RNA concentration tested in the series, 8 ng RNA/reaction, CT values for the 21 genes varied over an approximately 8-CT range (a 256-fold concentration range). Consequently, the lowest-expressing genes (GUSB and CTSL2) were studied over a more limited concentration range than high-expressing genes (Table 3). Low-expressing genes in this pooled sample would be expected to demonstrate linear performances similar to highly expressed genes at higher RNA concentrations. A linear assay response was seen for all genes over the expression range typically experienced in clinical samples.
assay quantitative bias and precision
The quantitative nature of the Oncotype DX RS depends directly on analytical accuracy and precision of measurement for each component of the 21-gene assay. Analytical accuracy could not be assessed in the absence of standard reference materials for each of the 21 analytes, so quantitative bias was measured by comparing predicted RNA concentrations for each gene to expected RNA concentrations calculated for a range of sample dilutions. At every CT measurement, an inverse prediction of RNA concentration was derived from the best-fitting polynomial calibration model derived from the linearity study (19). Assay bias is expressed as mean percentage deviation from the calculated value in prediction at each RNA concentration. At the 2-ng/well value used for the Oncotype DX assay, the estimated deviation from the expected value in the predicted RNA concentration ranged from −10% to 6%, with an estimated mean deviation from the expected value of 0.3% for the 16 cancer-related genes. For the reference genes, the mean percentage deviation from the expected value was 0.7%, indicating >99% mean quantitative correctness at this assay condition (Table 4).
Each gene was analyzed for analytical imprecision of measurement. ANOVA was used at each RNA concentration to estimate the total variance in predicted RNA concentration derived from the inverse calibration model vs the actual RNA concentration. This information was used to estimate the CV in the predicted RNA concentration at each known RNA concentration. For the standard reaction, the imprecision of measurement for the 16 cancer-related genes on the RNA concentration scale had a CV of 5.7%, whereas the reference genes had a CV of 3.2%. All values were well within the prespecified acceptance limit of 20% (Table 4).
day-to-day reproducibility
SDs in CT measurements varied from 0.06 to 0.15 CT units for each of the 21 genes, and the upper bounds on 2-sided 95% CIs for the CVs were all within 10%, indicating a high degree of precision and reproducibility in the assay (Table 5). These SD and CV values are for the estimates of total variance. The between-day SD values were close to 0 for all 21 genes. A maximum SD of 0.15 at a CT of 30 translates to a CV of 0.5%. At this level of precision it is possible to reliably distinguish a 15% change in expression for specific genes.
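The arithmetic behind these figures can be checked directly. The second expression is an illustrative assumption, not a calculation from the study: it treats 1 CT unit as an exact 2-fold change in template, whereas the measured per-gene efficiencies differ from 100%.

$$\mathrm{CV} = \frac{0.15}{30} \times 100\% = 0.5\%, \qquad 2^{0.15} \approx 1.11$$

Under that idealization, 1 SD on the CT scale corresponds to roughly an 11% change in RNA quantity on the linear scale.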
Additionally, pairwise differences in (least-squares) mean CT values between operators, liquid-handling robots, and 7900HT instruments were calculated. The largest differences between operators, as well as between liquid-handling robots and 7900HT instruments, were <0.5 CT units for each of the 21 Oncotype DX genes (data not shown).
monitoring and controlling assay performance in the clinical reference laboratory
Clinical validation of the Oncotype DX Breast Cancer Assay was conducted after validating assay analytical performance with quality-control measures established from the results of analytical validation studies. The entire assay process, including process controls with associated performance-acceptance limits, was documented as a series of standard operating procedures that provide the basis for the current reference laboratory operation.
A standard RNA control sample is assayed at least once per batch of patient samples (46 samples), and PCR controls are run in every assay plate to verify that the process and reagents continue to perform within specified ranges. RT-PCR failures, identified by analyzing the amplification curve from every assay well, are excluded from analysis. Expression values are assigned when at least 2 of 3 assay wells provide acceptable RT-PCR results. All 21 genes must have an expression value assigned for an RS to be calculated and reported.
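The reporting rule in this paragraph (a value is assigned only when at least 2 of 3 wells pass, and an RS is reported only when all 21 genes have a value) can be sketched as below. The data layout, the use of `None` to mark a failed well, the function names, and the choice to average the passing wells are assumptions made for illustration.

```python
from statistics import mean

def assign_expression(well_cts, min_valid=2):
    """Mean CT of passing wells, or None if fewer than `min_valid` wells passed.

    well_cts: CT values for the replicate wells, with None marking a failed well.
    """
    valid = [ct for ct in well_cts if ct is not None]
    return mean(valid) if len(valid) >= min_valid else None

def reportable(sample, panel):
    """True only if every gene in `panel` receives an expression value."""
    return all(assign_expression(sample.get(gene, [])) is not None for gene in panel)

# Hypothetical 2-gene panel for brevity; the real panel has 21 genes.
panel = ("ESR1", "PGR")
good_sample = {"ESR1": [27.0, None, 27.4], "PGR": [30.1, 30.2, 30.0]}
bad_sample = {"ESR1": [27.0, None, None], "PGR": [30.1, 30.2, 30.0]}

print(reportable(good_sample, panel), reportable(bad_sample, panel))
```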
Process monitoring in our clinical reference laboratory shows that RS reproducibility remains very high. Repeat testing with deidentified patient samples shows a cumulative SD of <2 RS units on a 100-unit scale, which represents all sources of process variation (Table 6).
| Discussion |
|---|
This genomic diagnostic test is highly complex compared with more traditional clinical chemistry tests. Because it requires 21 quantitative measurements rather than the measurement of a single analyte, the assay does not conform to standard assay-validation formats. Nevertheless, successful analytical validation was critical to our goal of reporting quantitative results for individual patients. We defined a validation process analogous to validation methods for single analyte clinical assays to characterize assay amplification efficiency, linearity, quantification limitations, dynamic range, analytical precision, and reproducibility performance of the RT-PCR process that underlies the Oncotype DX assay for individual genes and for the assay as a whole as reflected by the RS (24).
Diagnostic gene expression assays fall into 2 principal categories (25). Both types test multiple analytes but diverge from one another in the quantitative analytical resolution they are able to achieve. One type of diagnostic test yields gene expression profiles for thousands or tens of thousands of candidate analytes by means of technologies such as hybridization microarrays, bead arrays, and protein mass spectrometry (26)(27)(28)(29)(30)(31)(32). Initially large analyte sets ultimately may be reduced to smaller sets that classify patients nearly as well as the original large test panel. Results obtained thus far indicate the challenge in validating this type of test for clinical utility, because such complex sets of targets, each with a relatively low precision and correctness of measurement, lend themselves to overfitting in clinical-development and -validation studies (33)(34). Reproducibly measuring large panels is difficult, and such measurements typically produce high CVs. This imprecision erodes the quality of patient classifications, thereby limiting the value of these assays in clinical decision-making (35)(36)(37).
The Oncotype DX assay is a prototype for an alternative type of genomic diagnostic test. From hundreds of candidate genes quantitatively evaluated for association with the clinical outcomes of individual patients during a series of clinical-development studies, we selected an optimal set of RT-PCR assays for high overall performance in predicting the outcome of individual patients. In contrast to tests that dichotomize patients into general populations by matching classification profiles, the output from this test is individualized; a quantitative value places a patient precisely within a defined continuum of clinical outcomes on the basis of the results of clinical-validation studies.
Maintaining consistent, predictable RT-PCR assay performance for individual genes throughout clinical validation allowed assay-performance effects to be reliably differentiated from true patient variability. Consequently, each biomarker was repeatedly confirmed as important in predicting patient outcome while a foundation was concurrently established for continuous monitoring of assay quality in the clinical reference laboratory setting. Because key assay-performance metrics are quantitatively monitored, patients can be assured that their clinical reports are reliable and remain consistent with clinical-validation experience. This approach to analytical validation for a high-complexity genomic diagnostic test has been demonstrated to be useful by more than 3 years of successful operation in the clinical reference laboratory setting. Routine monitoring of the ongoing performance of the test shows that it continues to perform within the originally defined analytical conditions and to give reliable results of breast cancer risk to patients and their physicians.
| Acknowledgments |
|---|
Grant/funding support: None declared.
Financial disclosures: All authors are employees of Genomic Health, Inc.
| Footnotes |
|---|
2 Nonstandard abbreviations: RS, Recurrence Score; FPET, fixed paraffin-embedded tissue; RT, reverse transcription; CT, cycle threshold.