Clinical Chemistry 51: 1525-1528, 2005. First published June 10, 2005; 10.1373/clinchem.2005.050708
(Clinical Chemistry. 2005;51:1525-1528.)
© 2005 American Association for Clinical Chemistry, Inc.


Technical Briefs

Analytical and Preanalytical Biases in Serum Proteomic Pattern Analysis for Breast Cancer Diagnosis

Aly Karsan1,a, Bernhard J. Eigl2, Stephane Flibotte3, Karen Gelmon2, Philip Switzer4, Patricia Hassell5, Dorothy Harrison5, Jennifer Law1, Malcolm Hayes1, Moira Stillwell4, Zhen Xiao6, Thomas P. Conrads6 and Timothy Veenstra6

Departments of 1 Pathology and Laboratory Medicine and Medical Biophysics, 2 Medical Oncology, and 5 Radiology, and 3 Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada;
4 Department of Radiology, University of British Columbia, Vancouver, BC, Canada;
6 Laboratory of Proteomics and Analytical Technologies, SAIC, Inc., Frederick, MD.

a Address correspondence to this author at: Department of Medical Biophysics, British Columbia Cancer Research Centre, 675 West 10th Ave., Vancouver, BC, Canada V5Z 1L3; fax 604-675-8049, e-mail akarsan{at}bccrc.ca

Currently available serum tumor markers lack sufficient specificity and sensitivity as stand-alone diagnostic or screening tests (1). Nevertheless, these assays are used extensively because of a lack of better alternatives. To accelerate the discovery of tumor markers for diagnosis and/or prognosis, there has been great enthusiasm in attempting to use mass spectrometry (MS)-based testing of serum to identify potential biomarkers or spectral patterns that can act as fingerprints for specific diseases (1). Analysis of serum by surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) MS was recently reported to be able to predict the existence of ovarian cancer without missing a single case (2). In this method, capture of serum proteins on a biochip by use of surface chemistry is followed by MS analysis of all captured proteins, and data mining software is used to identify a pattern of spectral peaks that will predict the presence or absence of the particular disease state in question (2). Although the seminal study using this technique has been criticized by statisticians, bioinformaticians, and clinical chemists because of the likelihood of systematic bias, there has been no empirical testing of the potential for systematic bias in SELDI-TOF serum analysis (1)(3)(4)(5)(6). This study was conducted to determine whether spectral patterns generated by SELDI-TOF MS could distinguish between patients with cancer and those with benign disease among women presenting with suspicious breast abnormalities on mammography or physical examination.

The study was approved by the Research Ethics Board at the University of British Columbia, and all participating patients provided informed consent. We prospectively recruited 136 consecutive consenting patients attending 3 different clinics from September 2002 to April 2004 for a core needle biopsy for histopathologic diagnosis of a suspicious breast lump. Of the 136 patients, 3 were lost to follow-up, and 1 sample was not frozen within 6 h. Of the remaining 132 patients, 63 came from clinic A, 64 from clinic B, and 5 from clinic C. All patients with a positive core biopsy for malignancy as well as a subset with a negative core biopsy had excisional biopsies. A total of 96 patients (72.7%) received a histopathologic diagnosis of breast cancer (ductal carcinoma in situ, n = 13; invasive ductal carcinoma, n = 78; lobular, tubulolobular, or mixed, n = 5).

Serum samples were collected before biopsy in 7-mL glass serum tubes with no additive (BD Vacutainer®), aliquoted, and frozen at –80 °C within 6 h of phlebotomy until used for SELDI-TOF analysis. A pilot study using chips with different surface chemistries led us to choose the immobilized metal affinity capture (IMAC3) chips with Cu(II) as the metal ion to use for this study because of good reproducibility of duplicate spectra and the generation of multiple features on the spectra for analysis (data not shown). IMAC3 chips from the same lot were used for all samples run to avoid lot-to-lot variability. Chips were charged with 100 mmol/L CuSO4, fixed with 100 mmol/L sodium acetate (pH 4.0), and equilibrated with phosphate-buffered saline (PBS), pH 7.4. Serum samples (50 µL) were diluted 1:1 in 8 mol/L urea containing 10 mL/L CHAPS and after vortex-mixing were further diluted 1:5 in PBS. Diluted serum samples (100 µL) were then applied to IMAC3 chips in duplicate, on spots on different chips. After washes in PBS and water, chips were air-dried, and 1 µL of saturated sinapinic acid in 500 mL/L acetonitrile–5 mL/L trifluoroacetic acid was applied to each spot. The 132 serum samples were prepared and spotted in duplicate on 3 consecutive days, and the spotted arrays were read on a PBS II ProteinChip reader (Ciphergen Systems) on 2 consecutive days (one-third on the first day and the remaining two-thirds on the following day).

Spectra were calibrated externally and analyzed by mapping the raw (nonfiltered, non–baseline-subtracted) TOF spectra to mass spectra consisting of 16 384 channels, with mass calibration given by m/z = aC², where C is the channel number and a = 0.0001 m/z, and normalized to the same total area. An automated procedure to find and fit the peaks in the mass spectra has been developed by one of the authors and is freely available (sflibotte{at}bcgsc.ca). This procedure generates an average spectrum from all samples, which is divided into several sections by a heuristic approach to obtain the best possible fit. Each section is then fitted iteratively with the appropriate number of gaussian peaks superimposed on a locally quadratic background. The duplicate pairs of spectra from each specimen in the dataset were averaged and fitted, with each section from the average spectrum used as a template. Duplicate spectra from individual samples showed a high degree of reproducibility as demonstrated by a median Pearson correlation coefficient of 0.9704 for all pairs of spectra evaluated. The Pearson correlation coefficients were calculated in the mass region used in the fitting procedure (m/z 533 to 26 840). Examples of duplicate raw spectra are shown in Fig. 1 of the Data Supplement that accompanies the online version of this Technical Brief at http://www.clinchem.org/content/vol51/issue8/. The position, width, and height of the peaks, and the local background were all fitted at the same time; this procedure corrects for local gain, matching variations between spectra because the absolute position of each peak is free to vary, although the relative position of each peak is fixed. A total of 445 peaks were fitted for each spectrum.
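The calibration, normalization, and duplicate-correlation steps described above can be illustrated with a minimal sketch. It reads the calibration as a quadratic in the channel number (m/z = a × C × C), an interpretation that reproduces the stated upper mass limit, since 0.0001 × 16 384 × 16 384 is approximately 26 843.5. The function names and the toy spectra are hypothetical and are not taken from the authors' fitting software.

```python
import math

N_CHANNELS = 16384   # number of channels stated in the text
A = 0.0001           # calibration constant a stated in the text


def channel_to_mz(channel, a=A):
    """Quadratic TOF calibration: m/z = a * C**2 (assumed reading of the text)."""
    return a * channel ** 2


def mz_to_channel(mz, a=A):
    """Inverse of the calibration: fractional channel number for a given m/z."""
    return math.sqrt(mz / a)


def normalize_total_area(intensities):
    """Scale a spectrum so that its intensities sum to 1 (same total area)."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("spectrum has no positive total area")
    return [value / total for value in intensities]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


# Toy duplicate spectra (made-up values): same shape, different overall gain.
duplicate_1 = normalize_total_area([1.0, 4.0, 2.0, 8.0])
duplicate_2 = normalize_total_area([2.0, 8.0, 4.0, 16.0])
duplicate_r = pearson(duplicate_1, duplicate_2)
top_mz = channel_to_mz(N_CHANNELS)
```

Normalizing to the same total area removes a uniform gain difference between duplicates, which is why the two toy spectra correlate perfectly after scaling; real duplicates would differ in shape as well.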



Figure 1. Ability of the C4.5 algorithm to identify the source of serum samples based on the area under the peaks at m/z 5643 and 2992.

○, clinic A; ■, clinic B.

Two machine-learning algorithms, a support vector machine (SVM) and C4.5, were used in various analyses using all 445 peaks as described below. SVMs perform well in situations in which the number of samples in the dataset is not large compared with the number of attributes, i.e., peak areas in this case, and have been used successfully in microarray and SELDI-TOF experiments (7)(8). The C4.5 algorithm, a decision tree algorithm, is also a widely used machine learning algorithm applied in many settings (8). A 10-fold cross-validation was performed 10 times to assess each classification scheme. In other words, the dataset was divided into 10 equal groups, 9 of which were used to build a classifier and predict the classes of the samples in the remaining group. All 10 groups were assessed in this way. This 10-fold cross-validation procedure was repeated 9 more times with a breakdown of the dataset into 10 different but random groups. A majority predictor, which simply predicts the majority class in the dataset, was used as a comparator. Classification accuracy with means, SDs, and probability values to assess significant differences from the majority predictor (using a Student t-test) were generated by the Weka machine learning software (8).
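The resampling scheme and the majority-predictor baseline can be sketched as follows. This is only an illustration of repeated 10-fold cross-validation, not the Weka implementation used in the study; the function names are hypothetical. The class counts come from the text (96 of 132 patients with cancer), and labelling the remaining 36 as "benign" is an assumption made for the example.

```python
import random
from collections import Counter


def majority_class(labels):
    """Return the most frequent label (the 'majority predictor')."""
    return Counter(labels).most_common(1)[0][0]


def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cv_accuracy(labels, k=10, repeats=10, seed=0):
    """Accuracy of the majority predictor for each repeat of k-fold CV."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        correct = 0
        for test_fold in k_fold_indices(len(labels), k, rng):
            held_out = set(test_fold)
            # Build the predictor on the other k-1 folds only.
            train_labels = [lab for i, lab in enumerate(labels)
                            if i not in held_out]
            prediction = majority_class(train_labels)
            correct += sum(labels[i] == prediction for i in test_fold)
        accuracies.append(correct / len(labels))
    return accuracies


# Class counts from the text; the "benign" label is an assumption.
labels = ["cancer"] * 96 + ["benign"] * 36
accuracies = repeated_cv_accuracy(labels)
mean_accuracy = sum(accuracies) / len(accuracies)
```

With these class counts the majority class in every training split is "cancer", so the baseline accuracy equals the cancer prevalence (96/132) in every repeat. A classifier has to exceed that figure, not 50%, to show any diagnostic value.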

Our findings demonstrate that specimen collection and processing introduce significant biases in the spectral pattern, such that machine learning algorithms can differentiate between sample source, day that the chips were set up, and days that they were read. In contrast, accuracy of predicting cancer was much poorer.

As demonstrated in Table 1, neither machine learning algorithm was able to classify patients with breast cancer any better than the majority predictor. Two previous studies using IMAC3 chips identified 2 different sets of peaks that were able to classify patients with breast cancer (9)(10). Attempts to classify the spectra by use of these published peaks were also unsuccessful. We then eliminated every sample in which the duplicate spectra could not be overlaid by visual inspection. Both algorithms performed slightly better than the majority predictor in classifying cancer in this reduced subset of 70 patients, but the results were not statistically significant. To reduce possible source-related biases, we next analyzed specimens that showed reproducible spectra but came from only one clinic. However, the use of samples from only one clinic did not improve classification accuracy by either machine learning algorithm (Table 1).


Table 1. Accuracy of 2 machine learning algorithms in diagnosis and prediction of variables associated with the proteomic analysis of serum from patients with breast cancer.

In contrast to the lack of predictive ability of the spectral patterns for the diagnosis of breast cancer, both machine learning algorithms demonstrated an excellent ability to predict on which day the chips were read and on which day they were prepared, albeit the second variable may be a function of the first (Table 1). Even more surprisingly, there were distinct spectral features that the algorithms successfully applied to classifying the clinics from which the samples were acquired (Table 1). Fig. 1 shows a dot plot demonstrating that, using only 2 peaks at m/z 2992 and 5643, the C4.5 algorithm was able to distinguish between samples obtained in clinic A or B.
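To give a sense of how a decision tree can separate two sources on a single peak area, the sketch below finds the threshold with the highest information gain for one numeric attribute. It is a simplified, one-level illustration and not the full C4.5 algorithm, which also uses the gain ratio to choose among attributes and prunes the tree. The peak areas are made-up values, not data from this study, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best split 'value <= threshold'."""
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        threshold = (low + high) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical peak areas for one peak, four samples per clinic.
areas = [1.0, 1.2, 1.4, 1.1, 2.0, 2.3, 2.1, 2.6]
clinics = ["A", "A", "A", "A", "B", "B", "B", "B"]
threshold, gain = best_threshold(areas, clinics)
```

In this toy case a single cut between the two groups removes all uncertainty about the clinic, which is the kind of clean separation a source-related artifact can produce even when the same spectra carry little information about disease.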

The very high probability values assigned to the classifications of distinct analytical and preanalytical variables suggest that in previous reports there may have been inadvertent biases in sample collection, storage, or processing between patients from different groups being tested. There are several potential reasons for the analytical and preanalytical biases seen, but the specific mechanisms remain to be elucidated. The findings presented here empirically validate the concerns of clinical chemists and bioinformaticians that serum profiling of unfractionated serum may be detecting preanalytical and analytical variables that are not reflective of the disease state (3)(4)(11). The inability to use previously published peaks to classify breast cancer patients in this study also suggests that there are likely site-specific findings that reflect analytical and preanalytical biases for SELDI-TOF MS. As has been pointed out, it can be difficult to obtain stable, reproducible SELDI-TOF MS results over time and across laboratories (12). Although a recent study has shown some reproducibility across laboratories, only 28 optimal spectra out of a cohort of more than 1000 were used for the validation, and whether the reproducible peaks actually represent cancer biomarkers or artifacts was not addressed (13)(14). Recent critiques have argued that the two main potential problems with observational studies arise from chance and bias (15)(16)(17). The current study highlights the effects of bias; thus, future studies attempting to profile serum by proteomic approaches will have to take extreme care in specimen handling and storage, as well as in randomization of specimen preparation and spectrum collection times, to discover true disease-related spectral profiles.
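One way to randomize specimen preparation and reading times, as recommended above, is blocked randomization: shuffle the samples within each stratum (for example, clinic or diagnosis) and deal them evenly across processing batches. The sketch below is only one possible scheme under that assumption; the study does not describe such a procedure, and the function name and sample identifiers are hypothetical.

```python
import random
from collections import Counter


def blocked_randomization(samples, n_batches, seed=0):
    """Assign samples to batches so every stratum is spread evenly.

    samples: dict mapping sample id -> stratum (e.g. the clinic).
    Returns a dict mapping sample id -> batch number (0-based).
    """
    rng = random.Random(seed)
    by_stratum = {}
    for sample_id, stratum in samples.items():
        by_stratum.setdefault(stratum, []).append(sample_id)

    assignment = {}
    position = 0
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)           # random order within the stratum
        for sample_id in members:
            # Deal round-robin, continuing across strata so that
            # overall batch sizes also stay balanced.
            assignment[sample_id] = position % n_batches
            position += 1
    return assignment


# Hypothetical example: 12 samples from each of two clinics, 3 batches.
samples = {f"A{i:02d}": "A" for i in range(12)}
samples.update({f"B{i:02d}": "B" for i in range(12)})
schedule = blocked_randomization(samples, n_batches=3)
per_batch = Counter((samples[s], batch) for s, batch in schedule.items())
```

Because each batch receives the same share of every stratum, batch or day effects cannot line up with clinic or diagnosis, so a classifier cannot use them as a proxy for the variable of interest.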


Acknowledgments

We thank Ingrid Pollet and Fred Wong for assistance with serum sample freezing. This study was funded by grants to A.K. from the National Cancer Institute of Canada with funds from the Canadian Cancer Society and to A.K. and K.G. from the Canadian Breast Cancer Foundation (BC Chapter), and in part with US funds from the National Cancer Institute, National Institutes of Health, under Contract NO1-CO-12400 to T.V., T.P.C., and Z.X. A.K. is supported by a personnel award from the Heart and Stroke Foundation of Canada and a Scholarship from the Michael Smith Foundation for Health Research.


References

  1. Diamandis EP. Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations. Mol Cell Proteomics 2004;3:367-378.
  2. Petricoin EF, Ardekani AM, Hitt BA, Levine PJ, Fusaro VA, Steinberg SM, et al. Use of proteomic patterns in serum to identify ovarian cancer. Lancet 2002;359:572-577.
  3. Sorace JM, Zhan M. A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics 2003;4:24.
  4. Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 2004;20:777-785.
  5. Diamandis EP. Point: Proteomic patterns in biological fluids: do they represent the future of cancer diagnostics? Clin Chem 2003;49:1272-1275.
  6. Diamandis EP. Analysis of serum proteomic patterns for early cancer diagnosis: drawing attention to potential problems. J Natl Cancer Inst 2004;96:353-356.
  7. Zhang Z, Bast RC, Jr, Yu Y, Li J, Sokoll LJ, Rai AJ, et al. Three biomarkers identified from serum proteomic analysis for the detection of early stage ovarian cancer. Cancer Res 2004;64:5882-5890.
  8. Witten IH, Frank E. Data mining: practical machine learning tools and techniques with JAVA implementations. San Francisco: Morgan Kaufmann, 2000:371pp.
  9. Pusztai L, Gregory BW, Baggerly KA, Peng B, Koomen J, Kuerer HM, et al. Pharmacoproteomic analysis of prechemotherapy and postchemotherapy plasma samples from patients receiving neoadjuvant or adjuvant chemotherapy for breast carcinoma. Cancer 2004;100:1814-1822.
  10. Li J, Zhang Z, Rosenzweig J, Wang YY, Chan DW. Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clin Chem 2002;48:1296-1304.
  11. Diamandis EP, van der Merwe DE. Plasma protein profiling by mass spectrometry for cancer diagnosis: opportunities and limitations. Clin Cancer Res 2005;11:963-965.
  12. Coombes KR, Morris JS, Hu J, Edmonson SR, Baggerly KA. Serum proteomics profiling-a young technology begins to mature. Nat Biotechnol 2005;23:291-292.
  13. Semmes OJ, Feng Z, Adam BL, Banez LL, Bigbee WL, Campos D, et al. Evaluation of serum protein profiling by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry for the detection of prostate cancer: I. Assessment of platform reproducibility. Clin Chem 2005;51:102-112.
  14. Hortin GL. Can mass spectrometric protein profiling meet desired standards of clinical laboratory practice? Clin Chem 2005;51:3-5.
  15. Ransohoff DF. Lessons from controversy: ovarian cancer screening and serum proteomics. J Natl Cancer Inst 2005;97:315-319.
  16. Baggerly KA, Morris JS, Edmonson SR, Coombes KR. Signal in noise: evaluating reported reproducibility of serum proteomic tests for ovarian cancer. J Natl Cancer Inst 2005;97:307-309.
  17. Liotta LA, Lowenthal M, Mehta A, Conrads TP, Veenstra TD, Fishman DA, et al. Importance of communication between producers and consumers of publicly available experimental data. J Natl Cancer Inst 2005;97:310-314.



