Proteomics and Protein Markers
1 Barnett Institute and Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA.
2 Dana-Farber Cancer Institute, Boston, MA.
3 Renal Unit, Department of Medicine, Massachusetts General Hospital, Harvard University, Cambridge, MA.
Address correspondence to this author at: Barnett Institute and Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115. Fax 617-373-2855; e-mail wi.hancock{at}neu.edu.
| Abstract |
|---|
Methods: We used multilectin affinity chromatography (M-LAC) to isolate glycoproteins from the sera of breast cancer patients and controls. The proteins were identified by HPLC–tandem mass spectrometry (MS/MS) analysis of the corresponding tryptic digests. We used the FuncAssociate Gene Ontology program for association analysis of the identified proteins. Biomarker candidates in these groups were comparatively quantitated by use of peak area measurements, with inclusion of an internal standard. We analyzed data for concordance within the ontology association groups for vector of change with the development of breast cancer.
Results: Detection of the known low-concentration biomarker HER-2 (8–24 µg/L) enabled us to establish a dynamic range of 10⁶, relative to the amount of albumin, for the depletion step. We then used ELISA to confirm this range. Proteins associated with lipid transport and metabolism, cell growth and maintenance, ion homeostasis, and protease inhibition were found to be differentially regulated in serum from women with breast cancer compared with serum from women without breast cancer.
Conclusions: M-LAC for isolation of the serum glycoproteome, coupled with liquid chromatography–MS/MS and the use of gene ontology associations, can be used to characterize large panels of candidate markers, which can then be evaluated in a particular patient population.
| Introduction |
|---|
Breast cancer causes 40 000 deaths each year in the US; this disease is highly curable if diagnosed at an early stage (1), but screening mammography is likely to miss early cancers (2). Proteomic profiling may identify tumor markers in blood that could aid in early diagnosis and subsequent treatment (3)(4), but characterizing such biomarkers in early-stage breast cancer is challenging. During early-stage disease, only small amounts of material may be released into the bloodstream, and analysis of these low-abundance proteins is complicated by the large dynamic range of serum proteins and the complexity of posttranslational modifications, especially glycosylation. The removal of the most abundant protein(s) by use of an immunoaffinity approach can enhance the identification of low-abundance proteins in plasma and serum (5), but depletion of proteins such as albumin that form complexes with low-abundance proteins can lead to loss of potential biomarkers (6). We previously developed a multilectin affinity system for efficient and specific enrichment of human serum glycoproteins (7) and demonstrated that a multilectin affinity chromatography (M-LAC) column containing Jacalin, concanavalin A, and wheat germ agglutinin could overcome the low affinity and incomplete glycoprotein capture that typically occur with a single-lectin affinity selector. This approach was specific to glycoproteins, yielded good recovery, and was reproducible, markedly improving the dynamic range of serum proteomic analysis and minimizing nonspecific losses associated with the depletion of abundant carrier proteins.
Because glycoproteins make up at least 50% of the blood proteome (7), and blood biomarkers of breast cancer that are glycoproteins have already been identified, such as HER-2 (8) and carcinoembryonic antigen (9)(10), we used M-LAC to investigate the glycoproteome in serum of breast cancer patients.
| Materials and Methods |
|---|
samples
Study samples were collected from May 2000 to February 2002 from patients with ductal carcinoma in situ (DCIS) or invasive breast cancer (IC) [American Joint Committee on Cancer (11) stage IV]. We obtained control samples from healthy, cancer-free women who donated blood to the Brigham and Women's Hospital Blood Bank from January 2001 to June 2002. All participants gave informed consent in accordance with protocol 93-085 of the Partners Institutional Review Board. Samples were archived under protocols approved by the Human Subjects Committee of the Partners HealthCare System (Boston, MA) and the Institutional Review Board for the Dana-Farber Cancer Institute, Harvard Medical School. Samples used for this study had been stored for 1–3 years (controls) or 2–3 years (patients). Patient disease data are listed in Table 1. Blood samples were collected in red-top tubes (Becton Dickinson) and processed in a CLIA-approved facility (Dana-Farber Cancer Clinical Laboratory). After separation from clot, serum was divided into aliquots, placed in cryovials, and stored in the Specialized Programs of Research Excellence Breast Cancer Research Bank. Samples were thawed once and divided into 100-µL aliquots. Each aliquot was used only once per analysis, so only 1 freeze-thaw cycle per aliquot was required. Serum samples were obtained from 5 patients with IC, 5 patients with DCIS, and 5 controls. We pooled the same amount of protein from each sample and processed the 3 pooled samples and 15 individual samples with the following procedures.
glycoprotein enrichment by m-lac column
We prepared the multilectin column by mixing equal amounts of agarose-bound concanavalin A, agarose-bound wheat germ agglutinin, and agarose-bound Jacalin in an empty PD-10 disposable column (GE Healthcare). The 100-µL serum sample was diluted with multilectin column equilibrium buffer (20 mmol/L Tris, 0.15 mol/L NaCl, 1 mmol/L Mn2+, and 1 mmol/L Ca2+, pH 7.4) to a volume of 1 mL and loaded on a newly packed multilectin affinity column. After a 15-min incubation, the unbound proteins were eluted with 10 mL of equilibrium buffer, and the captured proteins were released with 12 mL of a specific displacer solution (20 mmol/L Tris, 0.5 mol/L NaCl, 0.17 mol/L methyl-α-D-mannopyranoside, 0.17 mol/L N-acetylglucosamine, and 0.27 mol/L galactose, pH 7.4). The captured fraction was collected from the multilectin affinity column and concentrated with 15-mL, 10-kDa Amicon filters (Millipore).
liquid chromatography–tandem mass spectrometry
We used a previously described procedure (7) for trypsin digestion of the glycoprotein fractions. The trypsin-digested peptides were then separated and analyzed on an Ettan MDLC system (GE Healthcare) coupled to an LTQ linear ion trap (ThermoElectron). Approximately 2 µg of each sample was injected onto a Peptide Captrap column (Michrom Bioresources, Inc.) with the autosampler of the MDLC system. To desalt the sample, the trap column was washed with H2O with 0.1% formic acid at a flow rate of 10 µL/min for 4 min. Flow was directed to the solvent waste through a 10-port valve. After the sample was desalted, the valve was switched to direct the flow to the separation column. The desalted peptides were then released and separated on a C18 capillary column (packed in-house; Magic C18; 150 x 0.075 mm). The flow rate was maintained at 400 nL/min and monitored with a flow meter. The gradient was started at 0% acetonitrile (ACN) with 0.1% formic acid and linearly increased to 35% ACN in 120 min, then to 60% ACN in 40 min, and to 90% ACN in another 20 min. The gradient was then maintained at 90% ACN for 20 min. The Ettan MDLC was operated with UNICORN™ control software (GE Healthcare). We analyzed the resolved peptides on an LTQ linear ion trap mass spectrometer or the corresponding hybrid system coupled to a Fourier Transform mass spectrometer (MS) with a nanoelectrospray ionization ion source. The Fourier Transform was set for full mass scan, and the LTQ was set up for data-dependent tandem MS (MS/MS) fragmentation. The temperature of the ion transfer tube was controlled at 200 °C and the spray voltage was 2.0 kV. The normalized collision energy was set at 35% for MS/MS. Data-dependent ion selection was set to select the 7 most abundant ions from an MS scan for MS/MS analysis. Dynamic exclusion was continued for 2 min.
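The gradient program described above can be written down as a small piecewise-linear function, which makes it easy to check what solvent composition a peptide eluting at a given time would have seen. This is only an illustrative sketch: it assumes the stated segment durations run consecutively (120, 40, 20, and 20 min), and the names `GRADIENT` and `acn_percent` are hypothetical, not part of the instrument software.

```python
from bisect import bisect_right

# (time in min, % ACN) breakpoints, assuming the segments run back to back:
# 0 -> 35% over 120 min, -> 60% over 40 min, -> 90% over 20 min, hold 20 min.
GRADIENT = [(0, 0.0), (120, 35.0), (160, 60.0), (180, 90.0), (200, 90.0)]


def acn_percent(t_min: float) -> float:
    """Linearly interpolate the programmed % ACN at time t_min (minutes)."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    if t_min >= GRADIENT[-1][0]:
        return GRADIENT[-1][1]
    times = [t for t, _ in GRADIENT]
    i = bisect_right(times, t_min) - 1  # index of the segment containing t_min
    (t0, p0), (t1, p1) = GRADIENT[i], GRADIENT[i + 1]
    return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 60, 120, 140, 170, 190):
        print(f"{t:>4} min: {acn_percent(t):5.1f}% ACN")
```

The function returns the programmed composition only; the delay between the pump and the column is not modeled.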
bioinformatics
Peptide sequences were identified with the Sequest algorithm (Version C1) incorporated in BioWorks software (Version 3.1 SR, ThermoElectron), and the spectra were searched against the human Rapido (November 2005) database. Only peptides resulting from tryptic cleavage were searched. To generate a large pool of candidate peptides we used the following search parameters: Xcorr ≥1.9, 2.2, and 3.75 for 1+, 2+, and 3+ charged ions, respectively; ΔCn ≥0.1; and Rsp ≤4.
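The charge-dependent score thresholds can be applied as a simple filter over the list of peptide hits. The sketch below is a hedged illustration, not the BioWorks implementation: the `PeptideHit` record and `passes_filter` function are hypothetical, and the comparison directions (Xcorr and ΔCn as minimums, Rsp as a maximum) are assumed from the usual Sequest convention.

```python
from dataclasses import dataclass

# Minimum Xcorr per precursor charge state, as listed in the search parameters.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}
DELTA_CN_MIN = 0.1  # assumed to be a lower bound on delta-Cn
RSP_MAX = 4         # assumed to be an upper bound on the preliminary-score rank


@dataclass
class PeptideHit:
    """Hypothetical container for one Sequest peptide-spectrum match."""
    sequence: str
    charge: int
    xcorr: float
    delta_cn: float
    rsp: int


def passes_filter(hit: PeptideHit) -> bool:
    """Return True if the hit meets all three score criteria for its charge."""
    xcorr_min = XCORR_MIN.get(hit.charge)
    if xcorr_min is None:  # charge states other than 1+ to 3+ are not covered
        return False
    return (
        hit.xcorr >= xcorr_min
        and hit.delta_cn >= DELTA_CN_MIN
        and hit.rsp <= RSP_MAX
    )


if __name__ == "__main__":
    # Scores below are invented purely to exercise the filter.
    hits = [
        PeptideHit("SLTEILKGGVLIQR", 3, 4.10, 0.25, 1),
        PeptideHit("SLTEILKGGVLIQR", 3, 3.00, 0.25, 1),
    ]
    print([passes_filter(h) for h in hits])
```

Hits that pass such a filter are candidates only; the manual inspection of key spectra described next remains a separate step.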
The MS/MS spectra of key peptide markers were further confirmed by manual interpretation of the cleavage pattern as well as inspection of the number of assigned b and y fragments and the presence of a high signal-to-noise ratio. An example of this process is depicted in Fig. 1, which shows the MS/MS spectra attributed to the peptide 144SLTEILKGGVLIQR157, present in the extracellular domain of the surface protein HER-2.
The identification of key peptides could be missed by MS/MS scan because of the time constraints of a flowing system in which peptides are selected for fragmentation on the basis of abundance. To avoid possible false negatives, the m/z signal of the peptide with a mass window of 0.5 atomic mass units (amu) was selected for an extracted ion chromatogram (EIC). The presence of the peptide in the sample was considered to be confirmed by the observation of an EIC peak (signal-to-noise ratio, >5) within 0.5 min of the same retention time at which the peptide was detected, an approach similar to the use of accurate mass tags (12).
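The EIC confirmation rule can be sketched in a few lines. This is a minimal illustration under stated assumptions: the 0.5-amu mass window is treated as a total width centered on the target m/z, the noise level is supplied by the caller, and the scan representation and the function names `extract_eic` and `peptide_confirmed` are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]        # (m/z, intensity)
Scan = Tuple[float, List[Peak]]   # (retention time in min, peaks in that scan)


def extract_eic(scans: List[Scan], target_mz: float,
                window: float = 0.5) -> List[Tuple[float, float]]:
    """Sum the intensity within the mass window for each scan."""
    half = window / 2.0
    eic = []
    for rt, peaks in scans:
        total = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= half)
        eic.append((rt, total))
    return eic


def peptide_confirmed(eic: List[Tuple[float, float]], expected_rt: float,
                      noise: float, rt_tol: float = 0.5,
                      min_sn: float = 5.0) -> bool:
    """True if an EIC point near the expected retention time exceeds S/N > min_sn."""
    nearby = [inten for rt, inten in eic if abs(rt - expected_rt) <= rt_tol]
    if not nearby or noise <= 0:
        return False
    return max(nearby) / noise > min_sn


if __name__ == "__main__":
    # Toy scans with invented values, only to show the call pattern.
    scans = [
        (50.0, [(500.1, 10.0), (600.0, 5.0)]),
        (50.3, [(500.2, 120.0)]),
        (52.0, [(500.0, 8.0)]),
    ]
    eic = extract_eic(scans, target_mz=500.1)
    print(peptide_confirmed(eic, expected_rt=50.2, noise=10.0))
```

A real implementation would integrate the peak and estimate noise from the baseline; using the apex intensity here keeps the sketch short.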
comparative quantification by peak area measurement
Among the peptides identified in a given protein, we selected the one identified with highest confidence as a diagnostic peptide. For proteins identified with multiple peptides, we used additional peptides for peak area measurement and the mean for comparative quantification. The m/z signals of peptides were extracted from the liquid chromatography (LC)-MS EIC, and the peaks were integrated at the appropriate retention times. Each LC-MS/MS assay time was 2 h, and the reproducibility of between-assay retention times was achieved with an SD of 0.23 min. To monitor the variation between sample preparation and LC-MS analysis, we added standard bovine fetuin (5% of the total protein content of the sample) as an internal standard. The internal standard was digested together with the sample, and the peptides were separated and detected by LC-MS/MS. For normalization, we found the mean of the peak areas of 5 peptides with retention times covering the range at which the majority of the peptides were eluted. After normalization to eliminate the differential in ionization efficiency, the relative SD of these 5 peak areas was <30%.
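The internal-standard normalization can be illustrated as follows. The text does not spell out the exact formula, so this sketch assumes the simplest reading: each analyte peak area is divided by the mean peak area of the 5 fetuin peptides from the same run, and the relative SD is computed on the sample standard deviation. The function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def normalize_peak_area(peptide_area: float,
                        standard_areas: Sequence[float]) -> float:
    """Divide an analyte peak area by the mean internal-standard peak area."""
    ref = mean(standard_areas)
    if ref <= 0:
        raise ValueError("internal-standard peak areas must be positive")
    return peptide_area / ref


def relative_sd(values: Sequence[float]) -> float:
    """Relative standard deviation (percent), using the sample SD."""
    return 100.0 * stdev(values) / mean(values)


if __name__ == "__main__":
    # Invented peak areas for 5 internal-standard peptides in one run.
    fetuin_areas = [1.0e6, 1.2e6, 0.8e6, 1.1e6, 0.9e6]
    print(normalize_peak_area(2.0e6, fetuin_areas))
    print(f"{relative_sd(fetuin_areas):.1f}% RSD")
```

Averaging several standard peptides that elute across the gradient, rather than relying on one, damps run-to-run fluctuations at any single retention time.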
use of gene ontology terms to search for biomarker candidates
The list of proteins identified in the pooled control, DCIS, and IC samples (659 proteins identified with 3 or more peptides and 154 proteins with 2 peptides) was submitted to the FuncAssociate on-line program (http://llama.med.harvard.edu/cgi/func/FuncAssociate) for Gene Ontology (GO; March 2005) attribute characterization. This program groups related proteins on the basis of GO attributes. The GO attributes associated with the query gene group were listed and ranked according to the significance of the association. We compared the rankings of GO attributes (P <0.001 based on Fisher exact test) of control, DCIS, and invasive pools, and we selected the GO attributes with significant rank differences (arbitrarily set at >10) as target associations. We used peak area measurements to comparatively quantify the proteins associated with these attributes, selecting the proteins showing the greatest changes between pooled control and disease samples, with changes having the same trend as the attribute rankings. Then, we used peak area measurements to quantify these proteins in individual patient samples, found the mean of the normalized peak areas, and calculated the SDs. To measure the protein concentration difference between disease and control samples, we normalized all peak area values relative to the mean control values.
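The rank-comparison step can be sketched as a comparison of two ranked attribute lists. This is an illustration only: the attribute names and ranks are invented placeholders, the function names are hypothetical, and attributes present in only one list are simply skipped here, which is an assumption rather than something the text specifies.

```python
from typing import Dict, List


def rank_differences(rank_control: Dict[str, int],
                     rank_disease: Dict[str, int]) -> Dict[str, int]:
    """Rank shift (disease minus control) for attributes ranked in both pools."""
    shared = rank_control.keys() & rank_disease.keys()
    return {term: rank_disease[term] - rank_control[term] for term in shared}


def select_targets(rank_control: Dict[str, int],
                   rank_disease: Dict[str, int],
                   min_shift: int = 10) -> List[str]:
    """Attributes whose rank differs by more than min_shift between pools."""
    diffs = rank_differences(rank_control, rank_disease)
    return sorted(term for term, d in diffs.items() if abs(d) > min_shift)


if __name__ == "__main__":
    # Placeholder attribute names and ranks, for illustration only.
    control = {"attribute_A": 40, "attribute_B": 25, "attribute_C": 5}
    disease = {"attribute_A": 12, "attribute_B": 20, "attribute_C": 30}
    print(select_targets(control, disease))
```

A lower rank number means a more significant association, so a negative shift indicates that the attribute moved up the list in the disease pool.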
her-2 elisa in breast cancer sera
We measured HER-2/ECD (extracellular domain) with a sandwich immunoassay (Oncogene Science, Inc). In brief, 200 µL of serum were incubated in a well coated with the capture antibody, reacting with the detector antiserum. The amount of detector antibody bound to antigen was measured by streptavidin/horseradish peroxidase conjugate. All samples were analyzed in duplicate. The interassay CV was 3.1%.
| Results |
|---|
α-2-macroglobulin, serotransferrin, haptoglobin, hemopexin, α1-antitrypsin, complement C3, apolipoprotein A-I, α-1-acid glycoprotein, ceruloplasmin, and complement factor H. Because of the depletion of abundant nonglycosylated proteins, especially serum albumin, any given glycoprotein marker was identified with a higher sequence coverage than in the nondepleted serum samples (see Table 2
α-1-antichymotrypsin and ceruloplasmin, the quality of identification (higher sequence coverage) was similarly improved by the M-LAC enrichment step (data not shown). These observations were consistent for the analysis of both pooled and individual samples.
identification of a known low-abundance biomarker, her-2
To investigate the dynamic range of our proteomic study and to assess the performance of the M-LAC depletion procedure, we evaluated the preliminary identification of a low-abundance breast cancer biomarker in individual serum samples collected from patients with breast cancer. The HER-2 peptide (residues 144–157) was identified multiple times by MS/MS sequencing in the 10 breast cancer samples (nonpooled). The results of an MS/MS analysis performed on the most abundant 3+ charge state of the peptide [caused by improved ionization and fragmentation of a peptide with charged internal residues (13)(14)] are shown in Fig. 1. Similar findings were obtained when analysis was performed on the 2+ ion.
comparative quantification by normalized peak area (with an internal standard)
Comparative quantification approaches, such as stable isotope labeling or the incorporation of mass-tagged derivatives (15)(16), have been used to compare protein concentrations between samples. These approaches are less suitable for studies on serum samples because isotope labels cannot be incorporated into human samples and because increased side reactions occur in the chemical derivatization of low-abundance peptides (13). For these reasons, we used the measurement of peak area of a diagnostic peptide in an EIC to determine relative protein abundance in different samples. This approach involved normalization for variations in sample preparation and in peptide ionization in HPLC-MS measurement by adding a known amount of a calibration protein to each sample at the beginning of the isolation protocol. This approach was used for the analysis of all individual and pooled samples. To minimize interference between the sample and the internal standard in peptide peak area measurement, we selected bovine fetuin as the internal standard because it is glycosylated and contains many unique tryptic peptides relative to human proteins. The tryptic peptides generated from this internal standard elute over a range of retention times that cover the organic modifier concentrations used in the separation. Use of a range of peptide standards enables monitoring for any system fluctuations during the 2-h HPLC separation process. Furthermore, with a complex sample such as serum, not all low-abundance peptides can be consistently identified in a given LC-MS analysis, owing to the time constraints of a flowing system in which peptides are selected for fragmentation on the basis of abundance. In such situations, however, selected peptides can still be detected and the peak area measured in EICs at a predetermined mass and retention time window.
analysis of pooled sample sets and of go attributes
The proteomics analysis of several serum samples rapidly produces a large data set, which then becomes an informatics challenge, particularly if the analysis depends on the study of lists of unrelated proteins. In this study, therefore, we used analysis of GO-based associations to group protein identifications and decrease the complexity of the data sets for comparison. Results of triplicate analysis of the pools of control and disease samples for proteins selected to represent the GO categories are shown in Table 3. Variability of the peak area measurement was <30% (Table 3). Although it is a concern that pools of 5 different individuals may show different values because of biological variability (17), we used the pools in the following manner. We used pooled samples to perform several replicates, estimate the variability of the analytical measurement, and establish a baseline between control and disease samples. Assessment of true biological variability would require a much larger sample set and will be the focus of future investigations. Some markers showed 2–4-fold changes (increased or decreased) in the IC relative to the DCIS and control samples. When multiple peptides were suitable for peak area quantification, the area measurement was shown to be consistent in replicates (Table 3). An additional criterion for selecting a representative protein in a given ontological association was availability of literature suggesting an association with breast cancer.
| Discussion |
|---|
50 g/L), the HER-2 marker is present at a concentration <1/10⁶ (by weight). Using a similar LC-MS analysis, we previously measured a dynamic range of 1 in 101 for a standard protein (growth hormone) added to plasma (13); thus, the M-LAC step represents a marked improvement in depth of analysis.
go annotation
Our approach, which is based on the understanding that the serum glycoproteome originates from a wide range of tissues and that proteins can be annotated to various biological pathways (19), clusters glycoproteins on the basis of their GO attributes to allow direct comparison of the ranking of such groupings in control and disease samples. A concern in the use of these associations is that some GO assignments are based on computerized sequence similarity searches on data that are not well curated. We minimized such errors in the database by performing our own curation of proteins of interest that met our threshold for differential production between the sample groups. An additional benefit of such a focused analysis is a marked decrease in the number of proteins to be examined relative to the initial list of identified proteins.
The study revealed that the following ontology-based associations showed consistent differences in the ranks of attributes between the control and disease sets:
Although the exact number of identified proteins in a given attribute is a function of the dynamic range of the triplicate measurements, this variable can serve as another valuable discriminator between the control and disease samples. For example, in a total of 18 identified proteins associated with actin-binding function, 11 were found only in the disease samples; similarly, 49 of 125 identified proteins involved in cell growth and maintenance were also found only in the disease samples (Table 3). As described in "Results", an increased ranking of ontology associations of interest was also supported by quantitative measurement of key protein members (Table 3). Examples of appreciable up-regulation in IC samples included ceruloplasmin, pregnancy-zone protein, and α-1-antichymotrypsin; examples of down-regulation included fibrinogen ß-chain and neuropilin-1. Peak-area changes were consistent with changes in the corresponding ranking of the GO attributes. Proteins listed in Table 3 range from higher-abundance blood proteins, such as apolipoprotein C-III, to lower-abundance proteins, such as pregnancy-zone protein; in all cases, protein identification was based on 2 or more peptides.
When we repeated the analysis with 15 individual patient samples to assess the variability in abundance of the proteins listed in Table 3 in different individuals and to examine the overlap between the control and disease (DCIS and IC) samples (Fig. 2), the results for disease samples revealed an increasing trend for protein groups a–e (apolipoprotein C-III, ceruloplasmin, pregnancy-zone protein, α-1-antichymotrypsin, and prothrombin) and a decreasing trend for protein groups f and g (neuropilin-1 and fibrinogen ß-chain). As expected, values for various control and patient samples overlapped somewhat, and, in general, the DCIS samples showed smaller differences from the controls than did the IC samples. The results were consistent with the results for the pooled samples.
evaluation of potential biomarkers with disease and patient information
Cancer is a complex set of diseases that can lead to systemic changes (20)(21). Serum proteomic detection of early cancer would be enhanced by the identification of specific metabolic or immunological alterations that do not depend on the presence of notable tumor mass (22).
This study provides evidence that serum samples from patients with cancer show increases in specific GO associations, including lipid metabolism and transport, cell ion homeostasis, and wide-spectrum protease inhibitors. Other changes in metabolism are indicated by decreases in proteins in the coagulation and fibrinogen complex category.
Tumor growth and metastasis have been associated with high-fat diet and specific metabolic control by dietary substrates (23)(24). Clinical findings have supported these observations; for example, studies have demonstrated increased concentrations of apolipoprotein B (25) and apolipoprotein C-III (26) in patients with breast cancer. For the purposes of this study, we selected apolipoprotein C-III as an example of increased serum lipid transport proteins in disease samples; however, other apolipoproteins, such as Lp(a), gave similar results. Ceruloplasmin is another example of an upregulated transport protein in the current study, and this marker was also found to be increased in another study of metastatic breast cancer (27)(28).
The significantly higher serum concentrations of protease inhibitors (Table 3 and Fig. 2) are in agreement with other studies of breast cancer (29). One study showed that α-1-antichymotrypsin was synthesized by MCF-7 breast cancer cells (30)(31) and that synthesis was stimulated by epidermal growth factor (32) and insulin-like growth factors (33), which may protect the tumor from invading leukocytes (34)(35).
Patients with cancer exhibit changes in coagulation and fibrinolysis (36) and increased serum concentrations of prothrombin (37). Thrombin induces production of vascular endothelial growth factor mRNA and is a survival factor for malignant cells (38). In addition to increased prothrombin in breast cancer sera reported here (Table 3), other proteins within the same ontological association group showed similar changes, including transthyretin and plasminogen. Neuropilin was decreased in the disease samples (Table 3 and Fig. 2), a finding consistent with an observation of deficient concentrations in breast carcinoma cells (39). Neuropilin-1 exerts apoptotic effects in endothelial cells (40) and activates a p53-dependent apoptotic pathway (41)(42).
For any given disease, an individual protein marker may be suitable for only a portion of the patient population (43), and a panel of multiple biomarkers may be required to control bias in disease diagnosis. Such a panel increases the complexity of subsequent data analysis, a problem further magnified in genomic and proteomic studies that discover large numbers of potential protein markers (44). Such complexity may be decreased by grouping potential biomarkers by GO terms (45)(46).
Our results indicate that isolation of the serum glycoproteome by M-LAC coupled with LC-MS/MS and the use of GO associations facilitates characterization of a large panel of candidate markers, which can then be evaluated in a particular patient population. We hope to stimulate additional interest in glycosylation and further development of technology for characterizing this important posttranslational modification. Although it was not the focus of this study, we developed the M-LAC approach to monitor shifts in glycosylation by elution of the bound glycoproteins with 3 separate displacer steps (47). Larger studies are required to assess the functional importance of changes seen in different proteins and pathways. We and others (48) believe that MS has great potential in the search for multiple novel disease-associated biomarkers, enabling identification of large panels of proteins not readily accessible by current ELISA technology.
| Acknowledgments |
|---|
| Footnotes |
|---|
| References |
|---|
α1-antichymotrypsin and α1-antitrypsin by human trophoblast. Pediatr Res 1993;34:312-317.
α1-Antitrypsin- and anchorage-independent growth of MCF-7 breast cancer cells. Endocrinology 1993;133:996-1002.
α1-antichymotrypsin and soluble IGF-II/mannose 6-phosphate receptor from MCF7 breast cancer cells. Endocrinology 1995;136:3759-3766.
α1-antichymotrypsin and α1-acid glycoprotein by human breast epithelial cells. Cancer Res 1982;42:4567-4573.
α1-antichymotrypsin, by human breast epithelial cells. Biochim Biophys Acta 1986;882:242-253.