Oak Ridge Conference
1 Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD.
2 Center for Applied Proteomics and Molecular Medicine, George Mason University, Manassas, VA.
3 Tufts University School of Medicine, Howard Hughes Medical Institute, Boston, MA.
4 National Ovarian Cancer Early Detection Program, New York University, New York, NY.
5 National Cancer Institute Biomedical Proteomics Program, SAIC, NCI, Frederick, MD.
a Address correspondence to Dr. Petricoin or Dr. Liotta at: Center for Applied Proteomics and Molecular Medicine, George Mason University, Manassas, VA 20110. E-mail lliotta{at}gmu.edu or epetrico{at}gmu.edu.
Abstract
Background: Albumin binds low-molecular-weight molecules, including proteins and peptides, which then acquire its longer half-life and are thereby protected from kidney clearance. We developed an experimental method to isolate albumin in its native state and then to identify, by mass spectrometry (MS) sequencing, the corresponding bound low-molecular-weight molecules. We used this method to analyze pooled sera from a human disease study set (high-risk persons without cancer, n = 40; stage I ovarian cancer, n = 30; stage III ovarian cancer, n = 40) to demonstrate the feasibility of this approach as a discovery method.
Methods: Albumin was isolated by solid-phase affinity capture under native binding and washing conditions. Captured albumin-associated proteins and peptides were separated by gel electrophoresis and subjected to iterative MS sequencing by microcapillary reversed-phase tandem MS. Selected albumin-bound protein fragments were confirmed in human sera by Western blotting and immunocompetition.
Results: In total, 1208 individual protein sequences were predicted from all 3 pools. The predicted sequences were largely fragments derived from proteins with diverse biological functions. More than one third of these fragments were identified by multiple peptide sequences, and more than one half of the identified species were in vivo cleavage products of parent proteins. An estimated 700 serum peptides or proteins were predicted that had not been reported in previous serum databases. Several proteolytic fragments of larger molecules that may be cancer-related were confirmed immunologically in blood by Western blotting and peptide immunocompetition. BRCA2, a 390-kDa low-abundance nuclear protein linked to cancer susceptibility, was represented in sera as a series of specific fragments bound to albumin.
Conclusion: Carrier-protein harvesting provides a rich source of candidate peptides and proteins with potential diverse tissue and cellular origins that may reflect important disease-related information.
The circulatory proteome holds great promise as a reservoir of information useful for disease detection and therapeutic monitoring (1). Despite this potential, comprehensive characterization of the circulatory proteome is difficult because of the wide dynamic range of protein concentrations that exists between larger molecules such as albumin (g/L) and the sought-after biomarkers (below ng/L) (2). Here we describe a method for amplifying the yield of low-abundance, low-molecular-weight proteins and peptide fragments in serum as a means of providing a new window into the information content of the circulation. This method takes advantage of the tendency for low-abundance molecules to associate with high-abundance proteins such as albumin and thereby acquire the carrier's longevity in the serum (3). This report describes the application of this method to identify candidate proteins and peptides in serum pools from diseased and control ovarian cancer patients.
Current protocols for investigation of the serum proteome often recommend prefractionation by native depletion of high-abundance species (e.g., albumin, immunoglobulins, lipoproteins) (4)(5)(6)(7)(8). This approach can inadvertently remove a high percentage of candidate low-abundance proteins and peptides because most low-molecular-weight molecules in serum form complexes with high-abundance proteins, which protect the bound molecules from clearance and allow them to remain in circulation. Albumin is the most abundant protein in plasma and serum, present at 50 g/L, and has a half-life of 19 days in humans (9)(10)(11). Because the kidney generally and efficiently filters out molecules <60 kDa, smaller proteins and peptides will be protected from clearance by association with albumin (9)(10). Plasma protein binding can be an effective means of extending the pharmacokinetic properties of otherwise short-lived molecules (9)(11)(12)(13). Dennis et al. (9) demonstrated that a 58-fold increase in peptide longevity could be attained through albumin binding and association. Albumin binding of biologically important proteins and peptides is well documented. For example, the amino-terminal peptide of HIV-1, gp41, and the 14-kDa fragment of streptococcal protein G are known to specifically bind with human serum albumin (14)(15). Vitamin A transport and homeostasis is controlled by a specific protein-protein interaction between the 21-kDa retinol-binding protein and transthyretin, which was shown to reduce glomerular filtration of the low-molecular-mass retinol-binding protein (16).
It has recently been shown that most low-molecular-weight molecules that have been visualized by mass spectrometry (MS) profiling methods in the past exist under native conditions in a complexed state with high-molecular-weight proteins (17)(18). Direct analysis of the low-molecular-weight portion of size-fractionated native serum revealed few unbound molecular species. These previous studies (3)(17)(18) demonstrated that most of the low-molecular-weight species were found in the high-molecular-weight fraction because they were all complexed to other proteins. These same investigators went on to denature the protein complexes and then sequenced the size-fractionated low-molecular-weight repository. The selectivity of binding of low-molecular-weight constituents for different classes of carrier proteins was explored further (17)(18), revealing that subsets of bound low-molecular-weight peptides and sequenced protein fragments associated with albumin were different from those bound to immunoglobulin, apolipoprotein, and transferrin (18). These previous studies established the scientific rationale for the present investigation into the carrier-protein-bound information archive. To evaluate the possibility that this repository of bound molecules contains potential disease-related information, we report the results of an investigation comparing the albumin-associated peptides in serum study sets derived from patients with pathologically documented ovarian cancer vs healthy but high-risk patients who had been disease-free for 5 years after serum collection.
Two key properties of carrier proteins enable them to function as efficient harvesters of lower-abundance putative biomarkers in the circulation: overwhelming relative abundance and long half-lives compared with their cargo. Considering the binding reaction between a biomarker (B) and its carrier protein (C):
$$\mathrm{B} + \mathrm{C} \underset{k_r}{\overset{k_f}{\rightleftharpoons}} \mathrm{BC}$$
Thus, the rate of formation of the biomarker-carrier-protein complex (BC) is proportional to the product of the biomarker concentration [B] and the carrier-protein concentration [C], with a constant of proportionality kf. The dissociation of the biomarker-carrier-protein complex into free biomarker and carrier protein, on the other hand, is proportional to the concentration of the bound biomarker [BC], with a constant of proportionality kr. The forward and reverse reaction rates are given by:
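$$\text{rate}_{\text{forward}} = k_f\,[\mathrm{B}]\,[\mathrm{C}]$$
$$\text{rate}_{\text{reverse}} = k_r\,[\mathrm{BC}]$$
[Four further displayed equations are missing from the source.]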
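As a sketch of where the forward and reverse rates lead, and assuming only the standard mass-action treatment at equilibrium, the two rates can be set equal. The symbol $K_a$ (association constant) is introduced here for illustration and may not match the authors' own notation:

$$k_f\,[\mathrm{B}]\,[\mathrm{C}] = k_r\,[\mathrm{BC}] \quad\Longrightarrow\quad \frac{[\mathrm{BC}]}{[\mathrm{B}]} = \frac{k_f}{k_r}\,[\mathrm{C}] = K_a\,[\mathrm{C}]$$

Under this assumption the ratio of bound to free biomarker scales linearly with the carrier-protein concentration, which is consistent with the role assigned above to the overwhelming relative abundance of the carrier.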
Moreover, the relatively long half-life of the carrier protein [19 days for albumin compared with a few hours or less for an unbound molecule (9)(22)(23)(24)(25)] facilitates the prolonged presence of smaller peptides and proteins in the bloodstream. Mehta et al. (3) showed that if a protein is secreted into the bloodstream at a constant rate p, and is subsequently eliminated through the kidneys at a rate proportional to its concentration (with a constant of proportionality of e), its steady-state concentration in the circulation is given by the ratio p/e. In the absence of carrier-protein binding, the steady-state concentration of biomarker in the circulation would be pb/eb, where pb and eb are the production and elimination rates of the biomarker, respectively. When bound to a carrier protein, however, the biomarker acquires the carrier protein's longer half-life, so that the steady-state concentration becomes pb/ec, where ec is the relatively small elimination rate of the carrier protein. Because the elimination rate of a protein is inversely proportional to its half-life, amplification of the biomarker's concentration in the bloodstream as a result of carrier-protein binding is eb/ec = t(1/2)c/t(1/2)b, where t(1/2)c and t(1/2)b are the half-lives of the carrier protein and biomarker, respectively. Thus, the longer the half-life of the carrier protein, the greater is its ability to amplify the concentration of bound species in the bloodstream. In this way, the relative abundance of large carrier proteins and their longer half-lives work together to increase the total concentration of complexed molecules in the bloodstream. This report describes the application of the carrier-protein sequestration principle to the isolation and mass spectrometric identification of candidate low-molecular-weight proteins and peptides that exist in the serum of ovarian cancer patient disease study sets.
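The amplification factor eb/ec = t(1/2)c/t(1/2)b can be illustrated numerically. The sketch below is not part of the original analysis: the function names are hypothetical, and the 2-h half-life for the unbound biomarker is an assumed value chosen only to represent "a few hours or less".

```python
import math


def steady_state(production_rate: float, elimination_rate: float) -> float:
    """Steady-state concentration p/e for constant production and first-order elimination."""
    return production_rate / elimination_rate


def elimination_rate(half_life_h: float) -> float:
    """First-order elimination constant, inversely proportional to the half-life."""
    return math.log(2) / half_life_h


def amplification(carrier_half_life_h: float, biomarker_half_life_h: float) -> float:
    """Fold increase e_b/e_c = t(1/2)c / t(1/2)b gained by binding the carrier."""
    return carrier_half_life_h / biomarker_half_life_h


# Albumin half-life from the text: 19 days, expressed in hours.
ALBUMIN_HALF_LIFE_H = 19 * 24
# Assumption for illustration only: a free biomarker with a 2-h half-life.
ASSUMED_FREE_HALF_LIFE_H = 2.0

fold = amplification(ALBUMIN_HALF_LIFE_H, ASSUMED_FREE_HALF_LIFE_H)
print(f"fold amplification: {fold:.0f}")  # 228

# The same ratio falls out of the two steady-state expressions p_b/e_c and p_b/e_b
# (the production rate cancels, so its value is arbitrary here).
p_b = 1.0
bound = steady_state(p_b, elimination_rate(ALBUMIN_HALF_LIFE_H))
free = steady_state(p_b, elimination_rate(ASSUMED_FREE_HALF_LIFE_H))
print(f"ratio of steady states: {bound / free:.0f}")
```

With these assumed inputs the bound species would reach a steady-state concentration a few hundred times that of the free species; a shorter free half-life gives a proportionally larger factor.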
Materials and Methods
clinical serum samples
Serum samples were collected under full patient consent and Institutional Review Board approval. Serum was collected before physical evaluation, diagnosis, and treatment and stored at -80 °C. The ovarian study set consisted of 40 samples from unaffected, high-risk patients, 30 samples from patients with stage I ovarian cancer, and 40 samples from patients with stage III ovarian cancer. The gynecologic oncology clinic where the samples from the cases were collected is located in a building that is separate from, but contiguously linked to, the building in which the National Ovarian Cancer Early Detection Program at Northwestern University is located, where the high-risk control samples were collected. A special attribute of this sample set is that the same personnel were involved in the blood collection, handling, and storage of all biospecimens. In addition, all blood specimens were processed in an identical manner under the same methodology. Healthy control sera were collected from unaffected women determined to be at increased risk for ovarian cancer. These women were enrolled in the National Ovarian Cancer Early Detection Program and had no evidence of any cancer for 5 years as determined by twice-yearly 3-dimensional color Doppler ultrasound and extensive clinical evaluation by a board-certified gynecologic oncologist.
Increased risk was determined by classic genetic pedigree analysis and by the presence or absence of factors such as BRCA mutation status. The determination of increased risk was based on commercially available computer-generated risk algorithm programs such as BRCA Pro. All patients in the high-risk clinic were seen by board-certified genetic counselors and geneticists who defined the risk. Similarly, the serum specimens from women with ovarian cancer were procured in a gynecologic oncology clinic from symptomatic women who were later surgically staged and found to have epithelial ovarian carcinoma. Each sample was accompanied by a verified pathology diagnosis. Briefly, specimens were collected in red-top Vacutainer Tubes and allowed to clot for 1 h on ice, followed by centrifugation at 4 °C for 10 min at 1210g. The serum supernatant was divided in aliquots and stored at -80 °C until needed. Samples were selected for our analysis by a random process categorized by the pathology diagnosis as cancer or benign.
capture of native serum albumin and elution of complexed protein species
A schematic representation of the basic experimental technique is illustrated in Fig. 1. Native, diluted serum is introduced into an affinity column so that the carrier protein (albumin) is captured along with any bound molecules. The bound subproteome consisting of the carrier proteins and their peptide "cargo" is eluted, dissociated, and separated by 1-dimensional gel electrophoresis. Each entire gel lane is cut out, finely subdivided into molecular mass regions, subjected to in-gel trypsin digestion, and prepared for electrospray mass spectrometric analysis.
purification of albumin and bound peptide
Typically, 25 µL of human stage-specific (pooled) cancer serum (3.1 mg of protein) was diluted to 200 µL with Equilibration Buffer (Millipore) and run through a (Montage) albumin-specific affinity column twice. The bound protein was washed thoroughly with two 200-µL volumes of proprietary wash buffer (provided by the manufacturer). These fractions were combined and labeled as a "flow-through" fraction. The bound proteins were eluted from the column by equilibrating with acetonitrile-H2O-trifluoroacetic acid (70:30:0.2 by volume) for 30 min, followed by a slow spin-through of the elution mixture, repeated once. The eluate (retentate fraction) was lyophilized to <10 µL in a HetoVac roto (CT 110) and reconstituted in an H2O-acetonitrile-formic acid (95:5:0.1 by volume) buffer. Samples were desalted with a ZipTip cleanup and reconstituted in a 1:1 mixture of water and sodium dodecyl sulfate sample buffer (20 µL total volume).
1-dimensional protein gel separation and digestion
The flow-through and retentate fractions were kept on ice in 20 µL of sample buffer from 25 µL of original serum, and then were heated for 5 min at 95 °C and loaded on 1-dimensional precast gels to separate albumin from the proteins/peptides/fragments of interest. The proteins and fragments were visualized with a Gel Code Blue Stain Reagent (Pierce) according to the manufacturer's protocols. The entire lane was excised from the gel and finely sliced into very small molecular-weight regions (35 slices/lane). Gel bands were reduced, alkylated, and digested with porcine modified trypsin according to a standard protocol (26), and peptides were concentrated and prepared for mass spectrometric analysis.
microcapillary reversed-phase tandem ms
Samples were lyophilized to near dryness and reconstituted in 6.3 µL of Buffer A (H2O-acetonitrile-formic acid; 95:5:0.5 by volume) for MS analysis. Microcapillary reversed-phase tandem MS (µLC-MS/MS) analysis was performed with a Dionex LC Packings liquid chromatography system coupled on-line to a ThermoFinnigan LCQ Classic ion trap mass spectrometer with a modified nanospray source. Reversed-phase separations were performed on an in-house, slurry-packed capillary column. The C18 silica-bonded column was a 10-cm long (75-µm i.d.) fused-silica column packed with 5-µm beads (pore size, 300 Å; Vydac). A PepMap C18 cartridge (5-mm; Dionex) acted as a desalting column. Sample was injected in microliter pick-up mode and washed with Buffer A for 5 min before elution with a linear gradient with Buffer B (acetonitrile-H2O-formic acid; 95:5:0.1 by volume) up to 85% over 95 min at a flow rate of 200 nL/min. Full MS scans were followed by 4 MS/MS scans of the most abundant peptide ions (in a data-dependent mode), and collision-induced dissociation was performed at a collision energy of 38% with the ion spray voltage set to 1.80 kV, capillary voltage set to 22.80 V, and temperature set to 180 °C.
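The data-dependent step (each full MS scan followed by MS/MS of the 4 most abundant ions) can be pictured with a minimal sketch. This is not the instrument's control software: the function name and the peak list are hypothetical, and features of real acquisition methods such as dynamic exclusion or charge-state filtering are deliberately left out.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def pick_precursors(full_scan: List[Peak], top_n: int = 4) -> List[float]:
    """Return the m/z values of the top_n most intense peaks, most intense first."""
    ranked = sorted(full_scan, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


# Hypothetical full-scan peak list; all values are invented for illustration.
scan = [
    (445.1, 1.0e5), (512.3, 8.0e5), (623.8, 3.5e5),
    (701.4, 9.1e5), (815.9, 2.0e4), (902.5, 6.2e5),
]

print(pick_precursors(scan))  # [701.4, 512.3, 902.5, 623.8]
```

Because only the most intense ions in each scan are fragmented, peptides from low-abundance proteins are sampled less often, which is one reason repeated runs of the same sample can add new identifications.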
data analysis and repetitive sequencing
Data analysis was performed by searching MS/MS spectra against the European Bioinformatics Institute nonredundant proteome set of Swiss-Prot, TrEMBL, and Ensembl entries through the Sequest Bioworks Browser (ThermoFinnigan), with a static modification of +57 Da on cysteine residues and a dynamic modification for oxidation of methionine of +15.9994 Da. Peptides were considered legitimate hits after the correlation scores (see below) were filtered and the MS/MS data were manually inspected. The criteria used to filter data in this report are at least as stringent as most literature citations (17)(18)(27)(28)(29)(30):
[The filtering criteria were given in an image that is missing from the source.]
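How a static and a dynamic modification change the candidate masses considered for a peptide can be sketched as follows. This is an illustration under stated assumptions, not a description of Sequest internals: the function names and the peptide are hypothetical, the residue masses are standard monoisotopic values, and the two mass shifts are simply the numbers quoted in the search settings above.

```python
# Monoisotopic residue masses (Da); only the residues needed below are listed.
RESIDUE_MASS = {
    "D": 115.02694, "K": 128.09496, "I": 113.08406, "P": 97.05276,
    "E": 129.04259, "N": 114.04293, "Y": 163.06333, "M": 131.04049,
    "W": 186.07931, "C": 103.00919,
}
WATER = 18.01056          # added once per peptide for the free termini
STATIC_CYS_SHIFT = 57.0   # static modification quoted in the text
MET_OX_SHIFT = 15.9994    # dynamic modification quoted in the text


def peptide_mass(sequence: str, oxidized_met: int = 0) -> float:
    """Neutral mass with every Cys shifted and `oxidized_met` Met residues oxidized."""
    if not 0 <= oxidized_met <= sequence.count("M"):
        raise ValueError("oxidized_met must not exceed the number of Met residues")
    base = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return base + STATIC_CYS_SHIFT * sequence.count("C") + MET_OX_SHIFT * oxidized_met


def candidate_masses(sequence: str) -> list:
    """Static shift is always applied; the dynamic shift is tried on 0..n Met residues."""
    return [peptide_mass(sequence, k) for k in range(sequence.count("M") + 1)]


# Hypothetical tryptic peptide, invented for illustration only.
for mass in candidate_masses("DCPEMNYK"):
    print(f"{mass:.4f}")
```

A static modification shifts every occurrence of the residue, so it adds no search candidates; a dynamic modification multiplies the candidates, because each methionine may or may not carry the shift.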
validation by serum western blotting
The primary antibody that recognized BRCA2 was synthesized in house. Rabbits were immunized with a peptide corresponding to an exact antigenic region of BRCA2, and the resulting polyclonal anti-BRCA2 antibody was affinity-purified (see below). The specificity of the antibody was verified against the full-length (390 kDa) BRCA2 protein extracted from HeLa cell nuclear extract. Subsequent preincubation of the primary antibody with an immunizing synthetic peptide overlapping the antigenic region of interest successfully competed away the representative band of native BRCA2 at 390 kDa. After verification of the specificities of the antibody and competition peptide, this experimental procedure was applied to pooled ovarian cancer and control serum samples.
Prepared serum samples were heated for 5 min at 95 °C in sample buffer containing 20 mL/L ß-mercaptoethanol, followed by centrifugation at 10 000g for 1 min to remove insoluble material. Samples were then subjected to 1-dimensional electrophoresis and electroblotting at 30 V for 2 h on ice. Membranes were incubated overnight at 4 °C in 50 g/L nonfat dry milk, 75 g/L glycine, and 1 mL/L Tween 20 in water to block unoccupied protein binding sites.
The blocked membranes were rinsed twice with wash buffer [10 mmol/L Tris (pH 7.5), 150 mmol/L NaCl, 1 g/L bovine serum albumin, 1 mL/L Tween 20] and then incubated with 1 mg/L primary antibody in wash buffer containing 50 g/L nonfat dry milk, with rocking, for 2 h at room temperature. For peptide blocking/competition assays, 10 µg of primary antibody was incubated with 100 µg of the corresponding immunization peptide in 400 µL of wash buffer for 1 h at room temperature with end-over-end mixing. The peptide-treated antibody solution was diluted to 10 mL (1 mg/L final antibody concentration) in wash buffer containing 50 g/L nonfat dry milk before incubation with polyvinylidene difluoride (PVDF) membrane.
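The dilution arithmetic in the competition assay can be checked with a few lines. The helper name is hypothetical; the quantities are those stated above, and the only fact relied on is that 1 µg/mL equals 1 mg/L.

```python
def concentration_mg_per_l(mass_ug: float, volume_ml: float) -> float:
    """Concentration in mg/L; numerically identical to ug/mL."""
    return mass_ug / volume_ml


ANTIBODY_UG = 10.0
PEPTIDE_UG = 100.0
PREINCUBATION_ML = 0.4   # 400 uL of wash buffer
FINAL_ML = 10.0

# Antibody concentration during preincubation with the peptide.
print(f"{concentration_mg_per_l(ANTIBODY_UG, PREINCUBATION_ML):.1f} mg/L")  # 25.0 mg/L
# Antibody concentration after dilution, matching the stated 1 mg/L.
print(f"{concentration_mg_per_l(ANTIBODY_UG, FINAL_ML):.1f} mg/L")          # 1.0 mg/L
# Peptide is present at a 10-fold excess over antibody by mass.
print(f"{PEPTIDE_UG / ANTIBODY_UG:.0f}-fold mass excess")                   # 10-fold mass excess
```

The mass excess understates the molar excess, since a synthetic peptide is far smaller than an IgG molecule; the molar figure is not computed here because the molecular masses are not given in the text.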
The membranes were washed 5 times (3 min each) in 50 mL of wash buffer and subsequently incubated in 10 mL of horseradish peroxidase-conjugated goat anti-rabbit IgG (1:50 000 in wash buffer) for 1 h at room temperature. After the PVDF membranes were washed thoroughly, signals were developed by enhanced chemiluminescence.
peptide-specific antibodies
A peptide representing amino acid residues 980-993 (DKIPEKNNDYMNKW) of the BRCA2 sequence was synthesized (Anaspec) and conjugated to keyhole limpet hemocyanin for immunization as described previously (31). The resulting antisera were affinity-purified over columns of peptides conjugated to Affigel 15 (Bio-Rad) and concentrated in stirred cells with YM-30 membranes (Millipore). The concentrates were subjected to gel-filtration chromatography on 2.6 × 60 cm Superdex 200 columns (GE Healthcare) in phosphate-buffered saline, and the monomeric IgG fractions were pooled and concentrated. The protein concentrations were determined by the Bradford assay (Bio-Rad).
Results
The ovarian cancer study set was divided according to disease category; serum samples within each disease category were pooled into sets. A total of 110 samples were classified based on pathology into high-risk (n = 40), stage I (n = 30), and stage III (n = 40) ovarian cancer pools, and 5 separate aliquots per disease stage were iteratively sequenced by the experimental procedure in Fig. 1. A total of 1208 unique proteins were predicted in all 3 pools, 446 of these from multiple peptide sequences. An iterative sequencing approach examines the repetitive yield and variability between runs and between stage classifications. The aggregate yield of low-abundance protein identifications is expected to increase with repeated iterations of the experimental method. The relationship between the number of sequencing iterations performed and the total numbers of peptide sequences and corresponding protein identifications obtained is described in Table 1. Overall, the number of unique proteins identified by multiple peptide sequences increased at a diminishing rate relative to the number of iterations performed. The rates of single and multiple peptide hits accumulated vs the number of experimental iterations are shown in Fig. 2. The ability to identify new peptides with multiple hits began to diminish by the third iteration (Table 1 and Fig. 2); however, the total number of new peptide identifications and single hit identifications continued to increase even after 5 iterations. Previous work by Liu et al. (32) revealed that saturation of new peptide identification occurs at around the 10th iteration. The results presented here support the conclusion that greater coverage of lower-abundance proteins can be achieved by increasing the number of experiments performed on a given sample.
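The bookkeeping behind an iterative-sequencing yield curve can be sketched as follows. This is not the authors' pipeline: the function name is hypothetical and the identifiers in `runs` are invented placeholders, chosen only to show how new and cumulative identifications are tallied per iteration.

```python
from typing import Iterable, List, Set, Tuple


def cumulative_new_ids(iterations: Iterable[Set[str]]) -> Tuple[List[int], List[int]]:
    """For each iteration, count identifications not seen before and the running total."""
    seen: Set[str] = set()
    new_counts: List[int] = []
    totals: List[int] = []
    for ids in iterations:
        new = set(ids) - seen      # identifications first observed in this iteration
        seen |= new
        new_counts.append(len(new))
        totals.append(len(seen))
    return new_counts, totals


# Invented identifiers standing in for protein identifications from 5 aliquots.
runs = [
    {"P1", "P2", "P3", "P4"},
    {"P1", "P2", "P5", "P6"},
    {"P2", "P3", "P6", "P7"},
    {"P1", "P4", "P7"},
    {"P3", "P5", "P8"},
]

new_counts, totals = cumulative_new_ids(runs)
print(new_counts)  # [4, 2, 1, 0, 1]
print(totals)      # [4, 6, 7, 7, 8]
```

A per-iteration count of new identifications that shrinks while the cumulative total still creeps upward is the pattern of diminishing returns described in the text.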
We have identified >700 different tryptic peptides derived from proteins not previously reported to exist in the blood in published databases from sera from women with various stages of ovarian cancer; many of these proteins are of low abundance (see Tables 1a-1c of the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol51/issue10/). More than 100 proteins falling into putative functional categories (Fig. 3) previously known to be related to cancer were identified by single or multiple peptide hits. In this study, unique single peptide hits were often discovered more than once from the same disease category (Table 1b of the online Data Supplement). That is, the same single peptide hit was generated more than once from different aliquots of the same disease category.
Low-abundance proteins or peptides derived from tissues entering the serum compartment can become complexed with high-abundance proteins (3)(9)(10)(11)(12)(13)(14)(15)(16)(17)(32). At least one half of all the proteins identified, which were bound to albumin and thus sequenced by the present method, must exist as peptide fragments of whole proteins. This is supported by two points of evidence: (a) the peptides were sequenced from a molecular-weight region of a gel that did not correspond to the predicted mass of the intact protein; and (b) passive diffusion of protein species through the vascular walls is hindered above apparent molecular masses of 60 kDa (25). Thus, large tissue proteins can be represented in the blood circulation only if they are actively secreted or if they are in vivo cleavage products of the parent protein. BRCA2, which was sequenced by LC-MS/MS and represented by 4 distinct peptides (example BRCA2 spectra from 2 iteratively found peptides are presented in Fig. 4) and subsequently validated by competition Western blotting (Fig. 5C), is not predicted to be in the blood circulation as an intact protein. In fact, the parent protein with a known molecular mass of 390 kDa was not found by serum Western blotting (Fig. 5D). Shown in Fig. 5C are immunocompetition results for 2 cleavage products of BRCA2 (12 and 25 kDa) that were identified by Western blot analysis of pooled ovarian cancer sera. A rabbit was immunized with a synthetic peptide representing amino acid residues 980-993 (DKIPEKNNDYMNKW) of the BRCA2 protein to produce a polyclonal antibody that ultimately detected the BRCA2 fragments in serum. Competition with the cognate peptide completely extinguished the 12- and 25-kDa bands. The estimated masses of the immuno-fragments (12 and 25 kDa) represent <9% by mass of the total parent protein. Furthermore, 3 additional, distinct peptides were sequenced (residues 1942-1959, 2390-2401, and 2447-2459) in a separate region of the parent protein. The close proximity of the peptides containing amino acids 2390-2401 and 2447-2459 suggests that they likely originate from a single serum fragment of BRCA2 (Fig. 5, A and B).
Candidate proteins bound to albumin (Table 1; see also Tables 1a-1c of the online Data Supplement) are apparently diverse in origin and span a wide variety of predicted physiologic functions (Fig. 3). The breadth of biological and functional classes ranges from defense and immunity to transcription regulation, apoptosis, and cell adhesion. More than one half of the predicted proteins that have been sequenced have no known function reported in the literature. The relative abundances of the different functional classes of those proteins found in ovarian cancer samples are shown in Fig. 3. The diversity of categories represented is characteristic of a typical serum proteome (23)(26). The ovarian cancer serum proteins that have been identified are listed in Tables 1a-1c of the online Data Supplement, and are shown classified as either multiple-peptide or single-peptide identifications. A sequence for CA125, a known ovarian cancer marker (33)(34), was predicted to exist in the stage III ovarian cancer pool, but it was not found in either the stage I or the high-risk serum pools.
Discussion
The data presented here indicate that high-abundance serum carrier proteins such as albumin may act to sequester low-molecular-weight peptide fragments in the blood. Such sequestered peptides may provide a potentially rich source of candidate disease-associated biomarkers for subsequent clinical validation and provide a new opportunity to expand the knowledge base for the molecular composition of the circulation. Although the present work emphasizes the use of albumin, nonalbumin carrier proteins, such as IgG or fibronectin, are associated with a distinct subproteome within human sera (23). The method of albumin capture, elution of bound peptide and protein fragments, fractionation, digestion, and MS sequencing was repeated on 5 independent aliquots of each disease category serum pool. As shown in Table 1, this iterative-sequencing approach provided a means to assess sequence discovery reproducibility within the same serum pool sample. Iterative sample sequencing indicated that after 5 iterations, the yield of novel multiple-peptide predictions not identified in previous iterations appeared to increase more slowly. This would be expected if the peptide sequences are being randomly resampled from the same large but finite population of candidates.
In theory, a similar list of protein fragments could be generated by denaturing and size-fractionating sera because albumin fractionation and isolation would occur first. This is, in fact, exactly what previous investigators did in their analyses (17)(18). However, because many current biomarker efforts begin by depletion of native albumin (as well as depletion of other high-abundance proteins such as immunoglobulins), the findings of this study begin to more fully describe the molecular information that is being lost by this method of sample preparation. Moreover, direct analysis of the albumin-bound material allows for clearer understanding of the nature and existence of the carrier-protein-bound low-molecular-weight information archive while minimizing issues of small amounts of free/unbound material contaminating the analysis.
The total list of identified albumin-bound entities is shown in Tables 1a-1c of the online Data Supplement (http://www.clinchem.org/content/vol51/issue10/). This list reveals a rich and previously undescribed information archive of putative analytes, many of which had not been described as being represented in the circulation. These molecules are mostly fragments of larger proteins; thus, their validation will necessitate the development of new immunodetection reagents such as anti-peptide antibodies.
It is well known that highly abundant proteins are identified by a larger number of peptides than are lower-abundance proteins. The probability of selecting and identifying a peptide from a low-abundance, and potentially more interesting, protein is therefore much less. The inherent complexity and large dynamic range of protein concentrations of global proteome samples represents a considerable barrier to obtaining multiple peptide identifications for each protein. The large number of peptides present in such mixtures greatly exceeds the capacity of current data-dependent tandem mass spectrometers, even when multidimensional fractionation is used (4)(35)(36). In the present method we have endeavored to reduce this barrier by isolating the major high-abundance protein albumin and then examining the identity of its bound species.
Sequences predicted according to the criteria outlined in the Materials and Methods section fell into 2 categories: (a) multiple peptide hits, and (b) single peptide hits. Multiple peptide hits according to disease pool category are presented in Table 1a of the online Data Supplement by protein identity. Single peptide hits are listed in Tables 1b and 1c of the online Data Supplement, along with the corresponding specific predicted sequences. Low-abundance proteins in blood are statistically likely to be discovered by only a single peptide sequence as a function of the dynamic range of proteins in serum; consequently, their probability of detection is low. Furthermore, many of the predicted serum species are peptide fragments of larger parent proteins. This can be concluded because a high percentage of the predicted proteins in Tables 1a-1c of the online Data Supplement have a full-length mass larger than albumin. Peptide fragments have fewer trypsin cleavage sites than the intact parent protein. Thus, a single peptide prediction, if correct, can correspond to a low-abundance peptide or to a protein fragment with fewer trypsin cleavage sites than the parent molecule. Because we conducted 5 independent isolation and sequence determinations for each disease category, single hits that were discovered more than once could be tabulated (Table 1b of the online Data Supplement). Single peptide hits discovered more than once may have a higher probability of being valid serum biomarkers than do single-instance single peptide hits.
The relevance of single peptide hits has been addressed recently (37). These investigators pointed out that the elimination of single peptide hits from experimental results is scientifically unwarranted and detrimental to the field in general (37). In fact, when so-called "one-hit wonders" identified by isotope-coded affinity tag proteomics were later assessed for validation by immunoassay detection, >90% were found to retain the differential expression state determined by the initial MS-based assay. Moreover, in the present study, many of the same single-hit tryptic fragments were iteratively and reproducibly found in the same disease category. Finally, for small, low-abundance fragments (with a low number of trypsin cleavage sites), it is statistically likely that only a single hit would be obtained in any given experimental cycle.
Although we identified 4 separate fragments for BRCA2, which increased our confidence in the finding, the fact that a majority of our protein identifications were based on one-hit MS analysis does not diminish their potential for validation. It is well known that highly abundant proteins are identified by a larger number of peptides than are lower-abundance proteins. The probability of selecting and identifying a peptide from a low-abundance, and potentially more interesting, protein is therefore much lower. The inherent complexity and large dynamic range of protein concentrations of global proteome samples represent a considerable barrier to obtaining multiple-peptide identifications for each protein. The large number of peptides present in such mixtures greatly exceeds the capacity of current data-dependent tandem mass spectrometers, even when multidimensional fractionation is used. Moreover, global proteomic studies have consistently shown a low false-positive rate (i.e., <5%) for peptide identification; therefore, for every incorrect one-hit wonder that is thrown out, 19 correct identifications are also excluded. Excluding one-hit wonders from possible validation would decimate quantitative proteomics studies that use isotope-coded affinity tag reagents and phosphoproteomic experiments, which typically produce only single phosphopeptide identifications. Many investigations have performed orthogonal validation experiments (e.g., Western analysis) that have confirmed that the qualitative and quantitative proteomic results forthcoming from one-hit wonders are indeed valid identifications (27)(38)(39)(40)(41)(42).
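The 19-to-1 figure follows directly from the stated false-positive rate: at a rate of 5%, 1 identification in 20 is incorrect and the other 19 are correct. The short Python sketch below reproduces that arithmetic. The function name is ours, introduced only for illustration, and the calculation assumes that single-hit identifications share the overall peptide-level false-positive rate, which the text implies but does not measure directly.

```python
from fractions import Fraction


def correct_per_incorrect(false_positive_rate):
    """Correct identifications discarded for each incorrect one removed.

    Assumption (illustrative): single-hit identifications have the same
    false-positive rate as peptide identifications overall.
    Pass the rate as a string or Fraction to keep the arithmetic exact.
    """
    fpr = Fraction(false_positive_rate)
    if not 0 < fpr < 1:
        raise ValueError("false_positive_rate must be strictly between 0 and 1")
    # Fraction correct divided by fraction incorrect.
    return (1 - fpr) / fpr


# Upper bound quoted in the text: a 5% false-positive rate.
print(correct_per_incorrect("0.05"))  # 19
```

Because the quoted rate is an upper bound (<5%), 19 is a lower bound on the ratio; a lower false-positive rate means even more correct identifications are lost per incorrect one removed.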
Ultimately, all predicted peptide sequences, whether single or multiple, must remain designated as "candidate" components of the human serum until they are confirmed immunologically. Many of the predicted sequences were derived from molecular-weight fractions smaller than the predicted size of the protein. We therefore concluded that the sequenced species existed as a fragment of a larger parent molecule. Thus, the gel migration location of the band could not be used to identify the protein. Western blotting with peptide competition was performed on the selected fragments identified by MS as belonging to BRCA2 to verify the identity of selected mass spectrometric identifications (Fig. 5C).
BRCA2 is a well-studied tumor suppressor protein related to the p53 pathway that is directly implicated in familial breast cancer and ovarian cancer (43)(44)(45)(46). Although the role of BRCA2 in breast and ovarian cancer predisposition is poorly understood, it is known that mutations of the BRCA2 gene are responsible for one third of hereditary breast cancer cases (45). The data in Fig. 5 suggest that at least 2 fragments of the BRCA2 parent protein exist in blood. A single predicted trypsin-cleaved peptide at amino acids 965–981 was identified in both stage I and stage III ovarian cancer serum pools (Fig. 4A and Fig. 5A). This peptide is represented in competition Western blots by 2 distinct molecular-weight bands (Fig. 5C). It is likely that a fragment containing the peptide at amino acids 965–981 would be cleaved at various residues in vivo and could therefore be represented at multiple molecular weights. This peptide was not predicted in pooled high-risk ovarian cancer serum by MS analysis. On the basis of evidence from the Western blots using a BRCA2 polyclonal antibody and the fact that BRCA2 is too large (390 kDa) to enter the blood circulation in its native form, we conclude that BRCA2 can be represented in the serum as one or more fragments and that at least 2 amino-terminal peptide fragments of BRCA2 <25 kDa can be validated by peptide competition for antibodies that recognize an amino acid sequence adjacent to, and overlapping with, the predicted peptide sequence. Additionally, 2 separate BRCA2 fragments encompassing no less than the amino acid regions 2390–2459 and 1942–1959 (Fig. 4B) are predicted to exist in the serum from mass spectral evidence of multiple peptide sequences identified in these regions of BRCA2.
The current methodology applies carrier-protein sequestering to a disease category pool of combined sera. The methodology described here appears to be a successful means to identify candidate proteins and peptides in the sera of patients with known disease states. On the basis of the predicted physiologic functions of the parental protein containing the peptide sequence, these proteins and peptides can potentially be derived from tumor cells, from the host microenvironment, or from interactions between these two tissue compartments. Each pool contained multiple serum specimens procured before pathology-based diagnosis. The rationale for using a pool of multiple serum samples is based on the conservative assumption that even within the same histopathologic diagnosis, cancer is a heterogeneous disease. Thus, a biologically relevant molecule may be expressed only within a subfraction of the cancer population. Pooling samples is statistically the best means to identify a list of candidate peptides and proteins that exist within subsets of the pooled population.
The first disadvantage of using a pool of sera to discover markers is that molecules present in a subset of the combined samples are diluted within the pool. Although an individual sample may have a high concentration of the putative protein or peptide, this sample is diluted within the entire pool. Consequently, this averaging effect will lower the concentration of any individual molecule in the pool before sequence analysis. The second disadvantage is that a candidate protein that is identified in one disease pool (e.g., high risk) but not found in a second disease category (e.g., stage I cancer) may still exist in the second category at a concentration below a threshold for statistical sampling probability. Currently we are performing competition Western blots on a large set of serum samples from individual patients diagnosed as either high risk or with stage I, III, or IV ovarian cancer to determine the extent to which pooling samples limits the effectiveness of MS sequencing detection. Despite these drawbacks, as shown in Table 1 and in Tables 1a–1c of the online Data Supplement, carrier-protein sequestration appears to yield a list of predicted sequences derived from proteins predicted in the literature to have diverse physiologic functions.
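The dilution effect described above can be made concrete with a small calculation. The Python sketch below is illustrative only: it assumes that each specimen contributes an equal volume to the pool and that specimens lacking the molecule contain none of it, and the function name and example concentration are hypothetical rather than taken from the study.

```python
def pooled_concentration(concentration, carriers, pool_size):
    """Concentration of a molecule in an equal-volume pool of sera.

    Assumptions (illustrative, not from the study): every specimen
    contributes the same volume, `carriers` specimens contain the molecule
    at `concentration`, and the remaining specimens contain none.
    """
    if pool_size <= 0 or not (0 <= carriers <= pool_size):
        raise ValueError("need pool_size > 0 and 0 <= carriers <= pool_size")
    return concentration * carriers / pool_size


# Hypothetical example: a fragment at 100 units in 4 of 40 pooled specimens.
print(pooled_concentration(100.0, 4, 40))  # 10.0
```

Under these assumptions, a molecule expressed in one tenth of a 40-specimen pool is presented to the mass spectrometer at one tenth of its concentration in the carrier specimens, which may place it below the sampling threshold discussed above.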
The predicted sequences presented in Tables 1a–1c of the online Data Supplement demonstrate a rich diversity of peptides and protein fragments in the sera. Although the application of this method yielded different sets of predicted sequences within the control and diseased category pools, these proteins and peptides cannot be considered as candidate diagnostic markers. Once high-throughput means are developed to quantitatively measure each protein and peptide and accurately distinguish specific biomarker fragments from their parent molecules, proper clinical trial studies can examine the diagnostic sensitivity and specificity of a selected panel of molecules.
In conclusion, proteins and peptides associated with serum carrier proteins such as albumin may constitute a rich source of new additions to the human serum proteome. The sequences identified by this means are predicted to be derived from proteins that have been shown in the literature to serve a wide variety of physiologic functions. We can hypothesize that these complexed proteins and peptides may have originated from a range of different tissues and cellular compartments. A large proportion of the predicted sequences represent fragments of larger molecules. A subset of these candidates may eventually be found suitable for full clinical diagnostic testing in adequately powered objective study sets.
Footnotes
1 Nonstandard abbreviations: MS, mass spectrometry; µLC-MS/MS, microcapillary reversed-phase tandem MS; and PVDF, polyvinylidene difluoride.
References