Clinical Chemistry 48: 1160-1169, 2002
© 2002 American Association for Clinical Chemistry, Inc.
Proteomics for Cancer Biomarker Discovery
Pothur R. Srinivas,1 Mukesh Verma,2 Yinming Zhao,3 and Sudhir Srivastava2,a
1 Division of Cancer Prevention, National Cancer Institute, Rockville, MD 20852.
2 Cancer Biomarkers Research Group, Division of Cancer Prevention, National Cancer Institute, Rockville, MD 20852.
3 Department of Biochemistry, The University of Texas Southwestern Medical Center at Dallas, 5323 Harry Hines Blvd., Dallas, TX 75390.
a Address correspondence to this author at: Cancer Biomarkers Research Group, Division of Cancer Prevention, National Cancer Institute, 6130 Executive Blvd., Room EPN 330F, Rockville, MD 20852. Fax 301-402-0816; e-mail ss1a@nih.gov.

Abstract
The emergence of novel technologies allows researchers to perform comprehensive analyses of genomes, transcriptomes, and proteomes in health and disease. The information expected from such technologies may soon dramatically change the pace of cancer research and the care of cancer patients. These approaches have already demonstrated the power of molecular medicine in discriminating among disease subtypes that are not recognizable by traditional pathologic criteria and in identifying specific genetic events involved in cancer progression. This review covers a selection of advances in the realm of proteomics and its promise for cancer biomarker discovery. It also addresses issues regarding sample preparation and specificity and discusses current challenges that need to be overcome. Finally, the review touches on the efforts of the Early Detection Research Network at the National Cancer Institute in promoting biomarker discovery for translation at the clinical level.

Introduction
Cancer remains a major public health challenge despite progress in detection and therapy. A large portion of the US population will develop cancer during their lifetime (1), with 500 000 individuals dying annually from the disease (2). The race to obtain control over the disease process is gaining speed and focus. From biotechnology to chemistry, from applied physics to software, increasing resources are being brought to bear on the goals of prevention and reducing mortality. Innovations and applications of biotechnology have allowed the exploitation of biological processes in an effort to study pathogenesis at the molecular level. Novel technologies designed to advance the molecular analysis of healthy and diseased human cells are poised to revolutionize the study of health and disease. Advances in genomics and proteomics are expected to provide insights into the molecular complexity of the disease process and thus enable the development of tools to help in treatment as well as in detection and prevention.
Among the important tools critical to detection, diagnosis, treatment, monitoring, and prognosis are biomarkers. Biomarkers are biological molecules that are indicators of physiologic state and also of change during a disease process (3). The utility of a biomarker lies in its ability to provide an early indication of the disease, to monitor disease progression, to be easily detected, and to be measurable across populations. The initial draft of the human genome (4)(5) has set the pace for biomarker discovery and provided the impetus for the next level of molecular inquiry, which is represented by functional genomics or proteomics. Proteomics is the study of the complete protein complement, or proteome, of the cell. In contrast to the genome, the proteome is dynamic and is in constant flux because of a combination of factors. These include differential splicing of the respective mRNAs, posttranslational modifications, and temporal and functional regulation of gene expression (6). Proteomic technologies allow for identification of the protein changes caused by the disease process in a relatively accurate manner. The inherent advantage of proteomics is that the identified protein is itself the biological endpoint. At the protein level, distinct changes occur during the transformation of a healthy cell into a neoplastic cell, including altered expression, differential protein modification, changes in specific activity, and aberrant localization, all of which may affect cellular function. Identifying and understanding these changes is the underlying theme in cancer proteomics (3).

Proteomic Technologies
Genomics-based approaches to biomarker development include the measurement of expression of full sets of mRNA by methods such as differential display (7)(8), serial analysis of gene expression (9)(10), and large-scale gene expression arrays. However, interpreting the data and adapting the results to a particular application remain challenging. Although studies of differential mRNA expression are informative, mRNA concentrations do not always correlate with protein concentrations (11)(12). Proteins are often subject to proteolytic cleavage or posttranslational modifications, such as phosphorylation or glycosylation. Cancer biomarker discovery strategies that target expressed proteins are becoming increasingly popular because proteomic approaches characterize the proteins, modified or unmodified, involved in cancer progression.
Two-dimensional gel electrophoresis has been the mainstay of electrophoretic technology for a decade and is the most widely used tool for separating proteins. In this technique, initially described 25 years ago (13), proteins are separated in the first dimension based on their isoelectric points and then in a second dimension based on their molecular masses. In many cases, two-dimensional gel electrophoresis can be used to evaluate whole-cell or tissue protein extracts. The use of narrow, immobilized pH gradients for the first dimension increases resolving power and can help detect low-abundance proteins. Radioactive or fluorescent labeling and silver staining allow visualization of hundreds of proteins in a single gel. Differences between samples can be compared and relative quantities determined from the ratios of spot intensities in independent two-dimensional gels. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) allows the analysis and identification of very small amounts of protein isolated from the gel (14)(15)(16)(17)(18). These advances have combined to make two-dimensional electrophoresis a more attractive option for the analysis of complex protein mixtures. A brief overview of complementary and other rapidly evolving proteomic technologies is provided below.

Isotope-coded Affinity Tags
Utilizing the power of MS, a high-throughput technique has been developed that facilitates direct qualitative and quantitative comparisons of complex protein mixtures. This method, analogous to the microarray approach for assessing differential gene expression between two cell states, uses a chemical group or label made in two different isotopic forms: heavy and light. Originally described by Gygi et al. (19), these labels, termed isotope-coded affinity tags, are coupled to all the cysteine residues in a protein mixture. The heavy isotope is added to one sample (e.g., cancer cells), whereas the lighter isotope is added to the second sample (e.g., healthy cells). The samples are then combined, the proteins are digested, and the peptides are analyzed in a mass spectrometer. The isotopic substitutions do not affect the behavior of peptides during separations. Thus, peptides derived from the two different samples enter the spectrometer at the same time. The mass spectrometer measures the relative abundance of the heavy and light forms of each peptide and identifies each peptide by generating and analyzing the peptide fingerprints. In this manner, a global view of protein abundance in cells or tissues in two different states can be determined. This is especially important to biomarker discovery because expression analysis of all proteins and identification of the changes that are a function of a disease process give us molecular handles to target intervention strategies.
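The heavy/light comparison described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the published workflow: the peptide names, the intensities, and the twofold cutoff are hypothetical, and real data would first require pairing heavy and light peaks by their known mass offset.

```python
from math import log2

# Hypothetical paired peak intensities for the same peptide carrying the
# heavy tag (e.g., cancer cells) and the light tag (e.g., healthy cells).
paired_peaks = {
    "peptide_A": {"heavy": 8000.0, "light": 2000.0},
    "peptide_B": {"heavy": 1500.0, "light": 1500.0},
    "peptide_C": {"heavy": 500.0, "light": 4000.0},
}


def heavy_to_light_log2(heavy, light):
    """Return log2(heavy/light); positive means more abundant in the heavy-labeled sample."""
    if heavy <= 0 or light <= 0:
        raise ValueError("intensities must be positive")
    return log2(heavy / light)


def classify(log_ratio, threshold=1.0):
    """Label a peptide using an assumed twofold cutoff (|log2 ratio| >= 1)."""
    if log_ratio >= threshold:
        return "up in heavy-labeled sample"
    if log_ratio <= -threshold:
        return "down in heavy-labeled sample"
    return "unchanged"


results = {
    name: classify(heavy_to_light_log2(p["heavy"], p["light"]))
    for name, p in paired_peaks.items()
}

for name, label in results.items():
    print(name, label)
```

Working on a log scale treats a fourfold increase and a fourfold decrease symmetrically, which is why the cutoff is applied to the log ratio rather than to the raw ratio.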

MALDI
Conventionally, Edman degradation has been used to obtain partial amino acid sequences for protein identification and the design of an oligonucleotide for gene cloning (20)(21). This method requires substantial amounts of materials, and its low sensitivity makes it difficult to sequence those regulatory proteins involved in cancer progression, which are often present in low abundance. Therefore, biochemical studies of cancer biology require more sensitive tools. MALDI has helped to establish the MS platform as an important tool in proteomics. This technique of ionization has been instrumental in bringing the mass spectrometer, which measures the mass of a molecule, to the forefront of proteomics research. MALDI enables conversion of biomolecules into a charged gaseous state that is essential for analysis by the mass spectrometer.
Developed by Karas and Hillenkamp (22), the procedure involves the precipitation of the sample molecules with an excess of matrix material, such as α-cyano-4-hydroxycinnamic acid or dihydroxybenzoic acid (23). The precipitated solid is then irradiated with laser pulses, and the matrix material imparts energy to the biomolecules. The matrix materials have absorbances at the wavelength of the laser, and the molecules are subjected to a process of desorption and ionization accompanied by fragmentation. The mass spectrometer then measures the mass-to-charge ratio (m/z) of the protein, peptide, or peptide fragments. Mass separation can be achieved based on the TOF, generation of quadrupole electric fields, or ejection of ions from an ion trap. MALDI can be linked to any one of these three methods. In the MALDI-TOF MS platform, irradiation by laser pulses produces short bursts of ions that are then accelerated through a flight tube, the smaller ions possessing a higher velocity relative to the larger ions. The arrival times of the ions at the detector are recorded, creating a TOF spectrum. The upstream elements of protein purification have the most impact on the output from an MS platform. Analysis is usually carried out after enzymatic degradation of gel-separated proteins. In other instances, intact proteins are digested and analyzed as complex peptide mixtures, without electrophoresis.
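The relation between ion mass and flight time can be sketched for an idealized linear TOF analyzer. The sketch below assumes that every ion gains the same kinetic energy per charge from the accelerating potential and then drifts at constant velocity; the 20 kV potential and 1 m tube length are assumed values chosen for illustration, not parameters taken from any particular instrument.

```python
from math import sqrt

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time_us(mass_da, charge=1, voltage=20_000.0, tube_length_m=1.0):
    """Flight time in microseconds for an ion in an idealized linear TOF tube.

    Energy balance: z*e*U = (1/2)*m*v**2, so v = sqrt(2*z*e*U/m) and t = L/v.
    The default voltage and tube length are assumed, illustrative values.
    """
    mass_kg = mass_da * DALTON
    velocity = sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / mass_kg)
    return tube_length_m / velocity * 1e6


for mass in (1000.0, 4000.0, 16000.0):
    print(f"{mass:8.0f} Da -> {flight_time_us(mass):6.2f} microseconds")
```

Because the flight time grows with the square root of m/z, a fourfold increase in mass doubles the flight time, which is how the arrival-time spectrum is converted to a mass spectrum.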
The molecular masses of peptides derived from the digestion of a specific target represent the fingerprint or protein profile. With an appropriate protein sequence database and a search program (PepFrag or ProFound; available at http://prowl.rockefeller.edu/), mass information can be used for protein identification (24)(25)(26). Alternatively, the masses of peptide fragments can be compared with those of theoretical peptide fragments specified by the parent peptide mass.
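The idea of matching measured peptide masses against a sequence database can be sketched as follows. This is a simplified illustration, not the algorithm used by PepFrag or ProFound: it assumes tryptic digestion with no missed cleavages, unmodified residues, neutral monoisotopic masses, and a simple count of matching peptides as the score. The two database entries and the observed masses are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def match_score(observed_masses, sequence, tolerance=0.2):
    """Count observed masses that fall within the tolerance of a theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database):
    """Return the name of the database entry with the highest match score."""
    return max(database, key=lambda name: match_score(observed_masses, database[name]))


# Hypothetical two-entry database and a hypothetical measured fingerprint.
toy_database = {"protein_1": "AKGRPLKM", "protein_2": "MSTKLLEERWGK"}
observed = [465.23, 389.21, 658.36]
print(best_match(observed, toy_database))
```

Production search programs add statistical scoring, missed cleavages, and modifications; the counting score here only conveys the principle.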
In a state-of-the-art MS laboratory, the current detection limits for accurate protein identification are in the low subpicomole range. The major limiting step in protein identification is the requirement that a protein sequence be present in a protein database. Although only a small percentage of known human proteins have been sequenced, there are 1.2 million human expressed sequence tag sequences present in public databases. However, differential protein profiles could serve as biomarkers themselves, distinguishing specific states or stages of disease progression.
The power of MS is leading to the development of integrated systems that use robots to automatically pick specified spots on a two-dimensional gel. After enzymatic digestion, samples are processed for analysis through MALDI-TOF MS. The drawback of this method lies in the low sensitivity for detection of low-abundance proteins in a two-dimensional spot. The specified spot may contain multiple components, and the signals generated, through peptide digests, by ions of low abundance are lost in this approach (27). This is critically important in the detection of posttranslationally modified species that, under most circumstances, are present in extremely low amounts. Furthermore, not all proteins can be identified through peptide mass mapping alone because many proteins are not yet represented at full length in sequence databases. Often, small proteins do not generate sufficient numbers of digestion products for identification. In many cases, additional MS sequencing of peptides is necessary to gain knowledge about sequence identity. The detection of different proteins through liquid chromatography coupled with tandem MS (LC-MS/MS) has been demonstrated to resolve this problem to some extent.

LC-MS/MS
The strength of the tandem mass spectrometer lies in the ability to obtain sequence information for a specific peptide in the presence of multiple peptides in the sample. Reversed-phase LC is used to concentrate and separate peptides from extremely complex mixtures before sequencing by MS (18)(23). LC-MS/MS has been demonstrated to enhance separation and identification of analytes in the low femtomolar range (16). The LC-MS/MS system integrates conventional HPLC pumps and columns coupled to a tandem mass spectrometer. The pumps and mass spectrometer are controlled by the same software to allow efficient coupling of chromatography and ion detection (28). Samples can be introduced either through an injector or a pressure cell to enhance sensitivity. Additional methods of loading include the use of precolumn traps and vented microcapillary columns (28). After their elution from the HPLC columns, peptides are ionized by electrospray ionization and analyzed by a tandem mass spectrometer. The peptide ions are sequentially ejected from an ion trap to the detector based on their m/z. A selected peptide of interest can be isolated by first ejecting ions with other m/z values from the ion trap. The retained peptide ion is fragmented and then sent to the detector to obtain a tandem mass spectrum containing the sequence information (28). The fragment ions allow for identification of the respective amino acids through a protein or translated nucleotide database search (26). A single LC-MS/MS analysis thus offers the possibility of isolating and sequencing hundreds of selected peptides from one sample. An additional advantage of this technique is the ability to identify protein targets of reactive electrophiles and map the adducts to the specific amino acid (29). This approach may open up avenues for novel biomarker discovery and help researchers to understand the mechanisms through which environmental exposures may initiate carcinogenesis through protein modifications.
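How a tandem mass spectrum carries sequence information can be illustrated with a small sketch. It assumes an idealized spectrum containing a complete series of singly charged b ions (N-terminal fragments) for an unmodified peptide; the peptide "GALEK" and the 0.02 Da tolerance are hypothetical choices, and real spectra are incomplete and noisy, which is why database searching is used in practice.

```python
PROTON = 1.007276

# Monoisotopic residue masses (Da); leucine and isoleucine are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ions(peptide):
    """Singly charged b-ion m/z values (b1 up to b(n-1)) for an unmodified peptide."""
    total, ions = PROTON, []
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        ions.append(total)
    return ions


def read_ladder(mz_values, tolerance=0.02):
    """Infer residues from the mass differences between consecutive fragment ions.

    The first ion is only the starting point of the ladder, so the residue it
    contains is not reported. Isobaric residues are returned together.
    """
    residues = []
    for lower, upper in zip(mz_values, mz_values[1:]):
        delta = upper - lower
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - delta) <= tolerance]
        residues.append("/".join(hits) if hits else "?")
    return residues


ladder = b_ions("GALEK")
print([round(mz, 4) for mz in ladder])
print(read_ladder(ladder))
```

Each step in the ladder corresponds to one residue mass, so the order of the mass differences gives the order of the amino acids; leucine and isoleucine cannot be told apart by mass alone.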
The LC-MS/MS platform has in fact been used for monitoring congenital adrenal hyperplasia in a rapid manner from dried filter-paper blood samples (30).

Imaging MS
This method takes advantage of the molecular sensitivity of MS to analyze protein expression profiles from target tissues. It allows for direct mapping of peptides and proteins present on the surface of tissue sections and individual cells (31). In a typical procedure, steel plates coated with matrix solution are used to mount tissue sections. They are then dried and introduced into a mass spectrometer controlled by tailored imaging software (32)(33). Molecular images are then generated from a raster over the sample surface with the aid of laser spots. Laser positions are fixed, and the sample plate is repositioned for consecutive spots. Mass spectral data are acquired for each spot from the molecules present on the irradiated area. A typical data array could contain thousands of spots, depending on the desired image resolution and a specified molecular weight range. A composite of >200 protein peaks can be detected in the mass spectrum from each spot ablated by the laser (32). It is possible to generate hundreds of image maps at distinct molecular weights from a single raster. Imaging MS may help enhance biomarker discovery by shedding light on the importance of spatial localization of specific proteins during carcinogenesis and neoplasia.
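The construction of a molecular image from a raster of spectra can be sketched as below. The sketch assumes that each laser spot yields a list of (m/z, intensity) peaks indexed by its row and column on the sample plate; the 2 x 3 raster, the peak values, and the 1.0 m/z window are hypothetical.

```python
def ion_image(spectra_by_spot, n_rows, n_cols, target_mz, tolerance=1.0):
    """Build a 2-D intensity map for one m/z value from per-spot spectra.

    spectra_by_spot maps (row, col) -> list of (mz, intensity) peaks.
    Each pixel is the summed intensity of peaks within the m/z window.
    """
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for (row, col), peaks in spectra_by_spot.items():
        image[row][col] = sum(
            (intensity for mz, intensity in peaks if abs(mz - target_mz) <= tolerance),
            0.0,
        )
    return image


# Hypothetical 2 x 3 raster; each entry is the peak list acquired at one spot.
spectra = {
    (0, 0): [(4965.0, 10.0), (8565.0, 2.0)],
    (0, 1): [(4965.5, 30.0)],
    (0, 2): [(8565.0, 5.0)],
    (1, 0): [],
    (1, 1): [(4964.8, 50.0), (8564.6, 1.0)],
    (1, 2): [(4965.2, 20.0)],
}

image_4965 = ion_image(spectra, 2, 3, target_mz=4965.0)
for row in image_4965:
    print(row)
```

Calling `ion_image` again with a different `target_mz` on the same raster yields another map, which mirrors how many images at distinct molecular weights can be generated from a single acquisition.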

Free Flow Electrophoresis
Although two-dimensional electrophoresis is a powerful separation technique, it is limited by the insolubility of certain classes of proteins, such as hydrophobic membrane proteins, as well as the amount of protein that can be processed. A recently described procedure for resolving complex mixtures of proteins is targeted at overcoming these limitations. This technique is a combination of free flow electrophoresis, liquid-based isoelectrofocusing, and sodium dodecyl sulfate-polyacrylamide gel electrophoresis (34). Resolved proteins are identified by peptide fragment sequencing using capillary column, reversed-phase HPLC-MS. Free flow electrophoresis is an extremely powerful liquid-based isoelectrofocusing method for resolving proteins and is not limited by the amount of sample that can be loaded onto the instrument. It is capable of fractionating intact protein complexes and is helpful in cell-mapping proteomics. It also allows for the generation of protein profiles through the use of various denaturing isoelectrofocusing buffers containing a combination of urea-thiourea and zwitterionic detergents.

Protein Chips
The advent of chip-based gene technology has allowed for the detection of thousands of genes from very small samples. This success has researchers in pursuit of similar chip-based technologies for proteins. Functional protein microarrays seem to be the next big challenge in proteomics. Several hurdles must be overcome in this process. Proteins are not as robust as DNA and tend to denature rapidly under harsh experimental conditions. Unlike DNA, they are hard to attach to chip surfaces. There is no method similar to PCR, which amplifies DNA, currently available to amplify minute amounts of protein. The development of capture molecules capable of binding all possible proteins and the respective posttranslationally modified forms poses unique challenges to the proteomics community. Furthermore, the complex biochemistries of the various proteins make it quite difficult to capture the myriad classes of proteins within a cell using a single protein chip. These dilemmas are being addressed by the research community in various ways to produce protein chips that can parallel the gene array format.
A variety of protein and peptide arrays being tested at the current time serve to answer specific questions about a particular protein or its variants (35). Several groups have been using high-throughput chip-based analysis to study protein-protein, protein-DNA, and protein-RNA interactions. Typically, a specific protein is spotted through cross-linking to a chip surface, such as a glass slide, in a grid-like fashion, and samples are passed over the slide for detection of interacting molecules. Houseman et al. (36) have developed a chip that has immobilized on its surface the substrate of the tyrosine kinase c-src. The functional status of c-src is assessed through the phosphorylation of the substrate, which in turn can be characterized through fluorescence and phosphorimaging. Peptide chips are useful in studying enzymatic processes, using peptides as model substrates. A disadvantage of this method is that it may not reflect the in vivo situation where protein complexes determine the activity of the protein.
Another emerging paradigm in protein chip technology involves the surface capture of proteins through antibodies, followed by MS. The mass information generates a fingerprint that can be compared with catalogs in a database. A chip can potentially have thousands of addressable locations designed for protein capture in a high-throughput fashion. Chip surfaces can be manipulated in various ways to optimize the capture of proteins associated with a disease stage or type (36).

Surface-enhanced Laser Desorption/Ionization
Affinity-based MS techniques present a novel proteomic approach for the identification and measurement of cancer-associated biomarkers. On the basis of the work of Hutchens and Yip (37), Ciphergen Biosystems, Inc. has developed the surface-enhanced laser desorption/ionization (SELDI) ProteinChip® MS technology platform, which brings to the field of proteomics a user-friendly methodology (38)(39). SELDI has several advantages over other existing technologies, such as LC-MS, two-dimensional gel electrophoresis-coupled MS, ELISA, and fluorescence-based binding assays, for high-throughput screening because of its versatility, ease of use, speed, and low cost. It is rapid, reproducible, highly sensitive (detection limit in the femtomolar range), and readily adaptable to a diagnostic format (40). Additionally, molecules that have been traditionally difficult to identify have been detected with ease by use of the SELDI platform (41). Through the use of addressable protein binding sites, the SELDI platform provides a layer of specificity when one is probing for biomarkers along defined signaling pathways or specific posttranslationally modified protein species.
Xiao et al. (42) used SELDI successfully to quantify prostate-specific membrane antigen (PSMA) from serum for differential diagnosis of prostate cancer. PSMA has been suggested as a biomarker for malignant prostate disease and was observed to be differentially expressed in benign prostatic hyperplasia (BPH) and prostate cancer (43)(44)(45)(46). This holds great promise for identifying patients with BPH and reducing unnecessary biopsies, and also for more sensitive identification of prostate cancer patients whose prostate-specific antigen concentrations fall in the gray zone (42). Using samples from a serum bank and utilizing SELDI as an immunoassay platform, Wright's group demonstrated that PSMA in combination with prostate-specific antigen could discriminate BPH patients from prostate cancer patients (42). In addition, these researchers have undertaken a comprehensive approach for biomarker discovery in prostate cancer from cells and body fluids through use of the SELDI platform (47)(48)(49).

Tissue Arrays
Large-scale scanning of the human genome has become possible with the introduction of the DNA microarray. The ability to survey the expression of up to 5000-50 000 genes in a single experiment provides valuable new opportunities as well as new challenges. It is important to be able to translate genome-scale information on cancer biology to functional and clinical applications. This requires prioritization of hundreds of discovered targets, functional validation of these targets, and a thorough knowledge of the involvement of the candidate target genes in vivo in human tissue. Tissue array technologies are being developed for genome-scale expressional and clinical cancer research (50). This technology enables high-throughput molecular analysis of large numbers of specimens. A typical tissue array is constructed by arranging cylindrical biopsies from multiple individual tumor tissues into a tissue array block, which is then sliced into 200 identical slides for probing RNA or protein targets. A single immunohistochemistry or in situ hybridization experiment provides information on all specimens on the slides, whereas subsequent sections can be analyzed with other probes or antibodies. Cancer-specific tissue array slides with various kinds of subsets can be generated (subsequent cancer cases, preneoplastic lesions, metastatic lesions, synchronous cancers, metachronous cancers, young patients, and familial cases) for further analysis (50). Tissue array technology has expanded the scope of high-throughput molecular analysis of archival tissue specimens with multiple probes for specific genes or proteins for functional and clinical applications.

Sample Preparation
One of the most critical components in proteomics is sample preparation. This is important because it may affect reproducibility as a result of the heterogeneity of proteins derived from cell populations (51). From the time of sample collection to the point when proteins are introduced for analysis, multiple factors come into play. Efforts should be made to process all the samples in as similar a manner as possible. Cancer cells should be rendered free of stroma, necrotic tissue, contaminating serum proteins, and blood cells to ensure that findings directly relate to the tumor in question (52). Mechanical methods such as surface scraping and fine-needle aspiration have been successfully used for capturing cancer cells (53). Calcium starvation and other nonenzymatic methods, such as immunomagnetic separation, have also been used to isolate pure populations of cancer cells (54)(55). An important advance in sample preparation has been the development of laser capture microdissection (LCM) (56). The LCM system enables one-step procurement of selected cell populations from a section of complex and heterogeneous tissue sample. LCM has been successfully used to obtain pure populations of cancer cells from frozen, paraffin-embedded, stained, and unstained tissues for molecular analysis. LCM allows for the isolation of specific cell populations, thus making it possible to procure pure populations of neoplastic cells from lesions <1 mm in diameter without encroachment of adjacent nonneoplastic cells. With this technique, matched healthy epithelial, stromal, benign, preneoplastic, neoplastic, and cancer cells from the same specimen have been successfully isolated, which holds promise for novel cancer biomarker discovery.

Specificity
Protein markers can be used in detection, diagnosis, monitoring of therapy, and ultimately, prevention and risk assessment. Given the complex nature of neoplasia, the best approach to prevention may be to screen for a cluster of markers from blood and tissue to give a more accurate assessment. Both mRNA and protein expression profiles are necessary to infer cell behavior, suggesting that more than one marker is needed for risk assessment or early detection. Because the progression to cancer is a complex process, multiple alterations have to be targeted to achieve efficient detection. A major concern in screening is the issue of false positives when a single biomarker is considered. Ideally, high specificity is favored to avoid unnecessary diagnostic tests on healthy individuals. As novel technologies unravel the mysteries of cellular mechanisms, multiple tools will become available to detect, target, and manipulate the process of neoplasia. The use of a panel of biomarkers would enhance the positive predictive value of a test and minimize false positives or false negatives. In an epidemiologic perspective, cancer progresses through two distinct phases after the point of biological onset (3). The preclinical phase spans the interval from the point of onset to the time when symptoms appear. The more visible clinical phase encompasses the time from when symptoms appear through the time of therapy. Early detection lies in the preclinical phase of this continuum, and biomarkers predictive of this phase hold the greatest promise in helping to design effective interventions to stop or reverse progression (3).
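The dependence of the positive predictive value on specificity and disease prevalence, and the effect of combining markers, can be made concrete with a short calculation. The sketch below uses Bayes' rule; the marker performance figures and the 1% prevalence are hypothetical, and the panel calculation assumes that the two markers are statistically independent within both diseased and healthy individuals, which real markers may not be.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a person with a positive test actually has the disease."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


def both_positive(sens_a, spec_a, sens_b, spec_b):
    """Sensitivity and specificity of a two-marker panel called positive only
    when markers A and B are both positive (assumes independent markers)."""
    sensitivity = sens_a * sens_b
    specificity = 1.0 - (1.0 - spec_a) * (1.0 - spec_b)
    return sensitivity, specificity


# Hypothetical single marker: 90% sensitive, 95% specific, 1% prevalence.
single_ppv = positive_predictive_value(0.90, 0.95, 0.01)

# Hypothetical panel of two such markers, both required to be positive.
panel_sens, panel_spec = both_positive(0.90, 0.95, 0.90, 0.95)
panel_ppv = positive_predictive_value(panel_sens, panel_spec, 0.01)

print(f"single marker PPV: {single_ppv:.3f}")
print(f"panel sensitivity: {panel_sens:.3f}, specificity: {panel_spec:.4f}, PPV: {panel_ppv:.3f}")
```

Under these assumptions, requiring both markers to be positive raises specificity and the positive predictive value at the cost of some sensitivity; other combination rules shift the balance in the opposite direction, so the rule has to be chosen for the intended screening setting.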
Many issues have to be addressed in parallel concerning the specificity of the protein biomarkers that are discovered. Studies would have to be directed at their efficacy between genders and among races and ethnic groups. The ability of biomarkers to identify interindividual differences in susceptibility for monitoring high-risk groups should be assessed. In addition, the new biomarkers should be able to increase our understanding of the neoplastic process and help identify harmful exposures. Specificity should also be measured through preliminary work on model systems to infer whether a marker is a part of disease pathogenesis and not merely part of an adaptive response. The effect of other comorbid conditions on the specificity of a particular biomarker should also be assessed. To be of public health value, the assay should be adaptable to a high-throughput format with minimal misclassification.

Bioinformatics
Bioinformatic tools are integral components of proteomic analyses because the development of proteomic technologies incorporating high-throughput methods relies on powerful data analysis tools. The handling and analysis of the type of data to be collected in proteomic investigations are forcing new collaborations among computer scientists, biostatisticians, and biologists. Bioinformatic tools are necessary for multiple components of proteomic investigations, including analysis, storage, management, search, and retrieval. Bioinformatic tools are being developed around current proteomic platforms of two-dimensional gel electrophoresis, MS, and arrays. Tools developed to analyze two-dimensional electrophoresis protein patterns incorporate software applications possessing user-friendly interfaces for linearization and merging of scanned images, segmentation and detection of protein spots on the images, matching, and editing (57). Commercial two-dimensional image analysis software packages available to the proteomics community include ImageMaster™ (Amersham Biosciences), Melanie 3 (GeneBio), Progenesis™ (Nonlinear Dynamics), PDQuest™ 2-D Analysis Software (Bio-Rad Laboratories), ProteinMine™ (Scimagix), and the Z3 2D-Gel Analysis System (Compugen Limited). Some of the software packages can interact with automatic robotic systems to excise spots of interest from the gel for subsequent MS analysis. Interfacing the image analysis software with database tools for storing images is of critical importance in proteomic research, but is a shortfall of many commercial products (58).
Software tools such as RADARS (Rapid Automated Data Archiving and Retrieval Software) are beginning to address this issue (59). The RADARS system automatically initiates database searches for protein identification from raw data files in addition to storing the processed data and search results in a relational database (59). Current methods of analysis of MS data include peptide mass fingerprinting or peptide mass analysis (60)(61)(62), peptide sequence tag query (61)(62)(63), and MS/MS ion search analysis (61)(64). Protein samples processed through MALDI-TOF MS can be identified in real time by a simultaneous search of a sequence database. Free software tools for database searches for protein identification are available through the internet. These include Mowse (United Kingdom Human Genome Mapping Project Resource Center), MS-FIT (University of California at San Francisco), PeptideSearch (European Molecular Biology Laboratory), and MS-TAG (University of California at San Francisco). Algorithms are being developed to help determine amino acid sequences of peptide fragments de novo from MS/MS data (65).
Protein data derived from two-dimensional electrophoretic analysis have been used to develop artificial learning models to help classify tumors into benign, borderline, and malignant (66). Statistical algorithms such as partial least squares and hierarchical clustering have been used to that effect. Algorithms are being developed that can cluster and distinguish cancers from healthy tissue samples through the use of training sets that contain protein profile spectra derived from the technologies described above. Recently, such a method was used to distinguish ovarian cancer patients from unaffected individuals with a sensitivity of 100% and a specificity of 95% (67). Bioinformatic tools are in development to help handle, process, and meaningfully interpret the large body of data that is emerging from advances in proteomic technologies. These tools are essential in the discovery of sensitive and specific biomarkers in cancer research.
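The general idea of learning from a training set of protein profiles and then measuring sensitivity and specificity on separate samples can be sketched with a deliberately simple classifier. The example below uses a nearest-centroid rule, which is a stand-in chosen for brevity and is not partial least squares, hierarchical clustering, or the method of the cited ovarian cancer study; all profiles are hypothetical three-peak intensity vectors.

```python
from math import dist  # available in Python 3.8 and later


def centroid(profiles):
    """Mean intensity at each peak position across a list of profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def train(training_set):
    """training_set maps class label -> list of peak-intensity profiles."""
    return {label: centroid(profiles) for label, profiles in training_set.items()}


def predict(model, profile):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], profile))


def sensitivity_specificity(model, test_set, positive="cancer"):
    """Evaluate the classifier on labeled samples not used for training."""
    tp = fn = tn = fp = 0
    for label, profiles in test_set.items():
        for profile in profiles:
            called_positive = predict(model, profile) == positive
            if label == positive:
                tp += called_positive
                fn += not called_positive
            else:
                fp += called_positive
                tn += not called_positive
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical training and test profiles (three peak intensities each).
training_set = {
    "cancer": [[9.0, 1.0, 5.0], [8.0, 2.0, 6.0]],
    "healthy": [[2.0, 7.0, 5.0], [3.0, 8.0, 4.0]],
}
test_set = {
    "cancer": [[7.0, 3.0, 5.0], [4.0, 6.0, 5.0]],
    "healthy": [[1.0, 9.0, 4.0], [3.0, 6.0, 5.0]],
}

model = train(training_set)
sensitivity, specificity = sensitivity_specificity(model, test_set)
print(f"sensitivity: {sensitivity:.2f}, specificity: {specificity:.2f}")
```

Keeping the test samples separate from the training samples matters: performance estimated on the training set itself tends to be optimistic, especially when the number of measured peaks is large relative to the number of samples.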

Challenges
The global study of proteins has many unique difficulties that set it apart from comprehensive studies of genes and transcripts. One difficulty is that the behavior of proteins is determined by the tertiary structure of the molecule. Any assay based on protein binding depends on maintaining the native conformation of the protein. This puts constraints on the systems used to capture protein targets in affinity-based assays. Another difficulty is that the detection of low-abundance proteins poses a particular challenge, given that the dynamic ranges of proteins in biological systems can reach parts per million or lower. An amplification system analogous to the PCR has yet to be developed for protein studies. Protein regulation is often based not on synthesis and degradation, but on reversible modifications such as phosphorylation and glycosylation. In addition, RNA splicing can produce splice variants that are highly homologous but differ in function. The proteomics revolution hopefully will advance to overcome many of these hurdles and provide us with information and biomarkers that can aid in intervention and the prevention of cancer.
 |
Perspective
|
|---|
Now that the draft of the human genome has been completed, the field of proteomics is ramping up to tackle the vast protein networks that both control and are controlled by the information encoded by the genome (68). The study of proteomics should yield an unparalleled understanding of cancer as well as invaluable new targets for therapeutic intervention and markers for early detection. This rapidly expanding field attempts to track the protein interactions responsible for all cellular processes. Through careful analysis of these systems, a detailed understanding of the molecular causes and consequences of cancer could emerge. Soon, cellular protein networks will be understood at a level that will permit a totally new paradigm of diagnosis and will allow therapy tailored to individual patients and situations.
As new protein biomarkers are discovered through proteomic approaches, the need to validate them and ultimately use them in a clinical setting increases. This can be done only as a collaborative effort among the research communities. The National Cancer Institute (US) has taken a lead role in this regard by creating the Early Detection Research Network (EDRN). This network brings together national and international experts from academia and industry to promote biomarker discovery and validation and to help translation into clinical practice. The EDRN thus serves as an integrated platform designed to accelerate translation of discovery into tools for early detection and risk assessment. Five-phase criteria for the development and evaluation of biomarkers have been established by the network (69). The first phase is a preclinical exploratory phase to help identify promising directions. The second is a clinical assay and validation phase necessary to evaluate the ability of the assay to detect established disease. The third is a retrospective/longitudinal phase to determine the putative biomarker's ability to detect preclinical disease and to define a "screen positive" rule. In the fourth phase, prospective screening is used to identify the extent and characteristics of disease detected by the test and the false-positive rate. In the last phase, a definitive prospective randomized trial is designed to determine the impact of screening on reducing the burden of disease in the general population.
Many of the above-mentioned technologies are currently in use by EDRN investigators for detection of novel cancer biomarkers. Two of these are depicted in Figs. 1 and 2. Fig. 1 is a comparison of two-dimensional gel electrophoresis protein profiles from healthy and cancer cells from the same individual with an esophageal tumor. Cell populations were obtained through LCM (70) and subjected to two-dimensional differential in-gel electrophoresis (71). Fig. 2A represents the MS output for a protein spot excised from a two-dimensional gel that was subjected to in-gel digestion. Fig. 2B represents the MS/MS output for the peptide ion fragments. A search of the National Center for Biotechnology Information nonredundant protein sequence database, using the obtained masses of the peptide fragments, identified a peptide sequence, GALQNIIPASTGAAK, which is unique to glyceraldehyde 3-phosphate dehydrogenase.

Figure 1. Two-dimensional gel images of healthy and cancer cells.
A total of 250 000 matched healthy (Normal) and cancer cells were obtained from the same individual with an esophageal tumor. Cells were lysed, labeled, and separated by differential in-gel electrophoresis. The left panel represents the Cy3 image of proteins from healthy cells, and the right panel represents the Cy5 image of proteins from the cancer cells. M, molecular markers.

Figure 2. Example of protein identification by MS.
In panel A, a protein spot from a two-dimensional gel is excised, in-gel digested by trypsin, and analyzed by HPLC-MS/MS in an LCQ mass spectrometer. The x axis represents the retention time of the analysis, and the y axis represents the ion current detected. The instrument is tuned such that the masses of both the parent peptide and the fragments of the strongest tryptic peptide are determined. Panel B represents the MS/MS output of the peptide ions with m/z 707 at the retention time of 33.5 min. The peptide was isolated and fragmented in the LCQ, yielding mass information for the fragments. A database search revealed sequence identity to glyceraldehyde 3-phosphate dehydrogenase. b and y indicate the N- and C-terminal fragments, respectively, of the peptide produced by breakage at the peptide bond in the LCQ. The subscripted numbers represent the residue numbers from either the NH2 or COOH terminus.
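The relationship between the identified sequence and the masses described in the caption can be illustrated with a short calculation. The Python sketch below is not the search software used for Fig. 2; it simply sums standard monoisotopic residue masses for GALQNIIPASTGAAK to obtain the precursor m/z and the singly charged b- and y-ion series. The charge state of 2 is an assumption (the caption does not state it), and the function names are illustrative. Under that assumption the monoisotopic precursor comes out near m/z 706.4, in line with the nominal m/z 707 reported for the ion trap, since the average-mass value is slightly higher than the monoisotopic one.

```python
# Standard monoisotopic residue masses (Da) for the amino acids in this peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "Q": 128.05858, "K": 128.09496,
}
WATER = 18.01056   # H2O added to the residue sum for an intact peptide
PROTON = 1.00728   # mass of one charge-carrying proton


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the intact peptide carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def b_ions(peptide: str) -> list[float]:
    """Singly charged N-terminal fragments b1 .. b(n-1)."""
    total, series = 0.0, []
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        series.append(total + PROTON)
    return series


def y_ions(peptide: str) -> list[float]:
    """Singly charged C-terminal fragments y1 .. y(n-1)."""
    total, series = WATER, []
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        series.append(total + PROTON)
    return series


PEPTIDE = "GALQNIIPASTGAAK"
mz = precursor_mz(PEPTIDE, charge=2)  # charge of 2 is an assumption
print(f"[M+2H]2+ monoisotopic m/z: {mz:.2f}")
for i, (b, y) in enumerate(zip(b_ions(PEPTIDE), y_ions(PEPTIDE)), start=1):
    print(f"b{i:<2} {b:9.3f}   y{i:<2} {y:9.3f}")
```

A database search engine performs the reverse operation: it compares the observed fragment masses against series like these computed for every candidate peptide in the sequence database.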
Diverse methodologies and approaches are in use by EDRN investigators in their pursuit of novel biomarkers. Investigations are in progress on the epigenetic mechanisms of hypermethylation, using a panel of genes as a marker for early disease in lung and other cancers. Efforts are underway to detect membrane and secreted proteins in breast cancer through novel signal sequence trapping approaches, and to determine the utility of mitochondrial DNA mutations as markers for early detection. A two-dimensional gel database of lung-specific proteins from human lung samples is being built to help in the detection of lung cancer. EDRN investigators are also searching for prostate-specific markers, through the use of the SELDI platform, to help in the early detection of prostate cancer. Through the use of microarray analysis, investigators within the network have identified osteopontin as a potential diagnostic biomarker for ovarian cancer (72). Exciting information is expected to emerge from the EDRN's collaborative effort that can ultimately be applied to population screening for risk assessment, early detection, and diagnosis of cancer.
 |
Footnotes
|
|---|
1 Nonstandard abbreviations: MALDI-TOF MS, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry; LC-MS/MS, liquid chromatography-tandem MS; SELDI, surface-enhanced laser desorption/ionization; PSMA, prostate-specific membrane antigen; BPH, benign prostatic hyperplasia; LCM, laser capture microdissection; and EDRN, Early Detection Research Network.
 |
References
|
|---|
- Chaurand P, DaGue BB, Pearsall RS, Threadgill DW, Caprioli RM. Profiling proteins from azoxymethane-induced colon tumors at the molecular level by matrix-assisted laser desorption/ionization mass spectrometry. Proteomics 2001;:1320-1326.
- American Cancer Society. Cancer facts and figures 1996. Atlanta: American Cancer Society.
- Srinivas PR, Kramer BS, Srivastava S. Trends in biomarker research for cancer detection. Lancet Oncol 2001;2:698-704.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860-921.
- Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The sequence of the human genome. Science 2001;291:1304-1351.
- Van Eyk JE. Proteomics: unraveling the complexity of heart disease and striving to change cardiology. Curr Opin Mol Ther 2001;3:546-553.
- Liang P, Pardee AB. Recent advances in differential display. Curr Opin Immunol 1995;7:274-280.
- Zhang H, Zhang R, Liang P. Differential screening of differential display cDNA products by reverse Northern. Methods Mol Biol 1997;85:87-93.
- Datson NA, van der Perk-de Jong J, van den Berg M, de Kloet ER, Vreugdenhil E. MicroSAGE: a modified procedure for serial analysis of gene expression in limited amounts of tissue. Nucleic Acids Res 1999;27:1300-1307.
- Velculescu VE, Zhang L, Vogelstein B, Kinzler KW. Serial analysis of gene expression. Science 1995;270:484-487.
- Gygi S, Rochon Y, Franza BR, Aebersold R. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 1999;19:1720-1730.
- Anderson L, Seilhamer J. A comparison of selected mRNA and protein abundances in human liver. Electrophoresis 1997;18:533-537.
- O'Farrell PH. High resolution two-dimensional electrophoresis of proteins. J Biol Chem 1975;250:4007-4021.
- Chait BT, Kent SB. Weighing naked proteins: practical, high-accuracy mass measurement of peptides and proteins. Science 1992;257:1885-1894.
- Dongre AR, Eng JK, Yates JR. Emerging tandem-mass-spectrometry techniques for the rapid identification of proteins. Trends Biotechnol 1997;15:418-425.
- Gygi S, Han DK, Gingras AC, Sonenberg N, Aebersold R. Protein analysis by mass spectrometry and sequence database searching: tools for cancer research in the post-genomic era. Electrophoresis 1999;20:310-319.
- Neubauer G, King A, Rappsilber J, Calvio C, Watson M, Ajuh P, et al. Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex. Nat Genet 1998;20:46-50.
- Yates J, 3rd. Mass spectrometry and the age of the proteome. J Mass Spectrom 1998;33:1-19.
- Gygi SP, Rist B, Gerber SA, Turecek F, Gelb MH, Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat Biotechnol 1999;17:994-999.
- Edman P, Begg G. A protein sequenator. Eur J Biochem 1967;1:80-91.
- Totty NF, Waterfield MD, Hsuan JJ. Accelerated high-sensitivity microsequencing of proteins and peptides using a miniature reaction cartridge. Protein Sci 1992;1:1215-1224.
- Karas M, Hillenkamp F. Laser desorption ionization of proteins with molecular masses exceeding 10 000 daltons. Anal Chem 1988;60:2299-2301.
- Mann M, Hendrickson RC, Pandey A. Analysis of proteins and proteomes by mass spectrometry. Annu Rev Biochem 2001;70:437-473.
- Henzel WJ, Billeci TM, Stults JT, Wong SC, Grimley C, Watanabe C. Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proc Natl Acad Sci U S A 1993;90:5011-5015.
- Mann M, Wilm M. Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal Chem 1994;66:4390-4399.
- Yates JR, Eng JK, McCormack AL. Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal Chem 1995;67:3202-3210.
- Hancock WS, Wu S-L, Sheih P. The challenge of developing a sound proteomics strategy. Proteomics 2002;2:352-359.
- Peng J, Gygi SP. Proteomics: the move to mixtures. J Mass Spectrom 2001;36:1083-1091.
- Liebler DC. Proteomic approaches to characterize protein modifications: new tools to study the effects of environmental exposures. Environ Health Perspect 2002;110:3-9.
- Lai CC, Tsai CH, Tsai FJ, Lee CC, Lin WD. Rapid monitoring of congenital adrenal hyperplasia with microbore high-performance liquid chromatography/electrospray ionization tandem mass spectrometry from dried blood spots. Rapid Commun Mass Spectrom 2001;15:2145-2151.
- Caprioli RM, Farmer TB, Gile J. Molecular imaging of biochemical samples: localization of peptides and proteins using MALDI-TOF MS. Anal Chem 1997;69:4751-4760.
- Stoeckli M, Chaurand P, Hallahan DE, Caprioli RM. Imaging mass spectrometry: a new technology for the analysis of protein expression in mammalian tissues. Nat Med 2001;7:493-496.
- Stoeckli M, Farmer TB, Caprioli RM. Automated mass spectrometry imaging with a matrix-assisted laser desorption ionization time-of-flight instrument. J Am Soc Mass Spectrom 1999;10:67-71.
- Hoffmann P, Hong J, Moritz RL, Connoly LM, Frecklington DF, Layton MJ, et al. Continuous free-flow electrophoresis separation of cytosolic proteins from the human colon carcinoma cell line LIM 1215: a non two-dimensional gel electrophoresis-based proteome analysi