Proteomics and Protein Markers
1 Barnett Institute, Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA.
2 Department of Respiratory Medicine and Allergology, University Hospital of Lund, Lund, Sweden.
3 AstraZeneca, Department of Biological Sciences, Lund, Sweden.
4 Department of Respiratory Medicine and Allergology, Sahlgrenska University Hospital, Gothenburg, Sweden.
a Address correspondence to this author at: AstraZeneca Research and Development, Scheelev. 8, Lund 22 187, Sweden. Fax 46-33-71-44; e-mail gyorgy.marko-varga{at}astrazeneca.com.
| Abstract |
|---|
Methods: Bronchoalveolar lavage (BAL) fluid samples obtained by bronchoscopy from 60-year-old healthy never-smokers (n = 18) and asymptomatic smokers (n = 30) were analyzed in either pooled or individual form. Initial global proteomic analysis used shotgun digestion approaches on unfractionated BAL fluid samples (after minimal sample preparation) and separation of peptides by gradient (90-min) liquid chromatography (LC) coupled with on-line linear ion trap quadrupole mass spectrometry (LTQ MS) for identification and analysis.
Results: LTQ MS identified 481 high- to low-abundance proteins. Relative differences in patterns of BAL fluid proteins in smokers compared with never-smokers were observed in pooled and individual samples as well as by 2-dimensional gel analysis. Gene ontology categorization of all annotated proteins showed a wide spectrum of molecular functions and biological processes.
Conclusions: The described method provides comprehensive qualitative proteomic analysis of BAL fluid protein expression from never-smokers and from smokers at risk of developing chronic obstructive pulmonary disease. Many of the proteins identified had not been detected in previous studies of BAL fluid; thus, the use of LC-tandem MS with LTQ may provide new information regarding potentially important patterns of protein expression associated with lifelong smoking.
| Introduction |
|---|
To explore the context of global protein expression patterns in complex clinical samples, one frequently used strategy involves an approach known as "shotgun sequencing", which allows the rapid identification of hundreds of proteins present in high to medium abundance (5)(6)(7). The relative sensitivity for accurate identification of the thousands of peptides in these separations has recently improved with the development of new paradigms in mass spectrometry (MS) design. Thus, commercial instruments such as the linear ion trap quadrupole (LTQ) mass spectrometer and the LTQ orbitrap hybrid mass spectrometer have contributed to greater sensitivity and accuracy in the detection of peptide masses in separated peaks.
Bronchoalveolar lavage (BAL) is a clinical biofluid sampling of the soluble protein contents of the airway lumen. The lavage procedure is often used to evaluate conditions of upper airway inflammation, allergic airway disease, or respiratory tract malignancies. We and others have reported on the relative abundances of proteins present in BAL fluid, based on 2-dimensional gel separation and on the identification of excised spots by MS, in clinical diseases such as asthma, sarcoidosis, idiopathic fibrosis, and interstitial lung disease (8)(9)(10)(11)(12). In a recent study that compared the BAL protein profiles obtained by 2-dimensional gel analysis in lifelong smokers and never-smokers, we identified common qualitative and quantitative differences among individuals grouped by smoking histories (13)(14). These included proteins with potentially important biological activities related to inflammation, oxidation-reduction, tissue matrix turnover, and immunity.
Although the 2-dimensional gel electrophoresis system described above can consistently identify proteins present in medium to high abundance in BAL samples, we were interested in determining whether the shotgun sequencing technology might achieve additional protein identification sensitivity. In the present study, we used the shotgun approach with 1-dimensional reversed-phase nano-HPLC coupled on-line with LTQ MS (15)(16)(17)(18) to analyze the protein composition in BAL fluid. Using pooled aliquots of BAL samples from lifelong smokers or never-smokers, we investigated whether there are obvious differences in protein profiles between the groups. To validate these findings, we analyzed individual BAL samples from each of the study participants in the same system to identify and quantify relative differences in expression profiles associated with smoking history. Finally, individual differences in protein expression seen by LTQ analysis were confirmed by 2-dimensional gel analysis of similar aliquots from the same participants.
| Material and Methods |
|---|
sampling
BAL sampling was carried out under standard conditions (20) with the informed consent of the individuals, and the studies were approved by the ethics committees of the Sahlgrenska Hospital (Gbg M 117-01) and Lund University Hospital in Sweden (LU 689-01). After sampling, the BAL fluid was immediately transported on ice to the laboratory for processing and storage at −80 °C (20). The protein concentrations in the recovered BAL fluids were determined by the Coomassie Plus protein assay with bovine serum albumin as the calibrator (21). The total protein concentrations in the recovered BAL samples were not significantly different among individuals, but the recovery of BAL fluid was lower in the smoking cohort (Table 1).
sample preparation
Pooled samples were prepared by combining equal amounts of protein from individual samples and were adjusted to a final quantity of 50 µg. The proteins were digested with trypsin, according to a previously described procedure (22).
ion trap tandem ms analysis approach and hplc separations
The peptides were separated on a C18 nano-capillary column [Magic C18; 150 × 0.075 mm (i.d.); packed in house] on an Ettan multidimensional liquid chromatography (LC) system. The flow rate was maintained at 400 nL/min. The gradient was started at 2% acetonitrile containing 1 mL/L formic acid in water, increased to 35% acetonitrile in 60 min, then increased to 60% acetonitrile in 15 min, and finally increased to 90% acetonitrile in 10 min. Of each sample, 2 µg was injected on the column by the autosampler. The resolved peptides were detected on an LTQ mass spectrometer (Thermo Electron) with a nano-electrospray ionization ion source. To provide consistency, as proposed by Washburn et al. (23), each pooled sample was analyzed in triplicate.
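The gradient described above can be sketched as a piecewise-linear program. This is only an illustration: the set points come from the text, but the linear ramps between them and the cumulative times (0, 60, 75, and 85 min) are assumptions, and any hold or re-equilibration step is not described and is therefore omitted. The name `percent_acetonitrile` is hypothetical.

```python
# Set points as (time in min, % acetonitrile), read from the text.
# Linear ramps between set points are an assumption of this sketch.
GRADIENT = [(0.0, 2.0), (60.0, 35.0), (75.0, 60.0), (85.0, 90.0)]


def percent_acetonitrile(t_min):
    """Return the % acetonitrile at time t_min by linear interpolation."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last set point, hold the final composition.
    return GRADIENT[-1][1]


if __name__ == "__main__":
    for t in (0, 30, 60, 75, 85):
        print(t, percent_acetonitrile(t))
```

Under these assumptions, the first ramp is the shallowest (33 percentage points over 60 min), which is where most peptides would be expected to elute.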
protein identification and semiquantification
We applied as closely as possible the proposed publication guidelines for the analysis and documentation of peptide and protein identifications (24)(25). The peptide sequences generated by LTQ MS were identified by correlation with the peptide sequences present in the nonredundant National Center for Biotechnology Information protein database (TaxonomyID=9606, available at www.ncbi.nlm.nih.gov) (26), which contains Swiss-Prot protein entries, using the Sequest algorithm that is incorporated in the BioWorks™ software (Ver. 3.1 SR; Thermo Electron).
To estimate relative protein abundances, we considered the number of peptides leading to identification and the semiquantitative Sequest score parameter in conjunction with peak-area measurements. Peak-area measurements were performed on abundant peptides. We extracted the peak area of the m/z signal of a selected peptide for a given protein within 0.5 min of the retention time at which the peptide was identified in the LC-MS chromatogram. The peak in the extracted ion chromatogram was identified when the signal-to-noise ratio was >3. The cross-correlation (Xcorr) values for each peptide were inspected, and if an individual value showed significant deviation, the spectrum obtained by tandem MS (MS/MS) was inspected manually.
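The peak-area rule described above might be sketched as follows. The 0.5-min window and the signal-to-noise threshold of 3 come from the text; everything else is an assumption made for illustration, including trapezoidal integration, defining signal-to-noise as apex intensity divided by a caller-supplied noise level, and the function name `xic_peak_area`. It is not the vendor software's algorithm.

```python
def xic_peak_area(times, intensities, rt, noise, window=0.5, min_sn=3.0):
    """Integrate an extracted ion chromatogram within rt +/- window (min).

    times, intensities: parallel sequences for one m/z trace, sorted by time.
    noise: baseline noise estimate supplied by the caller (assumption).
    Returns the trapezoidal area, or None if the apex S/N is not > min_sn.
    """
    pts = [(t, i) for t, i in zip(times, intensities) if abs(t - rt) <= window]
    if len(pts) < 2 or noise <= 0:
        return None
    apex = max(i for _, i in pts)
    if apex / noise <= min_sn:
        return None  # peak not accepted: signal-to-noise ratio too low
    area = 0.0
    for (t0, i0), (t1, i1) in zip(pts, pts[1:]):
        area += 0.5 * (i0 + i1) * (t1 - t0)
    return area


if __name__ == "__main__":
    # Hypothetical trace, not data from the study.
    t = [9.5, 9.75, 10.0, 10.25, 10.5]
    y = [0.0, 50.0, 100.0, 50.0, 0.0]
    print(xic_peak_area(t, y, rt=10.0, noise=10.0))
```

Relative abundance between two groups would then be estimated as the ratio of the areas obtained for the same peptide in each sample.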
| Results |
|---|
bal protein profiling in never-smokers and smokers
A direct comparison of the chromatographic profiles of pooled BAL fluids from never-smokers, light smokers, and heavy smokers, as separated by capillary LC, is shown in Fig. 1. The pattern for light smokers more closely resembled the pattern for never-smokers than the pattern for heavy smokers. These differences were particularly apparent in the peak distributions in the 40- to 60-min window (nonpolar separation).
When we used the requirement of ≥2 peptides per identification, the 90-min LC/LTQ platform (0.4 s/scan) identified 268 proteins in the pooled BAL samples from never-smokers, 309 proteins in the pooled samples from light smokers, and 314 proteins in the pooled samples from heavy smokers. Approximately one third (n = 130) of all proteins identified were found in all 3 groups. However, we also observed that a substantial number of proteins were identified only in the samples from smokers (n = 137) or only in those from never-smokers (n = 63). These groups of unique proteins included both high-abundance and medium-abundance proteins. The majority of these proteins have not been reported previously in BAL fluid. The 5 most abundant proteins corresponded to generally recognized major components of BAL fluid: albumin, transferrin, α1-antitrypsin, IgA, and IgG. In the case of albumin, the protein was identified with 83% sequence coverage, whereas for transferrin and α1-antitrypsin, the proteins were identified with 69% and 56% sequence coverage, respectively. These proteins were present in samples from all 3 groups. IgA and IgG were identified by their heavy and light chains, where each individual chain had a sequence coverage between 30% and 90%. Typical RSD values for the high-abundance proteins albumin, transferrin, and IgG, based on triplicate runs, were 6.7%, 15%, and 26% for the never-smokers and 3.1%, 17%, and 22% for the heavy smokers.
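For readers unfamiliar with the statistic, the relative standard deviation (RSD) of a triplicate measurement can be computed as in the sketch below. The text does not say whether the sample (n − 1) or population standard deviation was used; the sample form is assumed here, and the input values are hypothetical rather than data from the study.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent: 100 * SD / mean.

    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption; the source does not state which form was used.
    """
    return 100.0 * stdev(values) / mean(values)


if __name__ == "__main__":
    # Hypothetical triplicate peak areas.
    print(rsd_percent([90.0, 100.0, 110.0]))
```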
differential expressions of bal proteins in study participants
To evaluate relative changes in protein expression corresponding to chronic exposure to cigarette smoke, we compared protein concentrations in never-smokers and in chronic smokers. The comparison included the Sequest score as well as differences in total peptide ion intensity of a selected peptide fragment.
As an example of quantitative regulation, the Sequest score and the number of peptide sequencing events showed an up-regulation of UMP-CMP kinase among the heavy smokers. This up-regulation was confirmed by comparison of peak areas of the peptide KNPDSQYGELIEK (m/z 760.8) for UMP-CMP kinase, as presented in Fig. 2 of the online Data Supplement. It represents a 13-fold up-regulation of UMP-CMP kinase among the heavy smokers. Peak-area measurements from triplicate analyses gave an RSD of 12%. The peak areas of selected peptides used for preliminary quantification of the proteins are shown in Table 2. As an example of regulation in terms of presence/absence, the Sequest score and the number of peptide sequencing events showed up-regulation of cathepsin D among the smokers and of glutathione S-transferase A2 among the heavy smokers. Cathepsin D was below the detection limit in samples from the never-smokers, and glutathione S-transferase A2 was below the detection limit in samples from the light smokers and never-smokers. These cases of presence/absence were confirmed by comparison of peak areas of the peptide LLDIACWIHHK (m/z 703.8) for cathepsin D and the peptide NDGYLMFQQVPMVEIDGMK (m/z 855.3) for glutathione S-transferase A2.
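As a worked check on the reported precursor values, the sketch below computes the theoretical m/z of a peptide from standard monoisotopic residue masses. It assumes unmodified residues and a doubly protonated ion; the charge state of KNPDSQYGELIEK is not stated in the text, so z = 2 is an assumption, and `peptide_mz` is a hypothetical helper. Under those assumptions the result for KNPDSQYGELIEK is close to the reported m/z 760.8.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of one charge carrier


def peptide_mz(sequence, charge=2):
    """Theoretical m/z of an unmodified peptide at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


if __name__ == "__main__":
    print(round(peptide_mz("KNPDSQYGELIEK", charge=2), 2))
```

Peptides that carry modifications introduced during sample preparation would need the corresponding mass shifts added to the residue table before the comparison is meaningful.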
An example of the identification of a doubly charged peptide obtained by MS/MS is shown in Fig. 2. In this example, the peptide sequence KAYINTISSLKDLITK of the precursor ion (m/z 905.3) of A-kinase anchor protein 9 was identified.
validation of the lc-ms/ms platform with individual samples
We were interested in determining how well the consensus protein profiles obtained with pooled samples compared with the possible ranges of expression present in individual samples. To determine the relative abundances and presence rates of the BAL proteins identified in the pooled samples, we ran 2 µg of BAL sample from each of the 48 study individuals separately on the LC-MS/MS platform. A comparison of the number of protein identities found in pooled or individual samples in never-smokers and heavy smokers is shown in Fig. 3. We observed a wide variation in the total numbers of proteins that could be identified in a given sample (range, 48–314), irrespective of smoking history. The pooled samples achieved higher numbers of protein annotations than any of the individual samples. This is likely the result of an additive effect from several of the group members because of the lack of obvious outliers and the tight clustering seen in the distributions of the number of identities detected in both groups.
In general, the patterns of common or unique protein identities observed in each of the group pools were also maintained by the individuals within the group. Analysis of individual samples allowed us to note the relative presence rates of the individual identifications in the different samples. Table 3 shows representative examples of BAL proteins commonly identified in samples from both smokers and never-smokers as well as proteins found only in samples from smokers. We compared the number of peptides identified in the pooled samples with the range of peptides found in individuals and found close agreement. We found no evidence that the pooled protein profiles were skewed by single individuals who contributed a dominant selection of proteins to the pool. Altogether, these data indicate that the profiles for the pooled samples generally mirrored the entire group of individuals.
validation of the lc-ms/ms results by 2-dimensional gel analysis of individual bal samples
In an ongoing activity, performed in parallel to the LC-MS/MS, we analyzed the patterns of protein expression in BAL fluids in individuals, using 2-dimensional gel separation coupled with computer-assisted image analysis for spot localization and annotation and matrix-assisted laser desorption/ionization (MALDI)-MS for isolated spot protein identification (13). Using separate aliquots of the same sample, we were thus able to directly compare the relative presence and/or absence and the pixel intensity values of individual protein spots that corresponded to the protein identities obtained with the LC-MS/MS platform. We selected a set of high- to medium-abundance proteins on the gels and compared the relative spots in terms of their distributions on the individual gels, total pixel density of the spot areas, and the identity scores obtained with the Mascot sequence annotation software. Fig. 4 presents representative examples of the 2-dimensional electrophoresis spot patterns for proteins that had previously been identified in the pooled BAL samples by the LC-MS/MS platform. Shown are representative gel profiles of never-smokers or smokers. The Rho-GDP dissociation inhibitor protein, which was identified only in the smokers and not in the never-smokers by LC, showed a strong spot in the smokers and only a weak spot in the never-smokers (near the limit of detection). We found a similar concordant result for cathepsin D, which showed consistency in the relative detectable concentrations of protein on the 2 platforms. The last example shown, α1-antitrypsin, showed a high relative abundance on the gels of both smokers and never-smokers. Together, the results of the direct comparison of the protein expression profiles of individuals obtained by both LC-MS/MS and 2-dimensional gel electrophoresis showed remarkable similarity to the relative distributions of proteins detected in the pooled samples.
gene ontology categorization of bal proteins
The entire dataset of proteins identified in the pooled BAL samples was annotated with Gene Ontology (27)(28). Because proteins are commonly assigned to more than a single molecular function and biological process, we have taken this into consideration in Table 1 in the online Data Supplement. Accordingly, Fig. 3 in the online Data Supplement shows bar diagrams of annotated proteins identified by at least 2 peptides in the BAL fluid of the pooled samples. For graphical reporting, the proteins were categorized into molecular function and biological process.
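The multiple-assignment issue mentioned above can be illustrated with a small counting sketch. It assumes that each protein is counted once in every category it is annotated with, and that proteins with no annotation are placed in an "unknown" group; the protein names, category labels, and the function `count_go_categories` are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter


def count_go_categories(annotations):
    """Count proteins per GO category.

    annotations: dict mapping a protein name to a list of category names.
    A protein with several categories is counted once in each of them,
    so the category totals can exceed the number of proteins.
    """
    counts = Counter()
    for protein, categories in annotations.items():
        if not categories:
            counts["unknown"] += 1
            continue
        for category in set(categories):  # ignore duplicate labels
            counts[category] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical annotations for three placeholder proteins.
    example = {
        "protein_1": ["binding", "catalytic activity"],
        "protein_2": ["binding"],
        "protein_3": [],
    }
    print(count_go_categories(example))
```

Because of the multiple counting, bar heights in such a chart describe how often a category occurs rather than a partition of the protein list.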
| Discussion |
|---|
BAL fluid contains a wide variety of proteins that are either released locally by epithelial and inflammatory cells or derived from plasma exudation. Because of the diverse origin of BAL proteins, analysis of BAL fluid may reveal important pathologic mediators and may enable more accurate characterization of many lung diseases at the molecular level. As shown by the Gene Ontology chart (Fig. 3 in the online Data Supplement), there is a broad distribution of protein functional classes associated with different biological processes such as cellular physiologic processes, metabolism, response to stimulus, cell communication, localization, and organ development. This distribution is very different from what had been determined previously for plasma (29). One consideration to address from the BAL studies is that the group labeled "unknown" was found to be fairly large for both classifications. This is attributable to the large number of proteins observed in this study that are not annotated in Gene Ontology.
Pooling of the samples facilitated the initial identification of differentially regulated proteins in the never-smokers and smokers. Pooling also reduced the experimental variations in the data and minimized the data files subjected to computer-intensive comparative analysis. To validate the dataset obtained with pooled samples, we subjected samples from all of the study participants to individual separations/analyses. Comparison of the results from the pooled and individual samples within each of the study groups showed remarkable similarities in the overall protein annotation indices of each respective group.
To evaluate relative changes in protein expression in response to chronic exposure to cigarette smoke, we used a semiquantitative approach. This included the Sequest score parameter and the number of peptide sequencing events, along with total peptide ion-intensity measurements of the selected peptide fragments. Quantification of proteins based directly on MS signal intensities without internal standards has historically drawn little attention. However, reports from several groups who used linear ion traps have shown that relative changes in total signal intensities of peptides correlated well with their concentration changes in one sample vs another (15)(16)(17)(18). We found that individual differences in protein expression seen by LTQ analysis could be confirmed by 2-dimensional gel analysis of similar aliquots from the same individuals by protein spot intensity.
Altogether, our results showed that the LTQ platform is rapid (90-min run) and reproducible and that it can identify a high number of proteins and determine relative differences in global protein profiles from minimal starting sample volumes and protein concentrations; only 2 µg of BAL proteins was needed for the analysis. Additionally, the separation did not require prefractionation to reduce the complexity of the BAL sample. However, limitations of the shotgun approach are that it detects only protein fragments and not intact proteins and therefore cannot discern isomeric forms of proteins and posttranslational modifications. When applying shotgun sequencing under semiquantitative conditions, one should use caution when comparing low-abundance proteins and proteins with small relative changes. For example, AKAP9 and RAB26 were observed among the never-smokers with a score of 40.1 and a 4-peptide hit, but they were not observed in the other groups. However, across the 3 groups, relative peak-area measurements for these proteins showed no significant change. These results are consistent with other studies of low-abundance proteins that have shown that such proteins are not easily sequenced by MS/MS because of the time-dependent nature of the measurement (30). Another limitation is that the semiquantification is based on the number of peptides (with only some overlap between samples) and is therefore inclined to misrepresent absolute protein concentrations.
We believe that 2-dimensional gel and liquid-phase separations interfaced with MS are complementary approaches with different applications in various biological settings, as well as being useful in combination with each other.
The reproducibility of the shotgun sequencing platform is governed by the peptide separation conditions (e.g., column length and diameter, packing, flow, and injection volume) and the subsequent MS detection. RSD values between peptide separations were typically 0.2%–2.5% for polar, medium-polar, and nonpolar protein sequences. Criteria used for protein MS identifications were the cross-correlation, which relates to database fit; scoring; and reproducible identification of the protein in 2 or more samples by the presence of 2 or more peptides in triplicate analyses. Biological variation among patients is also inherent in these assays and is a component of the reproducibility of the final results because of the large pulmonary surface area sampled by the BAL procedure and by the sample preparation. We minimized group variation by studying BAL samples from an age- and sex-matched cohort. The utility of studying BAL fluid is clinically important, and greater knowledge concerning its components holds potential value for measuring health and disease.
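One possible reading of the reproducibility criterion above is sketched below: a protein is accepted when at least 2 runs each identify it with at least 2 peptides. The cross-correlation and scoring thresholds are not given in the text and are therefore left out; the function name `accepted_proteins`, the data layout, and the example values are hypothetical.

```python
def accepted_proteins(runs, min_peptides=2, min_runs=2):
    """Return proteins seen with >= min_peptides in >= min_runs runs.

    runs: list of dicts, one per LC-MS/MS run, mapping a protein name to
    the number of distinct peptides identified for it in that run.
    """
    proteins = set()
    for run in runs:
        proteins.update(run)
    accepted = set()
    for protein in proteins:
        hits = sum(1 for run in runs if run.get(protein, 0) >= min_peptides)
        if hits >= min_runs:
            accepted.add(protein)
    return accepted


if __name__ == "__main__":
    # Hypothetical peptide counts from three runs.
    runs = [
        {"A": 3, "B": 1},
        {"A": 2, "B": 2},
        {"B": 1, "C": 5},
    ]
    print(accepted_proteins(runs))
```

In this example only protein A passes, because B and C each meet the 2-peptide requirement in a single run.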
In conclusion, to our knowledge, we report here the most comprehensive database of the proteins present in BAL fluid from lifelong smokers and from never-smokers, using the approach known as shotgun sequencing.