Proteomics and Protein Markers
1 Center for Laboratory Diagnosis, Beijing Tiantan Hospital and Capital University of Medical Sciences, Beijing, China.
2 Ciphergen Biosystems, Inc., Beijing, China.
3 Deyi Diagnosis Institute, Beijing, China.
4 Taizhou Municipal Hospital, Taizhou, Zhejiang Province, China.
5 Institute of Respiratory Medicine and 6 Basic Medical Research Center, Chaoyang Hospital and Capital University of Medical Science, Beijing, China.
7 Department of Cell Biology, National Institute for the Control of Pharmaceutical and Biological Products (NICPBP), Beijing, China.
8 Institute of Virology, Chinese Academy of Preventive Medicine, Beijing, China.
9 Department of Quality Control, Beijing Red Cross Blood Center, Beijing, China.
10 Society of Blood Transfusion, Beijing, China.
11 National Engineering Research Center for Beijing Biochip Technology, Tsinghua University, Beijing, China.
12 Beijing Center for Disease Control and Prevention, Beijing Bureau of Public Health, Beijing, China.
13 Department of Neurosurgery, The Affiliated Hospital of Xuzhou Medical College, Jiangsu Province, China.
14 Center for Molecular Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, China.
15 Consulting Center of Biomedical Statistics, Academy of Military Medical Sciences, Beijing, China.
a Address correspondence to this author at: Center for Molecular Immunology, Chinese Academy of Sciences, 13 Zhongguancun Bei Yi Tiao, PO Box 2714, Beijing, China 100080. Fax 86-10-62638849; e-mail hongtang{at}sun.im.ac.cn.
| Abstract |
|---|
Methods: We developed a mass spectrometric decision tree classification algorithm using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry. Serum samples were grouped into acute SARS (n = 74; <7 days after onset of fever) and non-SARS [n = 1067; fever and influenza A (n = 203), pneumonia (n = 176); lung cancer (n = 29); and healthy controls (n = 659)] cohorts. Diluted samples were applied to WCX-2 ProteinChip arrays (Ciphergen), and the bound proteins were assessed on a ProteinChip Reader (Model PBS II). Bioinformatic calculations were performed with Biomarker Wizard software 3.1.1 (Ciphergen).
Results: The discriminatory classifier with a panel of four biomarkers determined in the training set could precisely detect 36 of 37 (sensitivity, 97.3%) acute SARS and 987 of 993 (specificity, 99.4%) non-SARS samples. More importantly, this classifier accurately distinguished acute SARS from fever and influenza with 100% specificity (187 of 187).
Conclusions: This method is suitable for preliminary assessment of SARS and could potentially serve as a useful tool for early diagnosis.
| Introduction |
|---|
Despite such advances in virologic studies, early diagnosis of SARS has been based primarily on the clinical definitions released by WHO and CDC (14)(15), which can be confusing or contradictory (16). Available serologic tests cannot guarantee an early diagnosis (17), and PCR-based molecular detection of the viral RNA suffers from unsatisfactory sensitivity and specificity (3)(17)(18)(19). In the last year, failure to develop diagnostic tests for SARS, especially in the acute phase, severely impacted specific prevention and treatment measures for SARS. There is a need to establish a reliable diagnostic methodology for SARS-CoV, in particular, to distinguish the similar clinical manifestations of SARS and other respiratory tract infections. This urgency is reinforced by the first SARS case not linked to laboratory contamination, which occurred in Guangdong, China this year (20).
Proteomic analysis has provided a unique tool for the identification of diagnostic biomarkers, evaluation of disease progression, and drug development (21)(22). Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) enables rapid, reproducible protein/peptide profiling of multiple disease-specific biomarkers directly from crude samples (e.g., tissue cell lysates or body fluids) (23)(24). Small amounts of sample can be applied directly to a biochip coated with specific chemical matrices (e.g., hydrophobic, cationic, or anionic) or specific biochemical materials such as DNA fragments or purified proteins. The bound proteins/peptides can then be analyzed by MS to obtain the protein fingerprints, or even amino acid sequence determinants, when interfaced to a mass spectrometric microsequencing device.
Analogous to the proteomic detection of various cancers (25)(26), we used a weakly cationic ProteinChip (WCX2 chip surface) to retrospectively analyze SARS sera to determine whether there are distinct and reproducible protein fingerprints potentially applicable to the diagnosis of SARS. We established a decision tree algorithm consisting of four unique biomarkers for acute SARS in the training set and subsequently validated the accuracy of this classifier by use of a completely blinded test set.
| Materials and Methods |
|---|
The patients and serum samples were then divided into two groups: one for the "training" set and the other for the blinded "test" set (Tables 1 and 2). SARS and non-SARS control sera were all stored at −80 °C in 30-µL aliquots. Before each round of mass spectrometric assays, we routinely performed quality control of serum samples by checking the appearance and intensity of the peak at m/z 6635.09 (Fig. 3A). Because the peak intensity of m/z 6635.09 remained relatively constant among spectra from different assays and different instruments, it was also used for normalization between each round of analyses.
proteomic analysis
Three different chip chemistries (hydrophobic, anionic, and cationic) were first evaluated to determine which affinity chemistry gave the best serum profiles in terms of the number and resolution of proteins. The weak cation-exchange (WCX) chip gave the best results with mass spectra from 0 to 200 kDa. The WCX chips in an 8-well bioprocessor format (Ciphergen) were chosen to allow a larger volume of serum for the chip array. The bioprocessor was pretreated with 150 µL of 100 mmol/L sodium acetate (pH 4) on a platform shaker at 250 rpm for 5 min. The excess sodium acetate was removed by inverting the bioprocessor on a paper towel. This process was repeated twice. The serum samples were thawed on ice in a Biosafety Level II cabinet, and 20 µL of each sample was mixed with 30 µL of U9 buffer (9 mol/L urea, 10 g/L CHAPS in phosphate-buffered saline) in a 1.5-mL Eppendorf tube and vortex-mixed at 4 °C for 20 min. We then added 100 µL of U1 buffer [U9 buffer diluted ninefold with 50 mmol/L Tris-HCl (pH 7); 100 mL of U9 buffer plus 800 mL of Tris-HCl] to the serum/urea mixture, vortex-mixed it for 10 min, and stopped the reaction by addition of 600 µL of sodium acetate on ice. We applied 50 µL of the serum/urea sample to each well, and the bioprocessor was sealed and shaken on a platform shaker at 250 rpm for 30 min. The excess serum/urea solution was discarded, and the bioprocessor was washed three times with 100 mmol/L sodium acetate as described above. The chips were removed from the bioprocessor, washed twice with deionized water, and air-dried. Subsequently, 0.5 µL of the energy-absorbing molecule (EAM) sinapinic acid, saturated in 500 mL/L acetonitrile–5 g/L trifluoroacetic acid, was added to each well. After air-drying, the sinapinic acid application was repeated.
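The dilution steps in this protocol can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the volumes are simply additive and that the 50 µL applied to each well is drawn from the final 750-µL mixture; the function and variable names are ours and do not come from any instrument software.

```python
# Volumes in microliters, taken from the sample preparation described above.
SERUM_UL = 20.0
U9_UL = 30.0
U1_UL = 100.0
SODIUM_ACETATE_UL = 600.0
APPLIED_PER_WELL_UL = 50.0


def dilution_fold(sample_volume, total_volume):
    """Return how many-fold a sample is diluted in the final mixture."""
    if sample_volume <= 0 or total_volume < sample_volume:
        raise ValueError("volumes must be positive and total >= sample")
    return total_volume / sample_volume


# Assumption: volumes are additive, so the final mixture is 750 uL.
total_ul = SERUM_UL + U9_UL + U1_UL + SODIUM_ACETATE_UL
serum_fold = dilution_fold(SERUM_UL, total_ul)

# Assumption: each well receives 50 uL of this final mixture.
serum_per_well_ul = APPLIED_PER_WELL_UL / serum_fold

# U1 buffer: 100 mL of U9 plus 800 mL of Tris-HCl is a ninefold dilution,
# so the 9 mol/L urea of U9 becomes 1 mol/L in U1.
u1_urea_mol_per_l = 9.0 / dilution_fold(100.0, 100.0 + 800.0)

print(f"final serum dilution: {serum_fold:.1f}-fold")
print(f"serum per well: {serum_per_well_ul:.2f} uL")
print(f"urea in U1 buffer: {u1_urea_mol_per_l:.1f} mol/L")
```

Under these assumptions, each well receives the equivalent of slightly more than 1 µL of neat serum.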
Chips were then placed in the Protein Biological System II (PBS II) mass spectrometer reader (Ciphergen), and TOF spectra were generated by an average of 104 laser shots collected in the positive mode. The settings for low-energy readings were set with a high mass of 50 kDa and were optimized from 3 to 15 kDa at a laser intensity of 200, detector sensitivity of 8, and a focus by optimization center. High-energy readings were set with a high mass of 200 kDa and were optimized from 10 to 50 kDa at a laser intensity of 230 and a detector sensitivity of 9. Mass accuracy was calibrated externally by use of the All-in-One peptide molecular mass calibrator (Ciphergen).
Sera from a healthy control were individually applied to seven bait surfaces of eight WCX2 chips and run during 3-day intervals for analysis of within-run reproducibility. In parallel, 40 samples (10 from SARS patients, 10 from patients with fever, 10 from patients with pneumonia, and 10 from healthy controls) were applied in duplicate to a single chip and run on two different instruments (PBS II and PBS IIc; Ciphergen) for between-run analysis of instrument drift. To avoid the possibility that placement or run order of samples would affect assay accuracy, samples were loaded on chips in a rotational fashion. In brief, sample 1 was spotted on the 8-well directional chip (wells A to H) in duplicate in wells A and B and then in wells G and H of the second chip. Samples 2, 3, and 4 were loaded on chips in the same rotation order. We also randomized the order of chip placement in the spectrometer to minimize bias from run order. Spectra were collected for each sample and analyzed independently using the classification algorithm established in the training step.
The peak at m/z 6635.09 in the quality-control serum was adjusted to have an intensity of 4060 for both the PBS II and PBS IIc. The peak intensity of m/z 6635.09 in the quality-control serum was used to normalize instrument resolution between the PBS II and PBS IIc. We normalized spectra using total ion current with an identical normalization coefficient and a low mass cutoff <2000 Da. If the factor was <0.3 or >2.9 after normalization to total ion current for the peak at m/z 3939, repeated runs would be performed. No outlier was rejected in the test. The "root" biomarker, m/z 3939, yielded the lowest and similar P value in both the PBS II and PBS IIc.
bioinformatics and biostatistics
Peak detection was performed with Biomarker Wizard software 3.1.1 (Ciphergen). The m/z ratios between 2000 and 20 000 were selected for analysis because this range contained the majority of the resolved protein and peptides. The m/z range between 0 and 2000 was eliminated from analysis to avoid interference from adducts, artifacts of the energy-absorbing molecules, and other possible chemical contaminants. Peak detection involved baseline subtraction, mass normalization using a common calibrant peak (m/z 6635.09), and normalization to the total ion current intensity with a minimum m/z of 2000, using an external normalization coefficient of 0.2 (normalization factor for individual spectrum = 0.2/average ion current for each spectrum) for spectra obtained at different times or locations. The settings used for autodetect peaks to cluster in the first pass were a signal-to-noise ratio of 5 and a minimum peak threshold of 5% of all spectra. The peak clusters were completed by second-pass peak detection using a signal-to-noise ratio of 2 and 0.3% of mass for the cluster window. An average of 99 peaks was detected in each spectrum. The mass range from 20 to 200 kDa was analyzed in parallel.
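The total ion current normalization described above (normalization factor = 0.2/average ion current, with a minimum m/z of 2000) can be sketched as follows. This is a minimal illustration and not the Biomarker Wizard implementation: the function name `tic_normalize` is hypothetical, and we assume the "average ion current" is the mean intensity of all data points at or above the m/z cutoff.

```python
def tic_normalize(mz, intensity, coefficient=0.2, min_mz=2000.0):
    """Scale one spectrum by coefficient / (average ion current above min_mz).

    Returns the normalization factor and the scaled intensities.
    Assumption: the average ion current is the mean intensity of the
    points with m/z >= min_mz; points below the cutoff are scaled too
    but do not contribute to the average.
    """
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    kept = [i for m, i in zip(mz, intensity) if m >= min_mz]
    if not kept:
        raise ValueError("no data points at or above the m/z cutoff")
    average_current = sum(kept) / len(kept)
    if average_current <= 0:
        raise ValueError("average ion current must be positive")
    factor = coefficient / average_current
    return factor, [i * factor for i in intensity]


# Made-up four-point spectrum, for illustration only.
factor, scaled = tic_normalize(
    [1500.0, 2500.0, 4000.0, 6635.09], [9.0, 1.0, 2.0, 3.0]
)
print(factor, scaled)
```

After scaling, the mean intensity above the cutoff equals the external coefficient in every spectrum, which is what makes spectra collected at different times or locations comparable.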
analytical procedure
Data analysis.
The data analysis process used in this study involved three stages: (a) peak detection and alignment; (b) selection of peaks with the highest discriminatory power; and (c) data analysis using a decision tree algorithm. A random sampling (acute SARS, fever, pneumonia, lung cancer, and healthy) with two strata (acute SARS and non-SARS) was used to separate the entire data set into training and test data sets. The training data set consisted of SELDI spectra from 37 acute SARS and 74 non-SARS serum samples. The validity and accuracy of the classification algorithm were then challenged with a blinded test data set consisting of 37 acute SARS and 993 non-SARS samples.
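The two-stratum random split can be illustrated with a short sketch that reproduces the group sizes given above (37 acute SARS and 74 non-SARS samples for training; the remainder for testing). The function name, the label strings, and the fixed seed are assumptions made for illustration; the sampling procedure actually used is not specified beyond the description in the text.

```python
import random


def stratified_split(labels, n_train_per_stratum, seed=0):
    """Randomly assign a fixed number of samples per stratum to training.

    labels: one stratum label per sample.
    n_train_per_stratum: mapping of stratum label -> training-set size.
    Returns sorted index lists (train, test).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    train, test = [], []
    for stratum, n_train in n_train_per_stratum.items():
        members = [idx for idx, lab in enumerate(labels) if lab == stratum]
        if n_train > len(members):
            raise ValueError(f"not enough samples in stratum {stratum!r}")
        rng.shuffle(members)
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)


# Cohort sizes from the text: 74 acute SARS and 1067 non-SARS samples.
labels = ["SARS"] * 74 + ["non-SARS"] * 1067
train_idx, test_idx = stratified_split(labels, {"SARS": 37, "non-SARS": 74})
print(len(train_idx), len(test_idx))
```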
Decision tree classification.
Construction of the decision tree classification algorithm was performed as described previously (26) with modifications based on the Biomarker Patterns Software (Ciphergen). Classification trees were split into two branches or nodes, using one rule at a time. We set the target variable level at 2 and the minimum value at 0, and the decision was made based on the presence or absence and the intensity of one peak, using the Gini or Twoing method, favoring even splits from 0.00 to 2.00 and varied by 0.2 each time, and with V-fold cross-validation from 6 to 12 changed by 2 for the growth of 88 trees. The lowest cost tree (value = 0.068; Gini = 2.0; V-fold = 10) was selected for the final test.
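To make the splitting rule concrete, the sketch below shows how a single node might choose an intensity threshold for one peak by minimizing the Gini impurity. It is a simplified illustration, not the Biomarker Patterns Software procedure: it handles one peak and one split only, and it omits the Twoing criterion, the even-split penalty, V-fold cross-validation, and cost-based tree selection. The intensities in the usage example are made up.

```python
def gini(labels):
    """Gini impurity of a set of binary labels (1 = SARS, 0 = non-SARS)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(intensities, labels):
    """Find the rule 'intensity <= threshold' with the lowest weighted Gini.

    Candidate thresholds are midpoints between consecutive distinct
    intensities. Returns (threshold, weighted_gini); threshold is None
    if no split improves on the unsplit node.
    """
    pairs = sorted(zip(intensities, labels))
    n = len(pairs)
    best_threshold, best_cost = None, gini(labels)
    for k in range(1, n):
        low, high = pairs[k - 1][0], pairs[k][0]
        if low == high:
            continue  # no threshold can separate identical intensities
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / n
        if cost < best_cost:
            best_threshold, best_cost = (low + high) / 2.0, cost
    return best_threshold, best_cost


# Made-up normalized intensities for one peak in six samples.
threshold, cost = best_split(
    [0.2, 0.4, 0.5, 1.8, 2.1, 2.6], [1, 1, 1, 0, 0, 0]
)
print(threshold, cost)
```

A full tree repeats this search over all candidate peaks at every node and then prunes the result; the single-peak version is shown only because it exposes the rule "one peak, one threshold" that each node applies.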
| Results |
|---|
The above classifier used only those masses in the low-energy readings (m/z <50 000). To exhaust all meaningful serum biomarkers, we expanded the analysis of the same training samples in the high-energy setting (m/z <200 kDa, see Materials and Methods) and pooled both low- and high-energy readings together [161 × (37 + 74) = 17 871 peaks]. The classification algorithm then used five peaks between 4 and 16 kDa (m/z 4824.28, 8136.64, 11 505.30, 14 023.00, and 15 369.20; peaks at m/z 8136.64 and 11 505.30 overlapped with those in Fig. 1) in six terminal nodes and yielded a sensitivity and specificity of 94.6% (35 of 37) and 95.9% (71 of 74), respectively (data not shown). The peaks at m/z 3939.08 and 4137.71 in this new classifier disappeared because their corresponding peak intensities were beyond the limits after normalization with the intensity for the peak at m/z 6635.09 (see the section on patients and samples in the Materials and Methods). However, because most of the SARS cases in this alternative classifier (34 of 37) fell into the terminal node where the proteins/peptides were down-regulated (m/z 14 023.0 ≤ 0.611087, m/z 4824.28 ≤ 0.746989, and m/z 15 369.2 ≤ 3.27656), and because this algorithm had to combine two energy settings for analysis, we reasoned that the decision tree generated with only low-energy readings (Fig. 1) would be more sensitive (100%) and more convenient for a clinical application.
To determine the reproducibility of SELDI spectra, mass location, and intensity from array to array on a single chip (intraassay) and between instruments (interassay), we first spotted the serum from a healthy control on seven baits in a single chip and collected seven independent spectra over a time span of 21 days (Fig. 3A). We then selected seven proteins in the range of 3–10 kDa (m/z 4089.59, 5334.17, 5631.18, 5901.49, 6625.63, 7762.24, and 7966.63; black arrows in Fig. 3A) to calculate the intraassay CV. These peaks were selected because they were in the proximity of the four biomarkers with comparable current intensities. The interassay experiments were similar except that sera from healthy controls and from patients with high fever, pneumonia, and SARS were applied to a single chip, and the independent spectra were collected from two different instruments (PBS II and PBS IIc; Fig. 3, B and C). The mean intra- and interassay CVs for peak location were 0.02% and 0.03%, respectively. We considered masses with accuracies within 0.1% between spectra to be the same. The mean intra- and interassay CVs for the normalized intensity were 15% and 20%, respectively. CV calculations using lower intensity peaks (Fig. 3A, gray arrowheads), on the other hand, yielded results similar to those obtained with the seven high-intensity peaks (peak location, intra- and interassay CVs both 0.03%; peak intensity, intraassay CV = 17% and interassay CV = 18%).
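The two reproducibility measures used here, the CV and the 0.1% mass-matching window, can be written out as a short sketch. The function names are hypothetical, and two details are assumptions because the text does not state them: the CV uses the sample standard deviation, and the 0.1% tolerance is taken relative to the mean of the two masses being compared. The replicate intensities in the usage example are made up.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return 100.0 * statistics.stdev(values) / mean


def same_peak(mz_a, mz_b, tolerance=0.001):
    """Treat two masses as the same peak if they agree within 0.1%.

    Assumption: the tolerance is relative to the mean of the two masses.
    """
    return abs(mz_a - mz_b) / ((mz_a + mz_b) / 2.0) <= tolerance


# Made-up replicate intensities for one peak.
print(coefficient_of_variation([10.0, 12.0, 14.0]))

# The calibrant peak (m/z 6635.09) against a nearby reading and against
# one of the reference peaks listed above (m/z 6625.63).
print(same_peak(6635.09, 6636.50))
print(same_peak(6635.09, 6625.63))
```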
detection of SARS
Analysis of spectra from the completely blinded test set (37 acute SARS and 993 controls; Tables 1 and 2) accurately classified 36 of 37 (97.3%) SARS specimens and accurately classified 987 of 993 (99.4%) of the controls as non-SARS (Table 3). More important was that the classification algorithm successfully distinguished acute SARS from fever and influenza, with a sensitivity and specificity reaching 97.3% (36 of 37) and 100% (187 of 187; 60 of 60 with influenza), respectively. Interestingly, when we tested the classifier using an additional control population of 40 samples from patients in the Beijing area with measles after July 16, 2003, who had no history of close contact with SARS patients and had not visited those hospitals treating SARS patients, the classifier had a specificity of 100% (95% confidence interval, 89–100%; data not shown).
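The sensitivity and specificity figures quoted for the blinded test set follow directly from the counts given above, as the following short sketch shows. The function names are ours; the counts are those reported in the text.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of diseased samples that the classifier calls positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of non-diseased samples that the classifier calls negative."""
    return true_negatives / (true_negatives + false_positives)


# Blinded test set: 36 of 37 acute SARS and 987 of 993 non-SARS samples
# were classified correctly.
test_sensitivity = sensitivity(true_positives=36, false_negatives=1)
test_specificity = specificity(true_negatives=987, false_positives=6)

# Fever and influenza subgroup: 187 of 187 classified as non-SARS.
fever_specificity = specificity(true_negatives=187, false_positives=0)

print(f"sensitivity: {100 * test_sensitivity:.1f}%")
print(f"specificity: {100 * test_specificity:.1f}%")
print(f"specificity (fever and influenza): {100 * fever_specificity:.1f}%")
```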
| Discussion |
|---|
The identification of proteins/peptides of pathophysiologic significance (phenomic fingerprints) in crude biological and clinical samples by SELDI-TOF MS has been demonstrated in various cancer studies (28). Using a similar profiling strategy, we have established a classification algorithm that delineates probable SARS patients as early as day 1 after self-described onset of symptoms from healthy individuals and from patients with respiratory tract infections in the training set (sensitivity = 100%; specificity = 97.3%). When applied to the blinded test set, this discriminatory profiling method precisely classified 97.3% of patients with acute SARS and 99.4% of non-SARS patients. More strikingly, our classifier was able to discriminate SARS-CoV infection from bacterial (mycoplasma, tuberculosis) and other local (influenza) or systemic (measles) viral infections of the respiratory tract with a specificity reaching 100%. This was attributable to the inclusion of corresponding inflammatory control samples in the training set and optimization of the classification algorithm. The biomarkers identified in the acute phase of SARS seemed to remain throughout the convalescent phase of the disease because when we applied the identical tree classification to samples from patients in whom onset of fever had been >2, 3, 4, and >5 weeks previously, we could detect SARS with sensitivities and specificities reaching 89.2% and 91.8%, 86.0% and 91.8%, 93.1% and 91.8%, and 79.5% and 91.8%, respectively (data not shown). One intriguing observation was that SARS patients clustering in terminal node 3 all demonstrated moderate clinical features, whereas those in node 5 were severe cases. We are investigating the correlation between this proteomic pattern and the pathology of SARS.
These results represent, to the best of our knowledge, the most accurate laboratory technique for early detection of SARS: PCR-based assays have a maximum sensitivity of 80% when used to test nasopharyngeal aspirates or plasma specimens (29)(30). The proteomic method described here also has advantages over PCR-based assays in that it does not require BSL-3 containment and it can detect SARS in serum samples. This is a critical alternative to PCR-based tests, which are challenged by low viral loads in nasopharyngeal aspirates and throat swab specimens in the acute phase of SARS.
Instead of traditional chromatographic fractionation of samples, we directly spotted the crude serum on the WCX chips. By doing this we avoided the unnecessarily biased depletion of thousands of proteins and/or peptides associated with human serum albumin before MS analysis. Processing of samples and generation of the diagnostic mass spectra by our method required only a small amount of serum (20 µL vs several milliliters needed for PCR methods) and took <3 h. High-throughput proteomic screening for SARS in a 96-well format is also feasible.
We adhered to the WHO case definition and eligibility criteria for SARS and avoided using samples from non-SARS controls from hospitals where SARS patients had been admitted because these persons might have had close contact with SARS patients or been inside those SARS hospitals. We further emphasized this point by sampling control sera from a nonepidemic region of the country. Although the possibility might exist that the difference in serum fingerprints would reflect differences among SARS and non-SARS hospitals, the fact that all SARS cases from 38 different hospitals fit into the single classification algorithm would likely rule out such a concern. More importantly, severe and mild cases of SARS from different hospitals, which had been completely randomized in the experimental analysis, fell into distinct nodes of the tree classification, strongly indicating that the biomarkers we have identified were specific to SARS and not the sites at which blood samples were collected. We further minimized the potential sampling bias by simultaneously using four biomarkers instead of one (e.g., m/z 3939.08), which nevertheless could sufficiently delineate SARS from non-SARS (sensitivity = 93.7%; specificity = 91.8%; data not shown). All SARS and non-SARS samples were from patients with the same ethnic background. SARS and non-SARS control sera collected at different times were all freshly aliquoted and properly stored at −80 °C.
The differential protein pattern that discriminates between SARS and non-SARS is independent of protein identities. The origins and full identities of the discriminating biomarkers are under investigation. Knowing their identities is not strictly required for differential diagnosis, as demonstrated by numerous studies of cancer diagnosis by SELDI methods. However, characterizing these peaks would certainly help in understanding the biological roles of these peptides/proteins and could potentially lead to the discovery of more direct diagnostic tools and novel therapeutic targets for SARS-CoV.