Hematology
1 Departamento de Bioquímica y Biología Molecular y Celular, Universidad de Zaragoza, Zaragoza, Spain.
2 Progenika Biopharma S.A., Derio, Spain.
3 Servicio de Hematología, Hospital Universitario Miguel Servet, Zaragoza, Spain.
Address correspondence to this author at: Departamento de Bioquímica y Biología Molecular y Celular, Universidad de Zaragoza, 50009 Zaragoza, Spain. Fax 976761236; e-mail 408861{at}unizar.es.
Abstract
Methods: We developed an optimized procedure for gene expression analysis based on a microarray containing 538 oligonucleotides and used this procedure to analyze neoplastic cell lines and whole-blood samples from healthy individuals and patients with different hematologic neoplasias. Hierarchical clustering and the Welch t-test with adjusted P values were used for data analysis.
Results: This procedure detects 0.2 fmol of mRNA and gives a linear response over 2 orders of magnitude, with CVs of <20% for hybridization and labeling replicates. We found statistically significant differences between the Jurkat and U937 cell lines, between blood samples from 15 healthy donors and 59 chronic lymphocytic leukemia (CLL) patients, and between 6 acute myeloid leukemia (AML) patients and 4 myelodysplastic syndrome (MDS) patients. A classification system constructed from the expression data predicted healthy or CLL status from a whole-blood sample with a 97% success rate.
Conclusion: Transcriptional profiling of whole-blood samples was carried out without any cellular or sample manipulation before RNA extraction. This gene expression analysis procedure uncovered statistically significant differences associated with different hematologic neoplasias and made possible the construction of a classification system that predicts the healthy or CLL status from a whole-blood sample.
Introduction
High-density microarrays that measure the expression of thousands of genes have drawbacks, such as high cost and the long time needed for data analysis and interpretation. The widely used Affymetrix technology has the advantage of standardized probes, hybridization protocols, and data quantification (6). Nevertheless, it has been used mainly in projects involving few samples, because its high cost prohibits the analysis of large numbers of samples. Low-density microarrays, in contrast, offer an inexpensive, fast, and relatively easy way to analyze gene expression (7) that is more suitable for routine applications (8).
Our aim was to analyze the transcriptional profile of whole blood from patients with hematologic neoplasias (HNs) with a low-density microarray to obtain a molecular characterization of each neoplasia.
Materials and Methods
blood samples
Peripheral blood samples from 15 healthy donors, 59 B-cell chronic lymphocytic leukemia (B-CLL) patients, and 13 patients with myeloid neoplasia were collected in PAXgene tubes (PreAnalytiX). All procedures were approved by the Ethics Committee for Clinical Investigation of Aragón in accordance with the Helsinki Declaration of 1975. All patients were diagnosed at the Hospital Universitario Miguel Servet of Zaragoza according to the WHO classification (1). Total RNA was extracted with the PAXgene Blood RNA Kit (Qiagen).
array description and quality controls
The array used in this work [Fundación para el Estudio de la Hematologia y la Hemoterapia en Aragón (FEHHA) Human Hematochip 8K; ArrayExpress accession no. A-MEXP-336] contains 538 probes (35-mer to 50-mer oligonucleotides) that represent 538 genes involved in cell proliferation, cell cycle activation, transcription, apoptosis, hematopoietic cell biology, leukemia, lymphoma, or cancer. The complete list of genes is available (see Table 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol53/issue2). To monitor the efficiency of the complete process, we included 7 probes complementary to 7 external poly(dA) standards (external mRNA controls added to each total RNA sample before labeling) selected from Bacillus subtilis and plum pox virus genes (see Table 2 in the online Data Supplement). To monitor the hybridization and reading processes, we also included 7 probes complementary to 7 biotinylated DNA sequences [positive hybridization controls added to complementary RNA (cRNA) before hybridization] selected from Arabidopsis thaliana and Trypanosoma brucei genes (see Table 3 in the online Data Supplement). In addition, 3 probes complementary to nonhuman genes were included as controls for hybridization specificity. To measure intraarray imprecision, we spotted each probe 12 times at different array locations. The resulting array has 8192 spots distributed in 32 areas of 16 rows × 16 columns each.
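The spot count can be checked from the numbers in this paragraph alone. The short Python sketch below multiplies out the grid and the 12-fold replication of the 555 listed probes (538 human probes plus 17 control probes). The paragraph does not say what occupies the remaining grid positions, so the sketch only reports how many there are and assigns them no role.

```python
# Spot-count arithmetic for the array layout described in the text.
AREAS = 32
ROWS_PER_AREA = 16
COLS_PER_AREA = 16
REPLICATES_PER_PROBE = 12

# Probe classes and counts as listed in the array description.
PROBE_COUNTS = {
    "human genes": 538,
    "external poly(dA) mRNA controls": 7,
    "biotinylated hybridization controls": 7,
    "nonhuman specificity controls": 3,
}

total_positions = AREAS * ROWS_PER_AREA * COLS_PER_AREA  # 8192
total_probes = sum(PROBE_COUNTS.values())                # 555
probe_spots = total_probes * REPLICATES_PER_PROBE        # 6660
other_positions = total_positions - probe_spots          # 1532; their use is not stated

print(total_positions, total_probes, probe_spots, other_positions)
```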
array fabrication
Probes were spotted onto aminosilane-coated glass slides (Corning) with a MicroGrid II 610 robotic spotter (Genomic Solutions) under controlled humidity and temperature conditions. Probes were attached to slides by cross-linking with ultraviolet radiation and baking at 80 °C.
synthesis of biotinylated cRNA and fragmentation
Single-stranded cDNA was generated from 5 µg total RNA containing the external controls, primed with a poly(dT) oligonucleotide that contains a T7 RNA polymerase initiation site. Double-stranded cDNA was used as the template to generate biotinylated cRNA by in vitro transcription with the MEGAscript T7 High Yield Transcription Kit (Ambion) in the presence of biotin-11-UTP and biotin-11-CTP (PerkinElmer). The reaction mixture was incubated for 5 h at 37 °C. cRNA was purified with the RNeasy Mini Kit and fragmented in 40 mmol/L Tris-acetate, pH 8.1, 100 mmol/L potassium acetate, and 30 mmol/L magnesium acetate at 94 °C for 35 min.
RNA yield and quality
We quantified total RNA and biotinylated cRNA by ultraviolet absorbance at 260 nm, assessed purity by the A260/A280 ratio, and assessed integrity by electrophoresis on a 1% agarose gel containing ethidium bromide.
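The A260-based quantification relies on a standard conversion that the text does not spell out. The sketch below assumes the commonly used factor of 40 µg/mL of RNA per A260 unit (1 cm path length) and a hypothetical 1:20 dilution, and converts the reading into the volume that holds the 5 µg of total RNA used for labeling. The text states no purity cutoff, so the sketch only reports the A260/A280 ratio and does not enforce a threshold.

```python
# Assumption: an A260 of 1.0 (1 cm path) corresponds to about 40 µg/mL single-stranded RNA.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """RNA concentration of the undiluted sample from a (possibly diluted) A260 reading."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio, used as a check for protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def volume_for_mass_ul(target_ug: float, conc_ug_per_ml: float) -> float:
    """Volume in µL that contains target_ug of RNA (e.g., the 5 µg used for labeling)."""
    return target_ug / conc_ug_per_ml * 1000.0


# Hypothetical reading: A260 = 0.25 and A280 = 0.125 on a 1:20 dilution.
conc = rna_concentration_ug_per_ml(0.25, dilution_factor=20)  # 200 µg/mL
print(conc, purity_ratio(0.25, 0.125), volume_for_mass_ul(5, conc))
```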
array hybridization, scanning, and quantification
Ten micrograms of fragmented cRNA was denatured at 95 °C for 5 min and immediately placed on ice until hybridization. Automatic hybridization was carried out at 42 °C for 6 h in a Ventana Discovery station (Ventana Medical Systems) with ChipMap Kit hybridization buffers and the protocol for the Microarray 9.0 Europe station (Ventana Medical Systems). Arrays were stained with Cy3-conjugated streptavidin (Amersham Biosciences) and scanned with a ScanArray 4000 scanner (PerkinElmer). An image of the hybridized array is shown in Fig. 1 in the online Data Supplement. Fluorescence intensities were quantified from the scanned image with QuantArray 3.0 software (PerkinElmer). The hybridization signal of each spot was calculated as the median pixel intensity with local background correction, and the signal of each probe was calculated as the trimmed mean of its replicate spots.
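The two-stage summarization described above (spot level, then probe level) can be sketched as follows. The text does not state how the local background was estimated or how much of the replicate distribution was trimmed. This sketch assumes the median of the surrounding background pixels and drops the single lowest and single highest of the 12 replicate spots. Both choices are assumptions, and QuantArray's own computation may differ.

```python
import statistics


def spot_signal(spot_pixels, background_pixels):
    """Median spot pixel minus median local background.

    Assumption: the background is the median of the surrounding background pixels;
    the text says only that local background correction was applied.
    """
    return statistics.median(spot_pixels) - statistics.median(background_pixels)


def trimmed_mean(values, trim_each_side=1):
    """Mean of the replicate signals after dropping trim_each_side lowest and highest values.

    Assumption: trimming 1 value per side; the trimming fraction is not stated in the text.
    """
    if len(values) <= 2 * trim_each_side:
        raise ValueError("not enough replicates to trim")
    kept = sorted(values)[trim_each_side:len(values) - trim_each_side]
    return sum(kept) / len(kept)


# Hypothetical pixels for one spot and its local background.
print(spot_signal([520, 510, 530, 515, 525], [20, 22, 18, 25, 19]))  # 500

# Hypothetical 12 replicate spots for one probe, one of them an outlier (500).
replicates = [100, 102, 98, 101, 99, 500, 100, 103, 97, 100, 101, 99]
print(trimmed_mean(replicates))  # 100.3
```

Trimming protects the probe summary from a single aberrant spot, such as the outlier at 500 above, which would otherwise raise the plain mean to about 133.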
data pretreatment
The mean whole-array hybridization signal, the mean background signal, the CV of each replicated probe, and the signals of the positive and negative controls were determined as quality-control measurements. To ensure reliable results, we required each of these values to fall within a range established experimentally.
data normalization and filtering
The variance stabilization normalization (VSN) method, provided as the vsn package for R software (GNU Project), was used to normalize the data (9). In addition, when the aim was to find significant differences between 2 groups of samples, we also applied a quantile-robust normalization method, provided in the affy package for R software, and selected the statistically significant probes obtained with both normalization methods. Data were submitted to the ArrayExpress database (accession no. E-TABM-87). A first filter selected probes with a CV >0.3 across the analyzed samples, and a second filter selected probes with a hybridization signal greater than the arbitrary value of 550 (approximately 2.5 times the hybridization intensity of the spotting buffer) in >25% of the analyzed samples. These filtering functions are available in the genefilter package for R software.
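The sketch below illustrates plain quantile normalization and the 2 filters with the thresholds stated above (CV >0.3; signal >550 in >25% of samples). It is not the vsn or affy code. The quantile-robust variant in affy differs from this plain version, and ties here are simply broken by probe position. The text does not say whether the CV was computed on raw or normalized values, so the sketch assumes linear-scale intensities. The filter follows the idea of genefilter's cv and kOverA functions but is written from scratch.

```python
import statistics


def quantile_normalize(samples):
    """Plain quantile normalization.

    samples: list of arrays, each a list of probe intensities in the same probe order.
    Every array receives the same distribution: the mean of the sorted values at each rank.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n_probes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_probes), key=s.__getitem__)  # ties broken by probe position
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)


def passes_filters(values, cv_min=0.3, signal_min=550.0, min_fraction=0.25):
    """True if the probe varies across samples (CV > cv_min) and its signal
    exceeds signal_min in more than min_fraction of the samples."""
    n_above = sum(1 for v in values if v > signal_min)
    return coefficient_of_variation(values) > cv_min and n_above > min_fraction * len(values)


def filter_probes(probe_values):
    """probe_values: dict probe -> intensities across samples. Returns the kept probes."""
    return [p for p, vals in probe_values.items() if passes_filters(vals)]


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]

# Hypothetical probes measured in 4 samples.
probes = {
    "varies_and_bright": [100, 900, 1200, 150],
    "bright_but_flat": [600, 610, 590, 605],
    "varies_but_dim": [10, 100, 300, 50],
}
print(filter_probes(probes))  # ['varies_and_bright']
```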
statistical analysis
Statistically significant probes were identified with 2-sample Welch t statistics. P values were adjusted for multiple testing with the step-down maxT procedure (10) or with the Bonferroni correction, both available in the multtest package for R software.
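The Welch t statistic and the step-down maxT adjustment of Westfall and Young, which is the procedure behind multtest's maxT option, can be sketched in pure Python as follows. The sketch draws random label permutations (1000 by default, an assumption) rather than enumerating all of them. It applies the Bonferroni correction to the raw permutation P values, whereas in the study the raw P values may have come from the t distribution. The data are hypothetical.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch two-sample t statistic (unequal variances)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def abs_t_per_probe(data, labels):
    """|t| for each probe; data is a list of probes, each a list of values per sample."""
    stats = []
    for row in data:
        g0 = [v for v, lab in zip(row, labels) if lab == 0]
        g1 = [v for v, lab in zip(row, labels) if lab == 1]
        stats.append(abs(welch_t(g0, g1)))
    return stats


def permutation_pvalues(data, labels, n_perm=1000, seed=0):
    """Raw permutation P values and step-down maxT adjusted P values."""
    rng = random.Random(seed)
    observed = abs_t_per_probe(data, labels)
    m = len(observed)
    order = sorted(range(m), key=lambda j: observed[j], reverse=True)  # largest |t| first
    raw_counts = [0] * m
    adj_counts = [0] * m  # indexed by rank in `order`
    perm_labels = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm_labels)
        perm = abs_t_per_probe(data, perm_labels)
        running_max = 0.0
        # successive maxima, from the least significant probe up to the most significant
        for rank in range(m - 1, -1, -1):
            j = order[rank]
            running_max = max(running_max, perm[j])
            if running_max >= observed[j]:
                adj_counts[rank] += 1
            if perm[j] >= observed[j]:
                raw_counts[j] += 1
    raw = [c / n_perm for c in raw_counts]
    adj_by_rank = [c / n_perm for c in adj_counts]
    for rank in range(1, m):  # enforce monotonicity of the adjusted P values
        adj_by_rank[rank] = max(adj_by_rank[rank], adj_by_rank[rank - 1])
    adjusted = [0.0] * m
    for rank, j in enumerate(order):
        adjusted[j] = adj_by_rank[rank]
    return raw, adjusted


def bonferroni(pvalues):
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical data: 3 probes x 12 samples (6 per group); only probe 0 differs between groups.
data = [
    [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 5.0, 5.1, 4.9, 5.05, 4.95, 5.02],
    [1.0, 2.0, 1.5, 1.2, 1.8, 1.1, 1.3, 1.9, 1.4, 1.6, 1.7, 1.05],
    [2.0, 2.5, 1.8, 2.2, 2.9, 2.1, 2.4, 1.9, 2.6, 2.3, 2.0, 2.7],
]
labels = [0] * 6 + [1] * 6
raw, adjusted = permutation_pvalues(data, labels)
print(raw, adjusted, bonferroni(raw))
```

The maxT adjustment compares each probe with the maximum statistic over itself and all less significant probes in every permutation. Unlike Bonferroni, this accounts for correlation between probes.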
dendrogram and heatmap
The hierarchical clustering algorithm used in this study was based on the average-linkage method (11). Correlation-based distances (the correlation or Pearson methods) were used to calculate the distance between 2 samples. Heatmaps were drawn with the heatmap function of the stats package for R software.
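Average-linkage clustering with a correlation-based distance can be sketched as follows. The sketch uses 1 minus the (centered) Pearson correlation as the distance, which is one of the correlation-based options mentioned above; the exact variant the authors used is not stated. The expression profiles are hypothetical. The exhaustive search over cluster pairs is adequate for tens of samples but is not intended for large data sets.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage and distance 1 - Pearson r.

    profiles: dict sample name -> list of probe values.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[a, b] = dist[b, a] = 1.0 - pearson(profiles[a], profiles[b])

    def cluster_distance(c1, c2):
        # average linkage: mean of all pairwise sample distances between the 2 clusters
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        d, i, j = min((cluster_distance(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical profiles over 5 probes for 2 replicates of each cell line.
profiles = {
    "Jurkat_1": [10.0, 1.0, 8.0, 2.0, 9.0],
    "Jurkat_2": [11.0, 1.5, 7.5, 2.2, 9.5],
    "U937_1": [1.0, 10.0, 2.0, 9.0, 1.5],
    "U937_2": [1.2, 9.0, 2.5, 10.0, 1.0],
}
for a, b, d in average_linkage(profiles):
    print(sorted(a), sorted(b), round(d, 3))
```

With these profiles, the replicates of each cell line merge first, and the last merge joins the 2 cell-line clusters.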
predictor
B-CLL or healthy status was predicted by prediction analysis of microarrays (PAM) (12) with the pamr package for R software. Samples were randomly divided into a training set and a test set. This method identifies the subset of probes that best characterizes each class in the training set and evaluates the classification rate obtained with the selected probes.
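PAM is based on nearest shrunken centroids. Class centroids are shrunk toward the overall centroid by soft thresholding, probes whose shrunken class differences reach zero drop out, and a new sample receives class probabilities from its standardized distance to each shrunken centroid. The pure-Python sketch below follows that published scheme. It is not the pamr code, and the factor m_k = sqrt(1/n_k - 1/n) is assumed to match pamr's convention. The threshold delta and the training data are hypothetical. The 0.5 probability cutoff mirrors the assignment rule described in the Results.

```python
import math
import statistics


def fit_shrunken_centroids(X, y, delta):
    """Nearest-shrunken-centroid model (the idea behind PAM).

    X: list of samples, each a list of probe values; y: class label of each sample;
    delta: shrinkage threshold (larger delta -> fewer probes keep a class difference).
    """
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    members = {k: [x for x, lab in zip(X, y) if lab == k] for k in classes}
    overall = [sum(x[i] for x in X) / n for i in range(p)]
    centroids = {k: [sum(x[i] for x in rows) / len(rows) for i in range(p)]
                 for k, rows in members.items()}
    # pooled within-class standard deviation of each probe
    s = [math.sqrt(sum((x[i] - centroids[k][i]) ** 2
                       for k, rows in members.items() for x in rows) / (n - len(classes)))
         for i in range(p)]
    s0 = statistics.median(s)  # offset that keeps low-variance probes from dominating
    shrunk, active = {}, set()
    for k in classes:
        m_k = math.sqrt(1 / len(members[k]) - 1 / n)  # assumption: pamr-style factor
        cent = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (centroids[k][i] - overall[i]) / scale
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)  # soft thresholding
            if d_shrunk != 0.0:
                active.add(i)
            cent.append(overall[i] + scale * d_shrunk)
        shrunk[k] = cent
    priors = {k: len(members[k]) / n for k in classes}
    return {"centroids": shrunk, "s": s, "s0": s0, "priors": priors, "active": sorted(active)}


def class_probabilities(model, x):
    """Class probabilities from the discriminant scores of the shrunken centroids."""
    scores = {}
    for k, cent in model["centroids"].items():
        dist = sum((xi - ci) ** 2 / (si + model["s0"]) ** 2
                   for xi, ci, si in zip(x, cent, model["s"]))
        scores[k] = dist - 2 * math.log(model["priors"][k])
    best = min(scores.values())  # shift scores for numerical stability
    weights = {k: math.exp(-(v - best) / 2) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def classify(model, x, threshold=0.5):
    """Most probable class if its probability exceeds threshold, else None."""
    probs = class_probabilities(model, x)
    label, prob = max(probs.items(), key=lambda kv: kv[1])
    return label if prob > threshold else None


# Hypothetical training data: 3 probes, only probe 0 separates the classes.
X_train = [[1.0, 5.0, 3.0], [1.2, 5.1, 2.9], [0.9, 4.9, 3.1],
           [4.0, 5.0, 3.0], [4.2, 4.8, 3.2], [3.9, 5.2, 2.8]]
y_train = ["healthy"] * 3 + ["cll"] * 3
model = fit_shrunken_centroids(X_train, y_train, delta=1.0)
print(model["active"], classify(model, [1.1, 5.0, 3.0]), classify(model, [4.1, 5.0, 3.0]))
```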
Results
To assess the detection limit and dynamic range of the complete procedure, we added known amounts (0.014 fmol) of 6 external mRNA controls to 5 µg human total RNA before the labeling reaction. The hybridization signal showed a linear response to transcript amount from 0.2 to 2 fmol (Fig. 1B), with an r² value of 0.95. The signal from the external mRNA controls exceeded the threshold of 550 at 0.2 fmol of external mRNA control. Hybridization of cRNA synthesized from the same total RNA without external controls did not produce a detectable signal for any probe designed to detect the external controls (data not shown).
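The detection limit and linearity figures can be computed as shown below. The spike-in values are hypothetical and are not the study's data. Fitting log10(signal) against log10(amount) is an assumption, because the text does not state the scale on which r² was computed. The threshold of 550 is the one used in the text.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x and its r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def detection_limit(amounts_fmol, signals, threshold=550.0):
    """Smallest spiked amount whose signal exceeds the threshold (None if none does)."""
    detected = [a for a, s in zip(amounts_fmol, signals) if s > threshold]
    return min(detected) if detected else None


# Hypothetical spike-in series (not the paper's data).
amounts = [0.02, 0.2, 0.6, 2.0]
signals = [300.0, 700.0, 2100.0, 6600.0]
print(detection_limit(amounts, signals))  # 0.2

# Fit only the points above the threshold, on a log10-log10 scale (assumption).
in_range = [(a, s) for a, s in zip(amounts, signals) if s > 550.0]
slope, intercept, r2 = linear_fit([math.log10(a) for a, _ in in_range],
                                  [math.log10(s) for _, s in in_range])
print(round(slope, 2), round(r2, 3))
```

A slope near 1 on the log-log scale indicates that the signal rises in proportion to the amount of transcript.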
Reproducibility was assessed from background-corrected data without any preprocessing, such as normalization, removal of outliers, or filtering of low-signal probes. To assess intraarray and interarray reproducibility, we hybridized each of 2 cRNA samples, synthesized from total RNA of the U937 and Jurkat cell lines, to 5 identical arrays, for a total of 10 hybridizations. We then calculated the CV of the 12 replicate spots for each of the 538 probes on each array. The mean intraarray CV for these 10 hybridizations was 17.7%. To examine interarray (hybridization) reproducibility, we calculated a CV for each probe across the 5 arrays hybridized with the same cRNA sample. The mean CVs for the arrays hybridized with the U937 and Jurkat cRNA samples were 6% and 9%, respectively. The mean Pearson correlation coefficient over all pairs of arrays hybridized with the same cRNA sample was 0.99 for both the U937 and the Jurkat samples. Fig. 2A presents a scatter plot that compares probe intensities on 2 arrays hybridized with the same cRNA sample.
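The reproducibility measures above can be computed as in this sketch. The text does not say whether the sample or the population SD was used, so the sketch assumes the sample SD. It also takes each probe's per-array signal as already summarized (for example, as the trimmed mean of its replicate spots). All values are hypothetical.

```python
import itertools
import math
import statistics


def cv_percent(values):
    """Coefficient of variation in percent (sample SD / mean * 100; sample SD is an assumption)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def mean_intraarray_cv(replicates_by_probe):
    """Mean over probes of the CV of the replicate spots on one array (dict probe -> values)."""
    return statistics.mean(cv_percent(v) for v in replicates_by_probe.values())


def mean_interarray_cv(arrays):
    """Mean over probes of each probe's CV across arrays (list of dicts probe -> signal)."""
    probes = sorted(arrays[0])
    return statistics.mean(cv_percent([a[p] for a in arrays]) for p in probes)


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_pearson(arrays):
    """Mean Pearson r over all pairs of arrays hybridized with the same cRNA."""
    probes = sorted(arrays[0])
    vectors = [[a[p] for p in probes] for a in arrays]
    return statistics.mean(pearson(u, v) for u, v in itertools.combinations(vectors, 2))


# Hypothetical probe signals on 3 arrays hybridized with the same cRNA.
arrays = [
    {"probe_a": 950.0, "probe_b": 2100.0, "probe_c": 620.0},
    {"probe_a": 1010.0, "probe_b": 1980.0, "probe_c": 650.0},
    {"probe_a": 990.0, "probe_b": 2050.0, "probe_c": 600.0},
]
print(round(mean_interarray_cv(arrays), 1), round(mean_pairwise_pearson(arrays), 3))
```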
To assess the imprecision introduced by the labeling process, we divided total RNA from the U937 cell line into 2 aliquots of 5 µg each, synthesized a biotinylated cRNA from each aliquot, and hybridized each cRNA to 4 arrays, for a total of 8 hybridizations. The mean CV was 17%, and the mean Pearson correlation coefficient was 0.98. Fig. 2B is a scatter plot that compares probe intensities on 2 arrays hybridized with cRNA from different labeling reactions.
We designed 3 negative-control probes and tested whether they remained negative regardless of sample type by hybridizing cRNA samples from different sources. All 3 probes gave signals of <550 (data not shown).
gene expression analysis
Unsupervised clustering of the hybridization replicates from the U937 and Jurkat cell lines with the 79 probes that passed the filters grouped the samples according to their cell line of origin (Fig. 3).
To determine whether the observed differences were statistically significant, we applied a Welch t-test with a maxT-adjusted P value threshold of 0.001 to both the VSN-normalized and the quantile-robust-normalized data. Sixty-seven statistically significant probes were common to both sets of normalized data (see Table 4 in the online Data Supplement).
To determine the extent of the gene expression differences that could be observed, we analyzed hybridization signals from different kinds of samples: U937 and Jurkat cell culture samples, and whole-blood samples from 10 healthy donors, 26 B-CLL patients, and 13 myeloid neoplasia patients. Unsupervised hierarchical clustering with the 241 probes that passed the filters (Fig. 4) grouped the samples according to their cell culture or blood origin. B-CLL samples clustered together, and the remaining samples generally split into 2 main branches according to their myeloid neoplastic [AML, chronic myeloid leukemia, and myelodysplastic syndrome (MDS)] or healthy origin.
We subsequently compared B-CLL and myeloid neoplasia patients to determine whether the differences observed in the unsupervised analysis were statistically significant. A Welch t-test with a maxT-adjusted P value threshold of 0.001 yielded 21 and 28 statistically significant probes from the VSN-normalized and quantile-robust-normalized data, respectively; the 2 lists shared 19 probes (see Table 5 in the online Data Supplement). To search for statistically significant differences within the myeloid group, we compared AML and MDS samples. A Welch t-test with a maxT-adjusted P value threshold of 0.001 yielded no significant probes, whereas a Welch t-test with Bonferroni correction (P <0.001) yielded lists of 8 and 16 probes from the VSN-normalized and quantile-robust-normalized data, respectively; the 2 lists shared 4 probes (see Table 6 in the online Data Supplement). To identify statistically significant differences in whole blood between B-CLL patients and healthy donors, we compared samples from 15 healthy donors and 59 B-CLL patients. A Welch t-test with a maxT-adjusted P value threshold of 0.001 generated 2 lists of 39 and 35 differentially hybridized probes from the VSN-normalized and quantile-robust-normalized data, respectively, of which 30 were shared (see Table 7 in the online Data Supplement).
classifier
We developed a classifier that assigns a whole-blood sample to the healthy or B-CLL class on the basis of hybridization data obtained with this expression analysis procedure. The complete data set was randomly split into a training set (n = 30) and a test set (n = 44). The training set, which consisted of blood samples from 10 healthy donors and 20 B-CLL patients, was used to build the predictor. Tenfold cross-validation was used to choose the number of probes that minimized classification errors. The predictor based on 6 probes (ohle0351, ohle0320, ohle0374, ohle0375, ohle0009, and ohle0129; see Table 1 in the online Data Supplement) misclassified only 1 sample in the 10-fold cross-validation procedure (Fig. 5A). A sample was assigned to a class when the probability for that class was >0.5.
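Choosing the number of probes by 10-fold cross-validation can be sketched as follows. To keep the example self-contained, it substitutes a simpler classifier for PAM: probes are ranked by a t-like score, and each sample is assigned to the nearest class centroid on the top-ranked probes. Probe ranking is repeated inside every training fold so that held-out samples never influence probe selection. Stratifying folds by class and breaking ties toward fewer probes are both assumptions.

```python
import math
import random


def rank_probes(X, y):
    """Probe indices ranked by |difference of class means| / pooled SD (2 classes)."""
    a_label, b_label = sorted(set(y))
    a = [x for x, lab in zip(X, y) if lab == a_label]
    b = [x for x, lab in zip(X, y) if lab == b_label]
    scores = []
    for i in range(len(X[0])):
        ma = sum(x[i] for x in a) / len(a)
        mb = sum(x[i] for x in b) / len(b)
        ss = sum((x[i] - ma) ** 2 for x in a) + sum((x[i] - mb) ** 2 for x in b)
        sd = math.sqrt(ss / (len(a) + len(b) - 2))
        scores.append(abs(ma - mb) / sd if sd > 0 else 0.0)  # zero-SD probes scored 0 here
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def nearest_centroid(train_X, train_y, probes, x):
    """Class whose centroid on the selected probes is closest to x (Euclidean)."""
    best_label, best_dist = None, math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        dist = sum((x[i] - sum(r[i] for r in rows) / len(rows)) ** 2 for i in probes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_errors(X, y, probe_numbers, k_folds=10, seed=0):
    """Misclassifications for each candidate probe number under stratified k-fold CV."""
    rng = random.Random(seed)
    fold_of = [0] * len(y)
    for label in sorted(set(y)):  # spread each class evenly over the folds
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % k_folds
    errors = {n: 0 for n in probe_numbers}
    for f in range(k_folds):
        train = [i for i in range(len(y)) if fold_of[i] != f]
        test = [i for i in range(len(y)) if fold_of[i] == f]
        train_X, train_y = [X[i] for i in train], [y[i] for i in train]
        ranking = rank_probes(train_X, train_y)  # probe selection redone inside each fold
        for n in probe_numbers:
            errors[n] += sum(nearest_centroid(train_X, train_y, ranking[:n], X[i]) != y[i]
                             for i in test)
    return errors


def choose_probe_number(errors):
    """Fewest cross-validation errors; ties go to the smaller probe set."""
    return min(errors, key=lambda n: (errors[n], n))
```

Usage: with a training set of 10 healthy and 20 B-CLL samples, `cv_errors(X, y, [2, 4, 6, 8])` returns the error count for each candidate probe number, and `choose_probe_number` picks the smallest probe set with the fewest errors.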
The predictor derived from the training set was then used to classify an independent set of samples (test set) consisting of 5 healthy donor and 39 B-CLL patient samples. Forty-three (97%) of the 44 test cases were correctly assigned (Fig. 5B).
Discussion
We designed several control probes to monitor the performance of each step of the process and distributed them across the array. In addition, we established quality-control measures for each step (RNA extraction, labeling, hybridization, and reading). The hybridization step was automated to reduce experimental error, and samples were mixed during hybridization to increase fluorescence intensity and sensitivity (13).
Differences between samples in array experiments have been expressed as fold changes (16), but we used a statistical test with adjusted P values to select differentially hybridized probes. We used 2 normalization methods and selected the significant probes common to both normalized data sets to obtain a limited number of statistically significant probes that were not influenced by the normalization process (17)(18).
The detection limit and dynamic range were analyzed with external controls. Because it is not possible to vary the target concentration for each of the 538 human genes included in the array or to know the behavior of each complementary probe, all probes designed to measure human gene expression were assumed to behave like the probes complementary to the external controls. The array detected a transcript present in the hybridization reaction at a concentration of 33 pmol/L, which is in the range of published data for other oligonucleotide arrays (19). The dynamic range was linear over >2 orders of magnitude, in accordance with data obtained with other available arrays (13)(20). When the entire process was taken into account, the linear range was <2 orders of magnitude, and the detection limit was 0.2 fmol of external mRNA control in 5 µg total RNA. The detection limit was less than that reported by others (13)(21). Different settings for the laser scanner's photomultiplier (21), a longer hybridization time, or new methods of mRNA amplification could improve the detection limit and the dynamic range (19). Nevertheless, our data show that increases in RNA and cRNA concentration yielded proportional increases in hybridization intensity.
CVs of <20% and correlation coefficients of >0.9 from unprocessed data indicated high intraarray reproducibility and high precision of the hybridization and labeling replicates. These results are in agreement with data obtained with other arrays (22)(23). We attribute the low imprecision to homogeneous spot morphology, low background, and protocol optimization.
Hybridization replicates from the U937 and Jurkat cell lines were compared to evaluate whether our gene expression analysis procedure, including the array and the protocols for sample treatment and data analysis, could identify differences in an ideal situation. Several statistically significant probes with higher hybridization signals in Jurkat cells, such as those for CD2,² CD3E, LCK, GZMA, and CD28, represent genes related to T lymphocytes (24)(25)(26), in agreement with the leukemic T-cell origin of Jurkat cells. In addition, several probes representing neutrophil-associated genes, such as MNDA and LYN, showed significantly greater hybridization in U937 samples, in agreement with the monocytic origin of U937 cells. U937 cells are also known to express MYC mRNA (27), and the probe representing MYC on this array showed significantly greater hybridization in U937 samples.
Because the composition of whole blood reflects many processes, we also performed a more complex analysis that included different types of whole-blood samples. Unsupervised analysis revealed differences between cell culture and blood samples, between whole-blood samples from healthy donors and HN patients, and between patients with HNs of different origins. In the comparison of B-CLL and myeloid neoplasia, most of the statistically significant probes with higher hybridization signals in B-CLL samples represented B-lymphocyte-associated genes, such as BTG1, CD79A, FAIM3, CCR7, CD48, and HLA-DRA (28), in agreement with the accumulation of B lymphocytes in the peripheral blood of B-CLL patients. Several probes that differed between blood samples from healthy donors and B-CLL patients represented genes previously described in B-CLL gene expression studies that used other expression analysis platforms or other starting material, such as peripheral blood mononuclear cells (29) or CD19-selected cells (30)(31). Expression of FCER2 (32)(33), CD52 (34), FAIM3 (33), CCR7 (30), HLA-DRA (34), and BTG1 (29) has been associated with B-CLL, and the probes representing these genes showed significantly greater hybridization in B-CLL samples than in samples from healthy individuals. These differences may reflect differences in the cellular composition of whole blood between the 2 groups, and they indicate that the system we describe can identify relevant genes and is therefore useful for analyzing expression profiles in HN samples. We also found several probes for elongation factors and ribosomal proteins with greater hybridization in B-CLL samples than in healthy samples. These results do not match those of previous studies (34)(35) and may be attributable to differences in the procedures used for gene expression analysis.
Moreover, hybridization depends notably on transcript concentration, as well as on interaction affinity and probe accessibility. We also found statistically significant differences between AML and MDS samples, but the probes obtained had not previously been described and did not correlate with the characteristics of the compared groups. Additional studies are required to confirm these results.
This study also presents a classification system based on hybridization intensity for predicting the healthy or B-CLL status of an unknown whole-blood sample. We validated the system on an independent set of samples, and the 97% success rate suggests that the set of probes extracted from this assay can predict whether a whole-blood sample comes from a healthy individual or a B-CLL patient.
In summary, we have developed a viable procedure, based on a low-density array, for gene expression analysis of HN. We consider our results satisfactory, because the technical characteristics we obtained are similar to those described by other groups and to those of commercial arrays. We analyzed whole blood from HN patients without separating any cellular component or manipulating the samples before RNA extraction. Several results were in accordance with the literature or with the characteristics of the compared samples. Our results support the use of this analytic procedure for studying gene expression in HN.
Footnotes
2 Human genes: CD2, CD2 molecule; CD3E, CD3e molecule, epsilon (CD3-TCR complex); LCK, lymphocyte-specific protein tyrosine kinase; GZMA, granzyme A (granzyme 1, cytotoxic T-lymphocyte-associated serine esterase 3); CD28, CD28 molecule; MNDA, myeloid cell nuclear differentiation antigen; LYN, v-yes-1 Yamaguchi sarcoma viral related oncogene homolog; MYC, v-myc myelocytomatosis viral oncogene homolog (avian); BTG1, B-cell translocation gene 1, anti-proliferative; CD79A, CD79a molecule, immunoglobulin-associated alpha; FAIM3, Fas apoptotic inhibitory molecule 3; CCR7, chemokine (C-C motif) receptor 7; CD48, CD48 molecule; HLA-DRA, major histocompatibility complex, class II, DR alpha; FCER2, Fc fragment of IgE, low affinity II, receptor for (CD23); CD52, CD52 molecule.
References
3' synthesis of complex oligonucleotide microarrays. Nucleic Acids Res 2003;31:e35.