Reviews |
1 Microarray Centre, Clinical Genomics Center, University Health Network, Toronto, Ontario, M5G 1L7 Canada.
2 Ontario Cancer Institute, Princess Margaret Hospital, University Health Network, Toronto, Ontario, M5G 2L9 Canada.
Departments of 3 Medical Biophysics and 4 Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, M5G 2L9 Canada.
aAuthor for correspondence. Fax 416-946-2065; e-mail jeremy.squire{at}utoronto.ca.
| Abstract |
|---|
| Microarray Technology |
|---|
manufacturing of microarrays
Spotted arrays are manufactured using xyz robots that use hollow pins to deposit cDNA (PCR products) or short oligonucleotides onto specially coated glass microscope slides (4). Spot sizes range between 80 and 150 µm in diameter, and arrays that contain up to 80 000 spots can be obtained. Gene sequences to be arrayed are selected from several public databases, which contain resources to access well-characterized genes and expressed sequence tags (ESTs) representative of genes of unknown function. The clones chosen are amplified from appropriate cDNA libraries by PCR and purified before spotting on the solid support.
In addition to their lower price and flexibility in design, spotted arrays offer the advantage of allowing the simultaneous expression analysis of two biological samples, such as test and control samples. This direct comparison of expression profiles, for example untreated cells compared with treated cells or healthy tissue compared with cancer, is an enormous advantage for any pairwise analysis. Furthermore, because these arrays can be spotted with thousands of sequenced expressed genes and ESTs of unknown function, they offer the potential for discovering new genes and defining their roles in disease. One disadvantage of spotted arrays is that they provide information only on the relative gene expression between specific cells or tissue samples as opposed to direct quantification of RNA expression.
Affymetrix GeneChips™ are produced by synthesizing tens of thousands of short oligonucleotides in situ onto glass wafers, one nucleotide at a time, using a modification of semiconductor photolithography technology (1)(5). Generally, GeneChips are designed with 16-20 oligonucleotides representing each gene on the array. Each oligonucleotide on the chip is matched with an almost identical one, differing only by a central, single base mismatch. This allows determination of the degree of nonspecific binding by comparison of target binding intensity between the two partner oligonucleotides. The main advantage of Affymetrix GeneChips is their ability to measure the absolute expression of genes in cells or tissues. Their disadvantages, in addition to their higher costs, include their current inability to simultaneously compare, on the same array, the degree of expression of two related biological samples. In addition, oligonucleotide-based microarrays require a priori knowledge of the gene sequences and require complex computational manipulation to convert the 40 feature signals into an actual expression value. More recently, oligonucleotide arrays have been developed that combine some of the flexibilities and qualitative advantages associated with the use of synthetic probe arrays with the benefits of simultaneous analysis afforded by spotted glass arrays (6).
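As a rough illustration of how paired match and mismatch features can be reduced to a single value per gene, the sketch below averages the match-minus-mismatch differences across probe pairs. This is a deliberately simplified stand-in and not the vendor's actual algorithm; the function name and the intensities are hypothetical.

```python
from statistics import mean


def average_difference(match, mismatch):
    """Mean of (match - mismatch) over the probe pairs of one gene.

    Simplified illustration only; production software applies more
    elaborate outlier handling and scaling.
    """
    if len(match) != len(mismatch) or not match:
        raise ValueError("need equal-length, non-empty intensity lists")
    return mean([m - mm for m, mm in zip(match, mismatch)])


# Hypothetical intensities for four probe pairs of a single gene.
match = [1200.0, 980.0, 1500.0, 1100.0]
mismatch = [300.0, 250.0, 400.0, 350.0]
signal = average_difference(match, mismatch)
```

Subtracting the mismatch partner removes an estimate of nonspecific binding before the pairs are summarized.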
In our laboratories, we use the cDNA microarrays spotted with 1700 or 19 200 genes and ESTs manufactured at the University Health Network Microarray Centre (http://www.microarrays.ca) to study tumor progression and patient response to treatment in several human solid tumors.
experimental design and choice of reference
Careful design at the outset is crucial to the success of microarray experiments. In cancer research, case-control, blocked, and random profile designs predominate. In a case-control study, two samples from a single individual, e.g., tumor tissue and healthy tissue, are compared directly. Because patient variability and genetic heterogeneity are key issues in microarray data analysis, the case-control design is an excellent solution when feasible.
Blocked designs are typically used to study the effect of a treatment or growth condition on a sample such as a cell line. They have been successfully used to examine cell lines grown under different conditions (e.g., cultured in the presence or absence of an anticancer drug) or different related cell lines (e.g., wild type vs mutant, nontransfected cells vs transfected cells). Random profile designs are widely used in microarray experiments when cell lines or patient samples are selected and profiled. Most of the "profiling papers" have used this design, which offers the ability to use data from many different individuals but offers no intrinsic control for bias in the patient populations or cell populations used.
In both the blocked and randomized profile designs, the sample is typically compared with a common or "universal" reference, which should have adequate representation of the majority of genes on the array being profiled and be easily available. Commercially available reference RNA is often a good choice because of wide gene representation (e.g., Stratagene and Clontech). The use of a common reference also offers the advantage of allowing longitudinal comparative analysis among several microarray projects between different research groups interested in a common aspect of cancer research, such as tumor progression or resistance to anticancer drugs. We have recently used a pool of 9 cell lines to establish the expression profiles of a series of 15 ovarian cancer samples (7).
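The value of a common reference can be seen in a small calculation: when two samples are each hybridized against the same reference, their relative expression follows from the difference of the two log ratios. The sketch below assumes background-corrected intensities; the function names and numbers are hypothetical.

```python
import math


def log2_ratio(sample, reference):
    """log2 of sample intensity over reference intensity for one spot."""
    return math.log2(sample / reference)


def indirect_log2_ratio(a_vs_ref, b_vs_ref):
    """log2(A/B) = log2(A/ref) - log2(B/ref); the reference cancels out."""
    return a_vs_ref - b_vs_ref


# Hypothetical intensities for one gene on two arrays sharing a reference.
a = log2_ratio(4000.0, 1000.0)   # sample A vs reference
b = log2_ratio(500.0, 1000.0)    # sample B vs reference
a_vs_b = indirect_log2_ratio(a, b)
```

The cancellation holds only if the same reference RNA is used on both arrays, which is why availability of the reference over time matters.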
The importance of replicates cannot be overemphasized because variability can be very high in microarray experiments. Many groups, including ours, also choose to carry out so-called "dye reversals", in which one replicate array is hybridized with the experimental sample labeled with one fluorophore and the reference sample with the other dye. The corresponding duplicate array is then hybridized with experimental samples and reference samples labeled with the opposite fluorophores. This strategy generates replicate data while balancing the possible differential efficiency of dye incorporation among RNA samples.
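A minimal sketch of how a dye-reversal pair might be combined, assuming each array yields a per-gene log2(Cy5/Cy3) value and that dye bias is additive on the log scale. Both assumptions are ours, and the function names are hypothetical.

```python
def combine_dye_swap(forward, reverse):
    """Average a dye-swap pair of log2(Cy5/Cy3) vectors.

    forward: sample in Cy5, reference in Cy3.
    reverse: sample in Cy3, reference in Cy5.
    An additive dye bias cancels in (forward - reverse) / 2.
    """
    return [(f - r) / 2 for f, r in zip(forward, reverse)]


def estimate_dye_bias(forward, reverse):
    """The part that does not flip sign with the swap is attributed to dye."""
    return [(f + r) / 2 for f, r in zip(forward, reverse)]


# Hypothetical data: true log ratios 1.0, -2.0, 0.0 plus a dye bias of 0.5.
forward = [1.5, -1.5, 0.5]
reverse = [-0.5, 2.5, 0.5]
combined = combine_dye_swap(forward, reverse)
bias = estimate_dye_bias(forward, reverse)
```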
target preparation and hybridization
Both total RNA and mRNA can be used for microarray experiments and allow the attainment of high-quality data with a high degree of confidence. High-quality RNA is crucial for successful microarray experiments. Different standard RNA extraction methodologies have been used successfully, and the choice of protocol is largely a question of personal experience. Quantitative and qualitative evaluation of the RNA obtained can be carried out by standard techniques, such as agarose gel electrophoresis, but is limited by the relatively large amounts of sample required. More recently, assessment of RNA quality and quantity has been greatly facilitated by the use of microcapillary-based devices such as the Agilent Bioanalyzer (Agilent Technologies), which can be used with as little as 5 ng of total RNA.
One of the current limitations in the routine application of microarray technology to patient samples is sufficient RNA availability. Thus, there has been considerable interest in the development of RNA amplification strategies that facilitate RNA extraction from laser capture microdissected (LCM) samples, such as fine-needle biopsies. For standard microarray experiments, the isolated RNA is reverse-transcribed into target cDNA in the presence of fluorescent (generally Cy3-dNTP or Cy5-dNTP) or radiolabeled deoxynucleotides ([33P]- or [32P]-α-dCTP). After purification and denaturation, the labeled targets are hybridized to the microarrays at a temperature determined by the hybridization buffer used. After hybridization, the arrays are washed under stringent conditions to remove nonspecific target binding and are air-dried.
image acquisition and quantification
Microarray image processing uses differential excitation and emission wavelengths of the two fluors to obtain a scan of the array for each emission wavelength, typically as two 16-bit grayscale TIFF images. These images are then analyzed to identify the spots, calculate their associated signal intensities, and assess local background noise. Most image acquisition software packages also contain basic filtering tools to flag problem spots, such as extremely low-intensity spots, ghost spots (where background is higher than spot intensity), or damaged spots (e.g., dust artifacts). These results allow an initial ratio of the evaluated channel/reference channel intensity to be calculated for every spot on the chip. The products of the image acquisition are the TIFF image pairing and a quantified data file that has not yet been normalized. An excellent assessment of different image analysis methods can be found at http://oz.berkeley.edu/tech-reports/.
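The spot-level calculation described above can be sketched as follows. The record layout, the `min_signal` threshold, and the decision to return `None` for flagged spots are illustrative assumptions, not the behavior of any particular image analysis package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Spot:
    name: str
    fg_test: float  # foreground intensity, evaluated channel
    bg_test: float  # local background, evaluated channel
    fg_ref: float   # foreground intensity, reference channel
    bg_ref: float   # local background, reference channel


def spot_ratio(spot: Spot, min_signal: float = 100.0) -> Optional[float]:
    """Background-corrected evaluated/reference ratio, or None if flagged."""
    test = spot.fg_test - spot.bg_test
    ref = spot.fg_ref - spot.bg_ref
    if test <= 0 or ref <= 0:
        return None  # ghost spot: background at or above spot intensity
    if test < min_signal and ref < min_signal:
        return None  # extremely low intensity in both channels
    return test / ref


good = spot_ratio(Spot("g1", 2100.0, 100.0, 1100.0, 100.0))
ghost = spot_ratio(Spot("g2", 80.0, 100.0, 500.0, 100.0))
faint = spot_ratio(Spot("g3", 150.0, 100.0, 140.0, 100.0))
```

The ratios produced this way are still raw; they correspond to the quantified data file before normalization.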
databases and normalization
The quantity of data generated in a microarray experiment typically requires a dedicated database system to store and organize the microarray data and images. The first role of a local microarray database is the storage and annotation (description of experimental parameters) of microarray experiments by the investigator who designed and carried out the microarray experiments. In addition, there is currently an increasing global interest in making microarray data sets publicly available in a standardized format. This would allow other investigators to reproduce published microarray experiments, to thereby independently verify them, to compare data sets across different microarray platforms, and importantly, to interrogate published microarray data sets by use of various bioinformatics tools to explore different biological problems. To answer this need, the Minimum Information About a Microarray Experiment, or MIAME, standard has been proposed by the MGED organization (http://www.mged.org) as a series of criteria that should be used when defining microarray experiment parameters. In our group, we enter all microarray data into a local microarray database (GeneTraffic; Iobion Informatics), which holds all of the microarray data files and TIFF images, as well as a MIAME-supportive annotation of our experiments.
Once data have been loaded into the database, they are normalized, and aggregate statistics are calculated. Normalization is a process that scales spot intensities such that the normalized ratios provide an approximation of the ratio of gene expression between the two samples. Discussion of the different strategies for normalization of microarray data is beyond the scope of this review article, but the choice of a robust and adequate normalization method is as crucial for the quality of the data obtained as the experimental design of the microarray experiment itself. A discussion of normalization methods is provided in supplementary materials to this article (www.utoronto.ca/cancyto/CLINCHEM).
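As one simple example of what such scaling can look like, the sketch below centers the log2 ratios of an array on their median. Global median centering is our choice for illustration and assumes that most genes are not differentially expressed; it is not necessarily the method used in the supplementary materials.

```python
import math
from statistics import median


def median_normalize(ratios):
    """Return log2 ratios shifted so that their median is zero."""
    logs = [math.log2(r) for r in ratios]
    center = median(logs)
    return [m - center for m in logs]


# Hypothetical raw ratios from one array with a twofold channel imbalance.
normalized = median_normalize([2.0, 4.0, 8.0])
```

More robust methods also correct for intensity-dependent and spatial effects, which a single global shift cannot capture.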
statistical analysis and data mining
Analysis of large gene expression data sets is a new area of data analysis with its own unique challenges. Data mining methods typically fall into one of two classes: supervised and unsupervised. In unsupervised analysis, the data are organized without the benefit of external classification information. Hierarchical clustering (8), K-means clustering (9)(10), or self-organizing maps (11) are examples of unsupervised clustering approaches that have been widely used in microarray analysis (8)(12)(13)(14)(15).
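A minimal sketch of unsupervised hierarchical clustering, assuming average linkage and a distance of one minus the Pearson correlation (common choices, but assumptions here). The gene names and profiles are hypothetical, and the quadratic search is suitable only for small examples.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def distance(x, y):
    return 1.0 - pearson(x, y)


def hierarchical_clusters(profiles, n_clusters):
    """Merge the closest clusters (average linkage) until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        def linkage(pair):
            a, b = pair
            return mean([distance(profiles[i], profiles[j])
                         for i in clusters[a] for j in clusters[b]])
        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


# Hypothetical log ratios for four genes across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.2],
}
groups = hierarchical_clusters(profiles, n_clusters=2)
```

No class labels are supplied; the grouping emerges only from the similarity of the profiles.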
Supervised analysis uses some external information, such as the disease status of the samples studied. Supervised analysis involves choosing from the entire data set a training set and a testing set and also involves construction of classifiers, which assign predefined classes to expression profiles. Once the classifier has been trained on the training set and tested on the testing set, it can then be applied to data with unknown classification. Supervised methods include k-nearest neighbor classification, support vector machines, and neural nets. Golub et al. (16) used a k-nearest neighbor strategy to classify the expression profiles of leukemia samples into two classes: acute myeloid leukemia and acute lymphocytic leukemia. Recently Su et al. (17) used large-scale RNA profiling and supervised machine learning algorithms to construct a molecular classification for 10 carcinomas (prostate, breast, lung, ovary, colorectum, kidney, liver, pancreas, bladder/ureter, and gastroesophagus). Similarly, neural network analysis has been used by Khan et al. (18) to delineate consistent patterns of gene expression in cancer.
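The train, test, and apply workflow can be sketched with a small k-nearest neighbor classifier. The Euclidean distance, the value of k, and the two-gene profiles are illustrative assumptions and do not reproduce the data or procedure of any cited study.

```python
import math
from collections import Counter


def knn_predict(train, profile, k=3):
    """Assign the majority label among the k nearest training profiles."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], profile))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out profiles whose predicted label is correct."""
    hits = sum(knn_predict(train, p, k) == label for p, label in test)
    return hits / len(test)


# Hypothetical two-gene expression profiles with known class labels.
train = [
    ([5.0, 1.0], "AML"), ([4.5, 1.2], "AML"), ([5.2, 0.8], "AML"),
    ([1.0, 5.0], "ALL"), ([1.3, 4.6], "ALL"), ([0.9, 5.3], "ALL"),
]
test = [([4.8, 1.1], "AML"), ([1.1, 4.9], "ALL")]

held_out_accuracy = accuracy(train, test)
new_call = knn_predict(train, [5.0, 0.9])  # profile of unknown class
```

Keeping the testing set separate from the training set is what makes the accuracy estimate meaningful.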
Tusher et al. (19) recently proposed a strategy called SAM (significance analysis of microarrays), which allows the determination of significantly differentially expressed genes between groups of samples analyzed by expression arrays. We have used this approach to narrow down the analysis to a subset of genes that were also shown to be differentially expressed when analyzed by conventional two-dimensional hierarchical clustering. As discussed below, we have recently identified genes that show differential expression between early-stage epithelial ovarian cancer (EOC), late-stage EOC, and healthy ovary (Fig. 2).
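For orientation, the per-gene statistic at the core of SAM is a difference of group means divided by a pooled standard error plus a small constant s0. The sketch below computes only that statistic for one gene; the permutation step that SAM uses to estimate significance is omitted, s0 is passed in by hand rather than estimated from the data, and the values are hypothetical.

```python
from statistics import mean


def sam_d(group1, group2, s0=0.0):
    """Relative difference d = (mean1 - mean2) / (s + s0) for one gene."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    # Pooled sum of squared deviations around each group mean.
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = ((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / (s + s0)


# Hypothetical log expression values for one gene in two sample groups.
tumor = [4.0, 6.0]
healthy = [1.0, 3.0]
d_plain = sam_d(tumor, healthy)
d_damped = sam_d(tumor, healthy, s0=1.0)
```

The constant s0 keeps genes with very small variance from producing inflated scores.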
| Expression Profiling Applied to Cancer Biology |
|---|
molecular tumor classification
Improvements in tumor classification are central to the development of novel and individualized therapeutic approaches. Histologically indistinguishable tumors often show significant differences in clinical behavior, and subclassification of these tumors based on their molecular profiles may help explain why these tumors respond so differently to treatment. In a landmark study, Golub et al. (16) applied microarray technology to develop innovative classifications of leukemias, using microarray analysis based on "neighborhood analysis" and the utilization of tumor class predictors. This strategy was able to distinguish between acute myeloid leukemia and acute lymphocytic leukemia without supervisory analysis. Other groups have also used gene expression pattern analysis to classify, at the molecular level, breast tumors (20)(21), B-cell lymphoma (14), cutaneous melanoma (22), and lung adenocarcinoma (23)(24). Likewise, in a recent study analyzing molecular profiles of 50 nonneoplastic and neoplastic prostate samples, Dhanasekaran et al. (25) established signature expression profiles of healthy prostate, benign prostatic neoplasia, localized prostate cancer, and metastatic prostate cancer. These studies established the feasibility of combining large-scale molecular analysis of expression profiles with classic morphologic and clinical methods of staging and grading cancer for better diagnosis and outcome prediction.
drug sensitivity
Despite considerable advances in cancer treatment, acquired resistance to chemotherapeutic drugs continues to be a major obstacle in patient treatment and overall outcome. Anticancer drug resistance is thought to occur through numerous mechanisms, and microarrays offer a new approach to studying the cellular pathways implicated in these mechanisms and in predicting drug sensitivity and unexpected side effects. Most array studies have been carried out using cancer cell lines that are rendered resistant to commonly used anticancer drugs. For example, Kudoh et al. (26) monitored the expression profiles of doxorubicin-induced and -resistant cancer cells in an attempt to obtain molecular fingerprinting of anticancer drugs in cancer cells. Scherf et al. (27) analyzed a subset of 1400 genes from a study reported by Ross et al. (28) and studied the correlation between expression profiles and drug mechanism of action of a panel of 118 anticancer drugs. Obtaining further insights into the mechanism of action of anticancer drugs and the diverse pathways involved in drug resistance may eventually be invaluable for design of more strategic treatments that are most appropriate for an individual tumor.
identification of tumor-specific molecular markers
Several research groups have focused on identifying subsets of genes that show differential expression between healthy tissues or cell lines and their tumor counterparts to identify biomarkers for several solid tumors, including ovarian carcinomas (7)(29)(30)(31)(32), oral cancer (33), melanoma (34), colorectal cancer (35), and prostate cancer (36). In our recent study (7) carried out on a cohort of 13 patients with EOC, we identified a subset of genes that show differential expression between healthy ovaries and ovarian tumors (Fig. 2). Some of these genes, such as metallothionein 1G, which was found to be up-regulated in tumor samples, are implicated in resistance to the anticancer drug cisplatin and might be an indicator of pretreatment resistance of these tumors to cisplatin. Other genes identified in our study, such as the osteopontin gene, which was strongly up-regulated in some tumor samples and which has been shown (37) to be secreted in the serum of patients with metastatic cancer, might be excellent candidate biomarkers of tumor progression in EOC. One of the most important challenges facing investigators using microarray analysis is determining which of the plethora of new differentially expressed genes is biologically relevant to the tumor system being studied. Even when rigorous efforts are made to minimize the number of variables in a microarray study, there may be an unmanageable number of differentially expressed genes that will contribute excessive background values. Therefore, combining expression microarray analysis with other approaches, particularly cytogenetic techniques, such as spectral karyotyping and chromosome and array comparative genomic hybridization (CGH) (2), offers the possibility to focus on significantly smaller subsets of genes of direct relevance to tumor biology (7). Monni et al. (38) and Barlund et al. (39) recently used a combination of expression arrays and CGH array techniques on breast cancer cell lines and have identified a limited number of genes that are both amplified and overexpressed. [For a review, see Monni et al. (40), as illustrated in Fig. 3].
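The idea of narrowing a gene list by combining expression and copy-number data can be sketched as a simple intersection. The cutoffs, gene names, and values below are hypothetical, and the cited studies may have used different criteria.

```python
def amplified_and_overexpressed(expr_log2, copy_log2,
                                expr_cut=1.0, copy_cut=0.5):
    """Genes measured on both platforms that pass both cutoffs."""
    shared = expr_log2.keys() & copy_log2.keys()
    return sorted(g for g in shared
                  if expr_log2[g] >= expr_cut and copy_log2[g] >= copy_cut)


# Hypothetical tumor-vs-reference log2 ratios from the two array types.
expression = {"geneA": 2.1, "geneB": 0.2, "geneC": 1.5, "geneD": 3.0}
copy_number = {"geneA": 1.2, "geneB": 1.0, "geneC": 0.1, "geneE": 2.0}

candidates = amplified_and_overexpressed(expression, copy_number)
```

Requiring both lines of evidence shrinks the candidate list, which is the practical benefit of pairing expression arrays with CGH.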
Finally, validation of the relative expression obtained from genome-wide microarray analysis is critical. Several approaches can be chosen, from basic Northern analysis or semiquantitative reverse transcription-PCR to in situ hybridization (ISH) using tissue microarrays. Mousses et al. (41) recently analyzed the expression of several candidate genes associated with prostate cancer that they had previously identified by cDNA microarray analysis. Tissue microarrays constructed from 544 histologic biopsies were analyzed by ISH using RNA probes and/or by immunohistochemistry (IHC) using antibodies. There was excellent correlation between the cDNA microarray results and the results obtained with ISH and Northern blot analysis. In addition, protein expression assessed by IHC was also consistent with RNA expression. Similarly, Dhanasekaran et al. (25) used comparable technologies to confirm overexpression of hepsin and PIM-1 in prostate cancer (Fig. 4).
practical and future applications of microarray technology
The number of microarray-based studies identifying new genes or molecular pathways involved in tumor classification, cancer progression, or patient outcome is growing exponentially. We are now approaching what is being referred to as the "postgenomic era", during which the diagnostic, prognostic, and treatment response biomarker genes identified by microarray screening will be interrogated to provide personalized management of patients. Clinicians will be able to use microarrays during early clinical trials to confirm the mechanisms of action of drugs and to assess drug sensitivity and toxicity. Coupled with more conventional biochemical analysis such as IHC and ELISA, microarrays will be used for diagnostic and prognostic purposes. A recent example of such a potential "bench to bedside" translation was published by Kim et al. (42). The osteopontin gene, which encodes a calcium-binding glycophosphoprotein, had been identified by cDNA microarray analysis as being up-regulated in ovarian cancer (43). Kim et al. (42) showed that osteopontin protein concentrations in plasma were significantly higher in a majority of patients with ovarian cancer than in healthy controls. This study demonstrated the potential value of cDNA microarray analysis in identifying biomarker genes in cancer and the feasibility of subsequently testing these genes at the protein level by conventional biochemical assays. Although the major limiting factors for routine use in a clinical setting at present are cost and access to the microarray technology, it is likely that costs will decrease in the near future and that the technology will become increasingly user-friendly and automated.
| Conclusion |
|---|
| Acknowledgments |
|---|
| Footnotes |
|---|
| References |
|---|