Molecular Diagnostics and Genetics
1 Dental Research Institute, 2 Department of Biostatistics, 3 School of Dentistry, 4 Jonsson Comprehensive Cancer Center, 5 Division of Head and Neck Surgery/Otolaryngology, and 6 Henry Samueli School of Engineering, University of California, Los Angeles, CA; 7 AmpTec GmbH, Hamburg, Germany.
a Address correspondence to this author at: Dental Research Institute, 73-017 Center for Health Sciences, 10833 Le Conte Ave., University of California, Los Angeles, CA 90095-1668. e-mail dtww{at}ucla.edu.
| Abstract |
|---|
Methods: We used a universal mRNA–specific linear-amplification strategy in combination with Affymetrix Exon Arrays to amplify salivary RNA from 18 healthy individuals on the nanogram scale. Multiple selected candidates were preamplified in one multiplex reverse transcription PCR reaction, cleaned up enzymatically, and validated by qPCR.
Results: We defined a salivary exon core transcriptome (SECT) containing 851 transcripts of genes that have highly similar expression profiles in healthy individuals. A subset of the SECT transcripts was verified by qPCR analysis. Informatics analysis of the SECT revealed several functional clusters and sequence motifs. Sex-specific salivary exon biomarkers were identified and validated in tests with samples from healthy individuals.
Conclusions: It is feasible to use samples containing fragmented RNAs to conduct high-resolution expression profiling with coverage of the entire transcriptome and to validate multiple targets from limited amounts of sample.
| Introduction |
|---|
To overcome these challenges, we designed an amplification strategy that does not depend solely on the 3' poly(A) tail or on any universal or gene-specific sequences. We also developed a preamplification process that overcomes several constraints of multiplex qPCR (see the workflow in Fig. 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol54/issue5).
| Materials and Methods |
|---|
RNA isolation and amplification
RNA was isolated from 560 µL of saliva supernatant with the RNeasy Mini Kit (Qiagen), according to the manufacturer's instructions. Inclusion of 10 mL/L of the RNase inhibitor NucleoGuard (AmpTec) in the lysis buffer improved RNA yield and recoveries of long transcripts (see Fig. 2 in the online Data Supplement). All samples were treated with TURBO DNA-free (Ambion) to remove trace amounts of genomic DNA. A 2-round amplification was performed with the ExpressArt TRinucleotide mRNA Amplification Kit (AmpTec) according to the manufacturer's instructions.
Microarray and data processing
Single-stranded cDNA was generated from the amplified cRNA with the WT cDNA Synthesis Kit (Affymetrix) and then fragmented and labeled with the WT Terminal Labeling Kit (Affymetrix). Samples were hybridized with GeneChip Human Exon 1.0 ST Arrays (Affymetrix) and scanned at the UCLA Microarray Core Facility. Raw data were processed with the Exon Array Computational Tool (ExACT) (Affymetrix) for background correction and normalization. Raw data files have been deposited in the Gene Expression Omnibus database (GSE7760).2
Statistical analysis
Data analysis and statistical evaluations were performed with custom code written in R (version 2.3.1, http://www.r-project.org/). We defined a probeset as present when it had a P value <0.001 and an intensity value >200. These criteria were suggested by the results of preliminary experiments. The SECT includes all probesets present in at least 16 of the 18 arrays. In addition, we refined the SECT to remove GC-rich (i.e.,
80%) probesets. The concordance between the SECT and the normal salivary core transcriptome (NSCT) (see Results) was evaluated assuming hypergeometric distributions.
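The presence call and the core-set rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' R code: the thresholds come from the text, while the data layout, the function names, and the toy values are assumptions made for the example.

```python
# Thresholds taken from the text; data layout and names are hypothetical.
P_MAX = 0.001          # detection P value must be below this
INTENSITY_MIN = 200    # intensity must be above this
MIN_ARRAYS = 16        # present in at least 16 of the 18 arrays


def is_present(p_value, intensity):
    """Presence call for one probeset on one array."""
    return p_value < P_MAX and intensity > INTENSITY_MIN


def core_probesets(calls, min_arrays=MIN_ARRAYS):
    """Return probeset IDs present on at least `min_arrays` arrays.

    `calls` maps probeset ID -> list of (p_value, intensity), one per array.
    """
    return sorted(
        pid for pid, per_array in calls.items()
        if sum(is_present(p, i) for p, i in per_array) >= min_arrays
    )


# Toy example with 18 arrays per probeset (values invented for illustration).
example = {
    "ps_A": [(0.0001, 850.0)] * 18,                          # present on 18/18
    "ps_B": [(0.0001, 850.0)] * 15 + [(0.2, 90.0)] * 3,      # present on 15/18
    "ps_C": [(0.0001, 850.0)] * 16 + [(0.0001, 150.0)] * 2,  # present on 16/18
}
print(core_probesets(example))
```

In this toy input only `ps_A` and `ps_C` reach the 16-array minimum; `ps_B` is excluded because it is present on 15 arrays.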
For qPCR data, we normalized all transcript abundances to the 3 genes in the saliva internal reference (SIR): ANXA2 (annexin A2),3 RPL37 (ribosomal protein L37), and S100A8 (S100 calcium binding protein A8) (see Supplementary Methods in the online Data Supplement). Specifically, we subtracted the mean of the SIR genes' cycle threshold (CT) values from all of the raw CT values and used the resulting ΔCT values to represent the relative abundance of the transcripts(12).
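The normalization step amounts to one subtraction per transcript. The sketch below assumes each sample is held as a dictionary of raw CT values keyed by gene symbol; that layout, the function name, and the CT values are hypothetical.

```python
from statistics import mean

SIR_GENES = ("ANXA2", "RPL37", "S100A8")  # reference genes named in the text


def delta_ct(raw_ct, reference_genes=SIR_GENES):
    """Subtract the mean CT of the reference genes from every raw CT value."""
    ref_mean = mean(raw_ct[g] for g in reference_genes)
    return {gene: ct - ref_mean for gene, ct in raw_ct.items()}


# Invented CT values for one sample.
sample = {"ANXA2": 20.0, "RPL37": 22.0, "S100A8": 24.0, "IL8": 25.0}
normalized = delta_ct(sample)
print(normalized["IL8"])
```

Because the same reference mean is subtracted from every transcript, a uniform shift in all CT values (for example, from a lower RNA input) leaves the normalized values unchanged.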
Primer design
Nested PCR assays with primer melting temperatures of about 60 °C were designed with Primer Express software (Applied Biosystems; see Table 1 in the online Data Supplement). The amplicons were intron spanning whenever possible. Amplicon lengths were 100–130 bp for the outer primer pairs used in preamplification and 60–80 bp for the inner primer pairs used in qPCR analyses.
RT-PCR preamplification
Multiplex RT-PCR preamplifications were performed in 10-µL reaction volumes with a pool of outer primers at 300 nmol/L each and the SuperScript III Platinum One-Step qRT-PCR System (Invitrogen). Reactions were prepared on ice, loaded into a preheated thermocycler, and performed as follows: 1 min at 60 °C, 15 min at 50 °C, 2 min at 95 °C, and 15 cycles of 15 s at 95 °C, 30 s at 50 °C, 10 s at 60 °C, and 10 s at 72 °C. These steps were followed with a final extension of 5 min at 72 °C and cooling to 4 °C.
Immediately after the RT-PCR, we treated 5 µL of the reaction with 2 µL of ExoSAP-IT (USB Corporation) for 15 min at 37 °C to remove excess primers and deoxynucleoside triphosphates and then heated the mixtures to 80 °C for 15 min to inactivate the enzyme mix. The preamplification products were then diluted 40-fold with water to 200 µL to enable qPCR analysis of all targets.
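As a quick check of the volumes involved, the arithmetic can be written out as follows. The volumes are taken from the Methods text (the 2-µL qPCR input is stated in the quantitative PCR section); the calculation ignores pipetting losses and dead volume, which is an assumption of this sketch.

```python
# Volumes from the Methods text; the accounting below is illustrative only.
aliquot_ul = 5.0          # preamplification reaction taken for cleanup
final_volume_ul = 200.0   # volume after dilution with water
qpcr_input_ul = 2.0       # diluted product used per singleplex qPCR

dilution_factor = final_volume_ul / aliquot_ul
max_qpcr_reactions = int(final_volume_ul // qpcr_input_ul)

print(dilution_factor)      # 40.0
print(max_qpcr_reactions)   # 100
```

Under these assumptions, one cleaned-up aliquot supplies enough material for up to 100 singleplex qPCR reactions.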
Quantitative PCR
Each transcript was quantified from 2 µL aliquots of preamplified samples via a singleplex qPCR in an SDS 7500 Fast instrument (Applied Biosystems) with a 10-µL reaction volume containing 300 nmol/L of each of the inner primers and the SYBR Green Power Master Mix (Applied Biosystems). After 10 min of polymerase activation at 95 °C, we carried out 40 cycles of 15 s at 95 °C and 60 s at 60 °C and then performed a melting curve analysis.
| Results |
|---|
Exon array profiling
We hybridized the amplified RNA samples to the Affymetrix GeneChip Human Exon 1.0 ST platform, which interrogates 1 × 10⁶ exon clusters and provides high resolution and full-length coverage for the detection of all currently annotated and predicted transcripts. Recent studies have demonstrated high concordance between the standard expression HG-U133 platform and the Exon Array platform(13)(14). Raw microarray data were processed for background correction and quantile normalization. With the output P values (measuring the probability of the presence of specific signals) and fluorescence intensities, we defined a probeset as present when it met our quality-control criteria (see Materials and Methods). With these filter criteria and this stringency for probeset selection, all probesets that were present on >85% of the arrays defined the initial SECT, which contains 1534 probesets representing 976 unique genes. As quality control and to evaluate biological variation, we calculated the Pearson correlation coefficient of the intensities from all arrays for each of the 285 pairs of adjacent probesets found in the SECT, which were likely to be from the same transcript fragments. As expected, R² was >0.7 for 90% of these probeset pairs and >0.9 for 60% of the pairs, compared with randomly selected probeset pairs from the SECT (Fig. 2).
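The adjacent-probeset check can be illustrated with a small correlation routine. This is a sketch under stated assumptions: the intensities are invented, only 6 arrays are shown instead of 18, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fraction_above(pairs, threshold):
    """Fraction of probeset pairs whose squared correlation exceeds `threshold`."""
    r_squared = [pearson_r(a, b) ** 2 for a, b in pairs]
    return sum(v > threshold for v in r_squared) / len(r_squared)


# Invented intensities across 6 arrays (the study used 18).
adjacent = ([310, 420, 515, 640, 700, 820], [290, 405, 530, 610, 720, 800])
unrelated = ([310, 420, 515, 640, 700, 820], [600, 250, 710, 300, 650, 280])
print(fraction_above([adjacent, unrelated], 0.7))
```

Applied to every adjacent pair in the core set, `fraction_above` with thresholds of 0.7 and 0.9 would give the kind of summary reported above.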
Preamplification assay
Our aim was to develop a method that permits a one-step RT-PCR preamplification of multiple targets in a single reaction and a subsequent unbiased qPCR analysis. The inclusion of an enzymatic cleanup to eliminate primer carryover ensures quantitative analysis of the preamplification products. We selected 10 candidates from the previously defined NSCT that could serve as SIRs and designed semi-nested primer sets(5). The outer primer pairs for all targets were combined for preamplification into one multiplex reaction.
Specificity.
Simply adding an aliquot of the preamplification reaction (before clean-up) to qPCR reactions failed. An examination of the melting curves revealed the presence of nonspecific products that made accurate quantification impossible (see Fig. 5, A and B, in the online Data Supplement). This problem was due to excessive carryover of primers from the preamplification that were also amplifiable during the qPCR. Digestion of the primers with exonuclease effectively prevented the formation of nonspecific products, and we detected only specific products after this treatment (see Fig. 5, C and D, in the online Data Supplement). Although the semi-nested approach yielded good results, we decided to use fully nested designs for future assays to further increase the specificity, especially in higher multiplexes. The exonuclease digestion was combined with alkaline phosphatase digestion of deoxynucleoside triphosphates to establish equivalent reaction conditions for the downstream qPCR.
Linearity.
mRNA calibrators for 10 SIR candidates were transcribed in vitro and mixed at equal concentrations (see Supplementary Methods in the online Data Supplement). We then used tRNA to establish and preamplify a dilution series of this mixture. Subsequent analysis of the calibration curve indicated equal and optimal RT-PCR efficiencies (Fig. 3A; see Table 2 in the online Data Supplement). We similarly preamplified and analyzed reference RNA dilutions between 100 ng and 6.1 pg. The gene expression pattern was conserved over 4 orders of magnitude of RNA input (Fig. 3B). In combination with the approximately 100-fold range of the target-transcript concentrations, this amounts to a dynamic range of more than 6 orders of magnitude. To demonstrate that the linearity of the preamplification is independent of the number of PCR cycles, we preamplified reference RNA with 5, 14, and 20 iterations. The linear relationship between the logarithm of the template input and the CT is conserved from 5–20 cycles (Fig. 3, C and D; see Table 2 in the online Data Supplement).
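Linearity of this kind is usually judged from the slope of CT against the logarithm of template input, with efficiency estimated by the standard relation E = 10^(-1/slope) - 1. The sketch below is illustrative: the 4-fold dilution step is an assumption (it is consistent with the stated endpoints, since 100 ng / 4^7 is about 6.1 pg), and the CT values are idealized, not measured.

```python
from math import log10


def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(inputs_ng, ct_values):
    """Efficiency from the slope of CT versus log10(input): 10**(-1/slope) - 1."""
    slope, _ = fit_line([log10(v) for v in inputs_ng], ct_values)
    return 10 ** (-1.0 / slope) - 1.0


# ASSUMPTION: eight 4-fold dilutions starting at 100 ng (ends near 6.1 pg).
inputs_ng = [100.0 / 4 ** step for step in range(8)]
# Idealized CTs: a 4-fold dilution costs exactly 2 cycles at perfect doubling.
ideal_ct = [18.0 + 2.0 * step for step in range(8)]

print(round(inputs_ng[-1] * 1000, 1))                          # lowest input in pg
print(round(amplification_efficiency(inputs_ng, ideal_ct), 3))  # 1.0 means 100%
```

With real data, a slope near -3.32 cycles per 10-fold change in input corresponds to an efficiency close to 100%.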
To expand the applicability of the multiplex preamplification approach, we have also used it successfully with fully nested assays in reactions of greater than 60-plex (data not shown).
Internal normalization
The preamplification approach offers the possibility of including internal reference genes, which can normalize the quantity of the target to that of a biologically stable entity in the sample and correct for variations in reverse transcription efficiency between reactions. This possibility was demonstrated in a 26-plex experiment in which we normalized the CT values of individual mRNAs with the arithmetic mean of the 3 selected SIR genes (Fig. 4; see Supplementary Methods in the online Data Supplement)(12). For this method to be valid, the ΔCT value between the target and normalizer genes needs to be stable and independent of sample input(15). We assessed this criterion by analyzing the regression of ΔCT against RNA input, which showed that ΔCTs were indeed stable (see Table 3 in the online Data Supplement). Consequently, by applying this normalization procedure, we obtained identical expression profiles for total-RNA input amounts between 100 ng and 6.1 pg. We noticed that the SDs for the ΔCTs of replicates were markedly reduced compared with the CT values as an effect of performing reverse transcription in the same reaction (see Table 3 in the online Data Supplement). The SDs for measurements of different RNA input concentrations were similar to those of inputs of the same concentration, indicating no concentration dependence.
qPCR validation
To verify the actual presence of the SECT exons, we randomly picked 36 probesets and performed a qPCR analysis with the original unamplified salivary RNA samples. We designed nested primers to target the corresponding probeset sequences, performed multiplex preamplification followed by qPCR quantification, and normalized all CT values to the SIR genes. We detected 32 probesets, representing 28 genes, in essentially all of the samples, suggesting that the SECT represents common salivary transcript fragments. Interestingly, the relative-expression profiles of these probesets as represented by the array signals and the ΔCT values were correlated (mean R² = 0.67), a result that demonstrates the fidelity of the amplification procedure (Fig. 5).
The 4 probesets that failed in the validation testing were found to be highly GC rich, suggesting that the confidence in detecting these high-affinity probesets was overestimated because of inadequate probe-level normalization(16). Therefore, we removed highly GC-rich (
80%) probesets and refined the SECT to a high-confidence set of 1370 probesets (851 genes; see Table 4 in the online Data Supplement). A comparison of these 851 genes with the 185 genes in the NSCT revealed 125 genes in common with a high level of concordance (P < 0.001), and this result allowed us to define an expanded set of 726 genes. An examination of the positions along the transcripts of these SECT-specific probesets derived from the expanded set of 726 genes showed a relatively even distribution with some 5'-end enrichment. This distribution is significantly different from that of the probeset positions for the 125 genes in common with the NSCT, which showed enrichment at the 3' end (P < 0.001; see Fig. 6 and Supplementary Methods in the online Data Supplement).
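The concordance test mentioned in the Methods (an overlap evaluated under a hypergeometric distribution) can be sketched as an upper-tail probability. The gene counts are from the text; the size of the background gene universe is not stated, so the value used here is purely an assumption, as are the function and variable names.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)


BACKGROUND_GENES = 20000  # ASSUMPTION: background size is not given in the text
SECT_GENES = 851          # refined SECT
NSCT_GENES = 185          # previously defined NSCT
SHARED_GENES = 125        # genes in common

p_value = hypergeom_tail(SHARED_GENES, BACKGROUND_GENES, NSCT_GENES, SECT_GENES)
print(p_value < 0.001)             # True
print(SECT_GENES - SHARED_GENES)   # 726
```

With a background of this size, only about 8 shared genes would be expected by chance, so an overlap of 125 lies far in the tail; the second line reproduces the 726 SECT-specific genes.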
Mechanistic and biological implications of SECT
Functionally classifying and clustering the SECT mRNAs provide important mechanistic insights into their origin and function. We searched the DAVID database(17) for enrichment of gene annotations as well as annotation clusters. Not surprisingly, many Gene Ontology functional terms enriched in previous salivary transcriptome and proteome studies(4)(5) were also well represented in the SECT; these terms included the ribosome, transportation, nucleic acid binding, and immune and defense responses (see Table 5 in the online Data Supplement). We observed several annotation clusters that were tightly linked to the biological roles of saliva. For example, a cluster of cellular-defense responses, including inflammatory response, immune response, pest and parasite response, and stress response, corresponds well with the antimicrobial and defensive roles of saliva. Such annotation clusters also include cell motility, epidermis development, regulation of cell proliferation and apoptosis, and RNA metabolism. Interestingly, nearly 8% of the SECT-defined genes were associated with disease mutation (UniProt Knowledgebase), suggesting a diagnostic link for the salivary transcriptome.
Previous studies have suggested that certain macromolecules that associate with subsets of mRNAs confer stability to RNA in saliva(18). We have also found that many NSCT transcripts contain a sequence motif with a common AU-rich element (ARE), which typically recruits ARE-binding proteins to control transcript stability(19)(20)(21). In light of these findings, we retrieved the SECT probeset sequences and used the Motif Discovery scan algorithm (MDscan) to search for sequence motifs(22) at different motif lengths. Interestingly, we found many U-rich and AU-rich sequences (Fig. 6A), suggesting the involvement of ARE-binding proteins.
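For readers who want a feel for what AU-rich content looks like in sequence terms, the following sketch counts the canonical ARE pentamer AUUUA and computes AU content. It is not MDscan and does not perform motif discovery; it is a much simpler fixed-pattern scan, and the function names and test sequences are hypothetical.

```python
def to_rna(seq):
    """Upper-case the sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def au_fraction(seq):
    """Fraction of bases that are A or U."""
    rna = to_rna(seq)
    return sum(base in "AU" for base in rna) / len(rna)


def count_motif(seq, motif="AUUUA"):
    """Count occurrences of `motif`, allowing overlaps."""
    rna = to_rna(seq)
    count, start = 0, rna.find(motif)
    while start != -1:
        count += 1
        start = rna.find(motif, start + 1)
    return count


# Invented sequences for illustration.
print(count_motif("AUUUAUUUA"))   # 2 (overlapping pentamers)
print(au_fraction("AUGC"))        # 0.5
```

Overlaps are counted because tandem AUUUA repeats share their terminal A.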
Salivary exon biomarker for distinguishing the sexes
Finally, we explored the potential of the salivary exon transcriptome to differentiate clinical phenotypes. Because our study participants (7 males and 11 females) were all healthy and given that many mammalian tissues exhibit sex-specific patterns of gene expression(23), we tested the utility of the exon profiles to identify an individual's sex. We selected exon candidates that included probesets present in >85% of the individuals of one sex and in <15% of the other and probesets that showed significant differences (P < 0.05; 3-fold change minimum) in intensity between the sexes. We selected 3 salivary exon candidates for validation and performed multiplex preamplification followed by qPCR with the original unamplified RNA samples. Two candidates, representing genes RPS4Y1 (ribosomal protein S4, Y-linked 1) and EIF1AY (eukaryotic translation initiation factor 1A, Y-linked), were detected with reliable signals in all male samples, whereas transcripts for these 2 genes were undetected in all but one female sample. We built a logistic regression model that combined the qPCR analyses of these 2 probesets, and ROC curve analysis of all of the original samples yielded a value for the area under the curve of 0.987 (Fig. 6B). Of note is that these 2 probesets are Y-chromosome exons, which are expected to be absent in all females. We further evaluated the model with an independent cohort of 28 individuals (15 males and 13 females). The only sample that was incorrectly scored was from a female who was positive for the male-specific exon markers.
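The area under the ROC curve equals the fraction of (positive, negative) pairs that the model ranks correctly, which makes it easy to compute directly. In the sketch below the scores are invented; they are arranged so that exactly one of the 7 × 11 = 77 male/female pairs is misordered, which gives 76/77 ≈ 0.987. That is one arrangement consistent with the reported value, not the study's actual model output, which is not given in the text.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of correctly ordered pairs (ties count one half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Invented predicted probabilities of "male" for 7 males and 11 females.
male_scores = [0.99, 0.98, 0.97, 0.96, 0.95, 0.94, 0.60]
female_scores = [0.01] * 10 + [0.70]   # one female scores above one male

print(round(roc_auc(male_scores, female_scores), 3))
```

The pairwise definition is equivalent to the Mann-Whitney U statistic divided by the number of pairs, so no explicit curve needs to be traced.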
| Discussion |
|---|
The main feature of the amplification method we have described is reverse transcription with a mix of anchored oligo(dT) and TRinucleotide primers, which are composed of an artificial 5' sequence, a central short run of 3–6 random nucleotides for stabilizing primer annealing, and a selected 3'-terminal trinucleotide sequence that controls primer hybridization. This approach produces nonrandomly spaced primer-annealing sites, rendering a selective advantage to primer positions near the template's 3' end (data not shown), likely due to low-affinity and transient binding of polymerases to the free 3' ends of nucleic acids. The mild requirement of a matching trinucleotide sequence in the template makes it essentially universal, yet selective against compact structures in rRNAs (data not shown). The trinucleotide primers are used again for second-strand DNA synthesis. Therefore, all T7-amplified RNAs contain the same sequence at the 3' terminus, which allows for stringent priming in reverse transcription in additional amplification rounds.
Our study is among the first systematic surveys of gene expression profiles in clinical samples to combine universal full-length linear amplification of mRNA, a microarray platform with exon-level resolution, and comprehensive qPCR validation. Compared with methods used in previous studies, the comprehensiveness and advantages of our approach are demonstrated by our detection of many new transcript fragments. Many of these new transcripts were detected as 5'-end fragments that would be ignored in 3'-biased approaches. In support of this finding, independent qPCR validation has demonstrated that the amplification is specific for mRNA fragments in saliva samples. With our refined criteria of presence calling, qPCR analysis verified >90% of the randomly selected probesets. In addition, the array signals and the qPCR results showed good correlation for the amplified product and the starting RNA (Fig. 5).
It is important to note that validating the large number of targets we analyzed would have been impossible with the limited amounts of sample available without the preamplification. The multiplexable preamplification allows for accurate and easy quantification of mRNA. We demonstrated that the use of sequence-specific priming and limited numbers of PCR cycles allowed all targets to be reverse-transcribed and amplified with optimal, stable efficiencies. The use of target-specific primers for reverse transcription that are in close proximity to the qPCR target sequence is more favorable than the use of random or poly(dT) priming. This method largely resolves the limitations that have been observed with qPCR analysis of partially degraded RNA samples.
Several factors have been implicated in affecting protein binding to sequence motifs. These factors include the number of motifs, their spacing, the extent of degeneration, and the presence of other protein partners. Therefore, the mere presence of sequence motifs may not be sufficient to direct protein binding. Nonetheless, the prevalence of the AU-rich sequences in the SECT transcripts serves as an indication of their biological importance. Because salivary glands are known to express ARE-binding proteins(28), these findings suggest a mechanism in which ARE-containing mRNA sequences in the saliva are produced locally and ARE-binding proteins (such as HuR) confer stability(29).
We have demonstrated the clinical potential of saliva diagnostics with enhanced confidence, sensitivity, and resolution. Saliva is a noninvasively and easily collected diagnostic fluid, and saliva testing holds promise for the early detection of disease, prognostic prediction, and, ultimately, health surveillance. Our demonstration of the ability to distinguish an individual's sex from saliva exon profiles provides early evidence that salivary exon profiles may have clinical value in disease diagnosis. This study has provided a comprehensive approach that is also applicable for biomarker studies of other body fluids and clinical samples containing fragmented RNAs.
| Acknowledgments |
|---|
Financial Disclosures: G.K. is affiliated with AmpTec, which has commercialized the amplification technique. D.T.W. and B.G.Z. have filed a patent on the multiplex preamplification with the cleanup process. The rest of the authors declare no conflicts of interest. This study was conducted solely in the laboratory of D.T.W. at UCLA.
Acknowledgments: We thank K. Brown from Affymetrix, Inc., and the UCLA DNA Microarray Core Facility for supporting the microarray studies. We thank V. Palanisamy for comments and discussion. We thank Y. Kim, N. Park, and S. Hu for critical review of the manuscript.
| Footnotes |
|---|
2 All microarray raw data have been deposited in the Gene Expression Omnibus (GEO) database under series accession no. GSE7760 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=bjupxqagoauqgxq&acc=GSE7760; expected release date: May 20, 2008).
3 Human genes: ANXA2, annexin A2; RPL37, ribosomal protein L37; S100A8, S100 calcium binding protein A8; RPS4Y1, ribosomal protein S4, Y-linked 1; EIF1AY, eukaryotic translation initiation factor 1A, Y-linked; ACTB, actin, beta; CSTA, cystatin A (stefin A); RPS10, ribosomal protein S10; RPS16, ribosomal protein S16; RPS6, ribosomal protein S6; RPS7, ribosomal protein S7; DUSP1, dual specificity phosphatase 1; H3F3A, H3 histone, family 3A; IL1B, interleukin 1, beta; IL8, interleukin 8; OAZ1, ornithine decarboxylase antizyme 1; S100P, S100 calcium binding protein P; SAT1, spermidine/spermine N1-acetyltransferase 1; MUC7, mucin 7, secreted; ANXA1, annexin A1; CXCL1, chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha); EGR1, early growth response 1; FOS, v-fos FBJ murine osteosarcoma viral oncogene homolog; SPRR1A, small proline-rich protein 1A; IL1RN, interleukin 1 receptor antagonist; SPRR3, small proline-rich protein 3; CSTB, cystatin B (stefin B); S100A14, S100 calcium binding protein A14; ARF5, ADP-ribosylation factor 5; IER3, immediate early response 3; KRT4, keratin 4; CRNN, cornulin; ITGB2, integrin, beta 2 (complement component 3 receptor 3 and 4 subunit); PI3, peptidase inhibitor 3, skin-derived (SKALP); MT-ND5, mitochondrially encoded NADH dehydrogenase 5; IVNS1ABP, influenza virus NS1A binding protein; PRB4, proline-rich protein BstNI subfamily 4; MT-ND4, mitochondrially encoded NADH dehydrogenase 4; RNR2, RNA, ribosomal 2; G0S2, G0/G1 switch 2; B2M, beta-2-microglobulin; MT-ND3, mitochondrially encoded NADH dehydrogenase 3; MT-ND1, mitochondrially encoded NADH dehydrogenase 1; SPRR2A, small proline-rich protein 2A.
4 These authors contributed equally to this work.
| References |
|---|
CT method. Methods 2001;25:402-408.
-dependent phosphorylation of the mRNA-stabilizing factor HuR: implications for posttranscriptional regulation of cyclooxygenase-2. Mol Biol Cell 2007;18:2137-2148.