Clinical Chemistry 52: 2162-2164, 2006; 10.1373/clinchem.2006.072868
(Clinical Chemistry. 2006;52:2162-2164.)
© 2006 American Association for Clinical Chemistry, Inc.


Abstracts of Oak Ridge Posters

Bioinformatics Methods for Prioritizing Serum Biomarker Candidates

Eric W. Klee1,a, Judith A. Finlay2, Cari McDonald1, John R. Attewell1, Deanne Hebrink1, Roy Dyer1, Brad Love2, George Vasmatzis1, Thomas M. Li3, Joseph M. Beechem2 and George G. Klee1

1 Mayo Clinic, Rochester, MN; 2 Invitrogen Corporation, Carlsbad, CA; 3 Roche Diagnostics Asia Pacific Pte Ltd., Singapore

a Address correspondence to this author at: Mayo Clinic, 315 Stabile Building, Rochester, MN 55904; fax 507-266-5193; e-mail klee.eric@mayo.edu.

A major research objective of NIH is to discover novel biomarkers that can improve cancer detection assays. Many biomarker candidates with apparent differential expression identified by high throughput genomic and proteomic experiments have been reported as candidates for novel cancer detection assays. Despite these many discoveries, few biomarkers have been validated and successfully translated into clinical tests, a situation that may be attributable to the extensive and costly experimental evaluation required to fully develop a candidate biomarker into a clinically useful assay. The use of information in addition to disease-state expression data may help to focus biomarker assay development projects more selectively and facilitate successful assay development and translation. We describe a bioinformatics and data mining method for evaluating diagnostic serum biomarker candidates by selecting genes and gene products that possess intrinsic protein localization and tissue expression properties.

The bioinformatics methods we describe are based on the assumption that candidate markers with diagnostic value have been identified previously. We defined diagnostic value as differential up-regulation or high expression of the biomarker in cancer tissue compared with benign tissue. Our experience (unpublished) has shown that detection of differentially down-regulated serum biomarker candidates is problematic because the serum biomarker candidates are usually present in low concentrations at which it is difficult to detect the absence of signal under the normal tissue background. Regardless of the definition of diagnostic value, the bioinformatics methods we describe are not contingent on discovering or confirming the informative value of the candidate biomarkers for differentiating healthy vs diseased states. These methods are designed to select candidate biomarkers with properties that make them more readily detectable in the biologically complex serum environment, in which secreted or extracellular proteins would be present at higher concentrations than proteins generally found inside of cells. Candidate biomarker sets are rapidly screened with in silico algorithms to predict which genes encode proteins that are secreted from the cell and thus are likely to be detectable in serum. Tissue-specific profiles of individual genes systematically generated from transcriptomic databases are used to select markers with expression patterns specific for the target diagnostic tissue type or to exclude markers without such expression patterns. Thus candidate markers with high signal-to-background expression distinguishable in serum are identified and provide a focused list for experimental validation.

The in silico protein localization prediction process consists of 4 publicly available (web-based) programs that identify secreted and membrane protein products and differentiate them from products that are localized to other subcellular compartments. The use of multiple prediction methods improves prediction of the expression patterns of secretory proteins that undergo signal peptide–mediated cotranslational translocation (CTT proteins) in the endoplasmic reticulum and are then transported to the cell surface for membrane incorporation or secretion (1). We used SignalP 3.0 (2) to specifically predict secretory proteins and TargetP 1.1 (3) to differentiate mitochondrial proteins from secretory proteins. The SignalP 3.0 D-score and the TargetP predictor were combined in a consensus prediction to select proteins for further processing. We then classified the selected CTT proteins as extracellular or membrane proteins with TMHMM 2.0 (transmembrane hidden Markov model) (4), a transmembrane (TM)-domain prediction program. CTT proteins having no predicted TM domains are classified as extracellular, and CTT proteins having 2 or more predicted TM domains are classified as membrane-associated proteins. We analyzed CTT proteins predicted to have a single TM domain with Phobius (5), a combined TM topology and signal-peptide prediction program. This program distinguishes proteins with N-terminal signal anchors (membrane-associated) from proteins with N-terminal signal peptides (extracellular). We combined the secreted protein prediction methods into a batch-processing pipeline for rapid screening of all candidate biomarkers.
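The decision logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the outputs of the four programs have already been collected for each sequence, every field and function name is hypothetical, and "consensus" is assumed here to mean that SignalP and TargetP both call the protein secretory (the text does not spell out the rule).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Predictions:
    """Pre-computed predictor outputs for one protein (hypothetical field names)."""
    signalp_secretory: bool   # SignalP 3.0 D-score call: signal peptide predicted
    targetp_secretory: bool   # TargetP 1.1 call: secretory rather than mitochondrial
    tm_domains: int           # number of TM domains predicted by TMHMM 2.0
    # Phobius call, needed only when tm_domains == 1:
    # True = N-terminal signal peptide, False = N-terminal signal anchor
    phobius_signal_peptide: Optional[bool] = None


def classify(p: Predictions) -> str:
    """Assign a localization class following the steps in the text."""
    # Step 1: consensus of SignalP and TargetP (assumed: both must agree).
    if not (p.signalp_secretory and p.targetp_secretory):
        return "non-CTT"
    # Step 2: TMHMM domain count separates the clear cases.
    if p.tm_domains == 0:
        return "extracellular"
    if p.tm_domains >= 2:
        return "membrane"
    # Step 3: exactly one predicted TM domain is resolved with Phobius.
    if p.phobius_signal_peptide is None:
        return "unresolved"
    return "extracellular" if p.phobius_signal_peptide else "membrane"


if __name__ == "__main__":
    print(classify(Predictions(True, True, 0)))         # no TM domains
    print(classify(Predictions(True, True, 1, False)))  # single TM, signal anchor
```

Keeping the rules in one pure function makes it straightforward to apply them to every RefSeq transcript variant of a gene and then take the union of the resulting classes.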

We evaluated the secreted protein ab initio analysis pipeline on a set of 643 genes identified from the literature as possessing diagnostic value for prostate cancer (6). We performed secreted protein prediction on all National Center for Biotechnology Information (NCBI) Reference Sequence (7) transcript variants associated with the candidates to capture genes encoding multiple protein products with differing localizations. Of the 643 putative biomarkers, 176 (27%) were predicted to encode secreted proteins. To evaluate the accuracy of the prediction methods, we obtained protein records from the SwissProt protein sequence database (8) for the candidate genes and abstracted cellular localization annotations from the comment field. SwissProt entries with cellular localization annotations were identified for 456 (71%) of the 643 putative biomarkers; for 114 of these genes (18% of the total), at least 1 protein variant was annotated as secreted, extracellular, or soluble. The prediction method and database annotation were concordant for 104 (91%) of the 114 proteins. Of the 10 discordant predictions, 2 proteins were annotated to be secreted independently of the CTT pathway and consequently were not detectable by our prediction methods. The 72 biomarkers predicted to be secreted, but not annotated as secreted, consisted of 36 proteins lacking any SwissProt annotation, 24 proteins annotated as membrane proteins, and 12 proteins annotated as other nonmembrane CTT proteins (including Golgi, ER, lysosomal, and glycosylphosphatidylinositol-anchor proteins).
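The counts reported in this paragraph are internally consistent, which a short calculation makes explicit. The sketch below uses only the figures stated in the text; the variable names are illustrative.

```python
# Counts taken directly from the paragraph above.
total_genes = 643
predicted_secreted = 176
with_localization_annotation = 456
annotated_secreted = 114
concordant = 104


def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number, as reported in the text."""
    return round(100 * numerator / denominator)


# Annotated as secreted but not predicted as secreted.
discordant = annotated_secreted - concordant
# Predicted as secreted but not annotated as secreted.
predicted_not_annotated = predicted_secreted - concordant

breakdown = {
    "no SwissProt annotation": 36,
    "annotated as membrane": 24,
    "other nonmembrane CTT": 12,
}

if __name__ == "__main__":
    print(pct(predicted_secreted, total_genes))            # share predicted secreted
    print(pct(with_localization_annotation, total_genes))  # share with annotation
    print(pct(annotated_secreted, total_genes))            # share annotated secreted
    print(pct(concordant, annotated_secreted))             # concordance
    print(discordant, predicted_not_annotated, sum(breakdown.values()))
```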

We used tissue expression profiles to estimate the signal-to-background expression level of a candidate marker and to determine the likelihood that the marker’s diagnostic signal will be distinguishable in a serum assay. For candidate markers, we derived the relative expression pattern across different tissue types from a compendium of public transcriptomic databases, including the Cancer Genome Anatomy Project’s Serial Analysis of Gene Expression database (9), the Ludwig Institute for Cancer Research’s Massively Parallel Signature Sequence (MPSS) database (10), and NCBI’s Unigene database (11). The Serial Analysis of Gene Expression database contains gene expression data on 22 major tissue types in both nondiseased and cancer tissues. The Massively Parallel Signature Sequence database reports gene expression in 32 tissues, with a much higher dynamic range of transcripts per cell than either of the other methods. We used the transcripts-per-million counts reported by the above methods to manually construct tissue specificity profiles on a gene-per-gene basis. From these profiles, we can categorize genes as specific or ubiquitous by use of a binary classification for expression/nonexpression in each tissue type or a numeric classification by percentage of total transcripts in each tissue type, normalized by the tissue-type library size.
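As an illustration of the two classification schemes just described, the following sketch builds a per-gene tissue profile. It is an assumption-laden example rather than the authors' procedure: it starts from raw tag counts and library sizes (the databases may already report transcripts per million), the function names and the 50% specificity cutoff are hypothetical, and the counts in the usage example are invented purely to exercise the code.

```python
from typing import Dict, Tuple


def tissue_profile(
    tag_counts: Dict[str, int], library_sizes: Dict[str, int]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (transcripts per million, fraction of total signal) per tissue."""
    # Normalize each tissue by its library size so deep libraries do not dominate.
    tpm = {
        tissue: 1e6 * tag_counts.get(tissue, 0) / size
        for tissue, size in library_sizes.items()
    }
    total = sum(tpm.values())
    # Numeric classification: share of the gene's normalized signal per tissue.
    fraction = {t: (v / total if total else 0.0) for t, v in tpm.items()}
    return tpm, fraction


def binary_profile(tpm: Dict[str, float], threshold: float = 0.0) -> Dict[str, bool]:
    """Binary classification: expressed / not expressed in each tissue."""
    return {tissue: value > threshold for tissue, value in tpm.items()}


def is_specific(fraction: Dict[str, float], target: str, min_fraction: float = 0.5) -> bool:
    """Call a gene tissue-specific if the target tissue holds enough of the signal."""
    return fraction.get(target, 0.0) >= min_fraction


if __name__ == "__main__":
    # Invented counts for a single hypothetical gene.
    counts = {"prostate": 900, "liver": 10, "brain": 0}
    sizes = {"prostate": 1_000_000, "liver": 500_000, "brain": 2_000_000}
    tpm, fraction = tissue_profile(counts, sizes)
    print(binary_profile(tpm))
    print(is_specific(fraction, "prostate"))
```

The binary view answers "in how many tissues is this gene seen at all?", whereas the fractional view distinguishes a gene that is merely present in the target tissue from one whose expression is concentrated there.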

Within the 643 gene products analyzed by the secreted protein prediction methods, several genes with well-characterized properties can be used as controls for our approach (Table 1). For these genes, we used tissue expression profiles to classify the relative prostate tissue specificity of the gene products. Included in the list of genes are 2 positive controls, KLK3 [kallikrein 3 (prostate-specific antigen)] and ACPP (acid phosphatase, prostate), 2 prostate biomarkers that are currently used in clinical testing. We predicted these 2 gene products to be secreted (confirming the SwissProt annotation) and to possess strong expression in prostate tissue and minimal expression in other tissues. AMACR (α-methylacyl-CoA racemase) (12), HPN [hepsin (transmembrane protease, serine 1)] (13), and ZWINT (ZW10 interactor) (14) are all associated with prostate cancer in the scientific literature but are known not to be secreted and were correctly identified as such by our methods. It should be noted that AMACR is used as a prostate tissue immunohistochemical marker; however, the lack of prostate specificity and the intracellular localization of its gene product make this gene a poor serum biomarker. FN1 (fibronectin 1) and VEGF (vascular endothelial growth factor) are associated with prostate cancer and encode secreted proteins; however, these genes lack prostate tissue specificity (15)(16)(17)(18). The lack of prostate tissue specificity in the expression of these 2 genes may be a major reason why they are not yet used clinically as prostate cancer serum biomarkers.


Table 1. Localization predictions and annotations of prostate cancer-associated proteins.
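The way the two filters combine for the control genes can be shown with a small sketch. The flags below are transcribed from the description of the controls in the text, not from Table 1 itself; where the text does not state prostate specificity (HPN, ZWINT) the value is left as `None`. The dictionary layout and the `prioritize` function are hypothetical.

```python
from typing import Dict, List, Optional

# Flags as described in the text for the control genes.
controls: Dict[str, Dict[str, Optional[bool]]] = {
    "KLK3":  {"secreted": True,  "prostate_specific": True},
    "ACPP":  {"secreted": True,  "prostate_specific": True},
    "AMACR": {"secreted": False, "prostate_specific": False},
    "HPN":   {"secreted": False, "prostate_specific": None},  # specificity not stated
    "ZWINT": {"secreted": False, "prostate_specific": None},  # specificity not stated
    "FN1":   {"secreted": True,  "prostate_specific": False},
    "VEGF":  {"secreted": True,  "prostate_specific": False},
}


def prioritize(candidates: Dict[str, Dict[str, Optional[bool]]]) -> List[str]:
    """Keep candidates that are both predicted secreted and tissue specific."""
    return [
        gene
        for gene, flags in candidates.items()
        if flags["secreted"] is True and flags["prostate_specific"] is True
    ]


if __name__ == "__main__":
    print(prioritize(controls))
```

Only the 2 positive controls pass both filters, which matches the outcome described for this control set.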

We have developed a bioinformatics protocol for screening candidate serum biomarker sets to identify high-quality markers for experimental evaluation. The in silico secreted protein pipeline provides a rapid screen for identifying biomarkers that are found extracellularly and are therefore likely to be detectable by serum assays. Tissue specificity profiling complements secreted protein prediction by identifying the originating tissue components of a biomarker’s serum signal and by allowing investigators to select candidate markers with a higher probability of having distinguishable signals. We hope that the use of intelligent bioinformatics analysis before costly experimental evaluation will accelerate the selection of candidate biomarkers that can be successfully translated into novel, clinically useful assays.


Acknowledgments

This work was supported under a grant by Invitrogen.


References

  1. Klee EW, Ellis LB. Evaluating eukaryotic secreted protein prediction. BMC Bioinformatics 2005;6:256.
  2. Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004;340:783-795.
  3. Emanuelsson O, Nielsen H, Brunak S, von Heijne G. Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 2000;300:1005-1016.
  4. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001;305:567-580.
  5. Kall L, Krogh A, Sonnhammer EL. A combined transmembrane topology and signal peptide prediction method. J Mol Biol 2004;338:1027-1036.
  6. Finlay JA, Klee EW, McDonald C, Attewell JR, Hebrink D, Dyer R, et al. A systematic method for selection of promising serum protein biomarkers to improve prostate cancer (PCa) detection. Clin Chem 2006;52:2159-2162.
  7. Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2005;33:D501-D504.
  8. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 2000;28:45-48.
  9. Boon K, Osorio EC, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, et al. An anatomy of normal and malignant gene expression. Proc Natl Acad Sci U S A 2002;99:11287-11292.
  10. Jongeneel CV, Delorenzi M, Iseli C, Zhou D, Haudenschild CD, Khrebtukova I, et al. An atlas of human gene expression from massively parallel signature sequencing (MPSS). Genome Res 2005;15:1007-1014.
  11. Pontius JU, Wagner L, Schuler GD. UniGene: a unified view of the transcriptome. In: The NCBI Handbook. Bethesda (MD): National Center for Biotechnology Information; 2003.
  12. Troyer DA, Mubiru J, Leach RJ, Naylor SL. Promise and challenge: markers of prostate cancer detection, diagnosis, and prognosis. Dis Markers 2004;20:117-128.
  13. Nelson PS. Predicting prostate cancer behavior using transcript profiles. J Urol 2004;172:S28-S32; discussion S33.
  14. LaTulippe E, Satagopan J, Smith A, Scher H, Scardino P, Reuter V, et al. Comprehensive gene expression analysis of prostate cancer reveals distinct transcriptional programs associated with metastatic disease. Cancer Res 2002;62:4499-4506.
  15. Chakrabarty S, Fritsche HA. Fibronectin and prostate cancer. J Clin Ligand Assay 2002;25:64-69.
  16. Albrecht M, Renneberg H, Wennemuth G, Moschler O, Janssen M, Aumuller G, et al. Fibronectin in human prostatic cells in vivo and in vitro: expression, distribution, and pathological significance. Histochem Cell Biol 1999;112:51-61.
  17. Chakravarti A, Zhai GG. Molecular and genetic prognostic factors of prostate cancer. World J Urol 2003;21:265-274.
  18. Lin CC, Wu HC, Tsai FJ, Chen HY, Chen WC. Vascular endothelial growth factor gene-460 C/T polymorphism is a biomarker for prostate cancer. Urology 2003;62:374-377.




