Integration of text- and data-mining using ontologies successfully selects disease gene candidates

Nicki Tiffin*, Janet F. Kelso, Alan R. Powell, Hong Pan, Vladimir B. Bajic, Winston A. Hide

*Corresponding author for this work

Research output: Contribution to journal; Article; peer-review

168 Scopus citations

Abstract

Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (±18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.

Original language: English (US)
Pages (from-to): 1544-1552
Number of pages: 9
Journal: Nucleic Acids Research
Volume: 33
Issue number: 5
State: Published - 2005
Externally published: Yes

Bibliographical note

Funding Information:
The authors thank Richard D. Williams, Paediatric Oncology, Institute of Cancer Research, United Kingdom, for generating a curated list of tissues implicated in Wilms’ tumour; Cathal Seoighe, Computational Biology Group, University of Cape Town, South Africa, for methodology advice; and Charles Auffray, CNRS, Villejuif, France, and Ranajit Chakraborty, Centre for Genomic Information, University of Cincinnati, USA, for critical review of the manuscript. This work was funded by the Medical Research Council South Africa, the National Bioinformatics Network South Africa and the Wellcome Trust (grant number CRIG,HH7MD). Funding to pay the Open Access publication charges for this article was provided by the Medical Research Council South Africa.

ASJC Scopus subject areas

  • Genetics
