Species-independent MicroRNA Gene Discovery

  • Timothy K. Kamanu

Student thesis: Doctoral Thesis

Abstract

MicroRNAs (miRNAs) are a class of small endogenous non-coding RNAs that act mainly as negative transcriptional and post-transcriptional regulators in both plants and animals. Recent studies have shown that miRNAs are involved in different types of cancer and in other incurable diseases such as autism and Alzheimer’s disease. Functional miRNAs are excised from hairpin-like sequences known as miRNA genes. There are about 21,000 known miRNA genes, most of which have been determined using experimental methods. miRNA genes are classified into different groups (miRNA families). This study reports about 19,000 previously unknown miRNA genes in nine species, of which approximately 15,300 were computationally validated to contain at least one experimentally verified functional miRNA product. The predictions are based on a novel computational strategy that relies on miRNA family groupings and exploits the physics and geometry of miRNA genes to unveil the hidden palindromic signals and symmetries in miRNA gene sequences. Unlike conventional computational miRNA gene discovery methods, the algorithm developed here is species-independent: it allows prediction at higher accuracy and resolution from arbitrary RNA/DNA sequences in any species and thus enables examination of repeat-prone genomic regions that are thought to be non-informative or ‘junk’ sequences. The study also demonstrates the information non-redundancy of uni-directional RNA sequences, in contrast to the information redundancy of bi-directional DNA, a fact that most pattern discovery algorithms overlook. A novel method for computing upstream and downstream miRNA gene boundaries based on mathematical/statistical functions is suggested, as well as cutoffs for annotation of miRNA genes in different miRNA families. A further tool is proposed for hypothesis generation and for visualizing data matrices and the intra- and inter-species chromosomal distribution of miRNA genes or miRNA families.
Our results indicate that: miRNAs and miRNA genes are not only species-specific but may also be DNA strand-specific and chromosome-specific; the genomic distribution of miRNA genes is conserved at the chromosomal level across species; miRNAs are conserved; more than one miRNA with different regulatory targets can be excised from one miRNA gene; and repeat-related miRNAs and miRNA genes with palindromic sequences may be the largest subclass of the miRNA class to have eluded detection by most computational and experimental methods.
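The "palindromic signals" mentioned in the abstract refer to the fact that the two arms of a hairpin are approximately reverse complements of each other. The following is a minimal illustrative sketch of that idea only; it is not the algorithm developed in the thesis. The function names, the fixed central loop length, and the restriction to Watson-Crick pairs (G-U wobble pairs are ignored) are all assumptions made for the example.

```python
# Illustrative sketch only: not the thesis algorithm.
# Watson-Crick complements for RNA; G-U wobble pairs are ignored here.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def stem_pairing_fraction(seq: str, loop_len: int = 4) -> float:
    """Fraction of 5'-arm bases that pair with the 3' arm.

    Assumes (hypothetically) that the sequence folds symmetrically around
    a central loop of `loop_len` unpaired bases, with no bulges.
    """
    seq = seq.upper()
    arm_len = (len(seq) - loop_len) // 2
    if arm_len <= 0:
        return 0.0
    five_prime_arm = seq[:arm_len]
    three_prime_arm = seq[len(seq) - arm_len:]
    # Base i of the 5' arm faces base i (counted from the end) of the 3' arm.
    paired = sum(
        RNA_COMPLEMENT.get(a) == b
        for a, b in zip(five_prime_arm, reversed(three_prime_arm))
    )
    return paired / arm_len


if __name__ == "__main__":
    # A made-up perfect hairpin: 8-base arm, 4-base loop, complementary arm.
    arm = "GGGAAAUC"
    hairpin = arm + "GAAA" + reverse_complement(arm)
    print(stem_pairing_fraction(hairpin))
```

A sequence whose 3' arm is the exact reverse complement of its 5' arm scores 1.0 under this measure, whereas a homopolymer such as a run of A scores 0.0. Real miRNA hairpins contain mismatches, bulges and wobble pairs, so a practical method would need a more tolerant scoring scheme.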
Date of Award: Dec 2012
Original language: English (US)
Awarding Institution
  • Biological, Environmental Sciences and Engineering
Supervisor: Vladimir Bajic

Keywords

  • microRNA
  • microRNA families
  • pattern matching
  • kernel methods
  • genetic algorithm
  • visualisation of multiple categories
