TY - JOUR
T1 - Plant omics data center
T2 - An integrated web repository for interspecies gene expression networks with NLP-based curation
AU - Ohyanagi, Hajime
AU - Takano, Tomoyuki
AU - Terashima, Shin
AU - Kobayashi, Masaaki
AU - Kanno, Maasa
AU - Morimoto, Kyoko
AU - Kanegae, Hiromi
AU - Sasaki, Yohei
AU - Saito, Misa
AU - Asano, Satomi
AU - Ozaki, Soichi
AU - Kudo, Toru
AU - Yokoyama, Koji
AU - Aya, Koichiro
AU - Suwabe, Keita
AU - Suzuki, Go
AU - Aoki, Koh
AU - Kubo, Yasutaka
AU - Watanabe, Masao
AU - Matsuoka, Makoto
AU - Yano, Kentaro
N1 - Publisher Copyright: © 2014 The Author.
PY - 2015/1/1
Y1 - 2015/1/1
N2 - Comprehensive integration of large-scale omics resources such as genomes, transcriptomes and metabolomes will provide deeper insights into broader aspects of molecular biology. For a better understanding of plant biology, we aim to construct a next-generation sequencing (NGS)-derived gene expression network (GEN) repository for a broad range of plant species. So far, we have incorporated information on 745 high-quality mRNA sequencing (mRNA-Seq) samples from eight plant species (Arabidopsis thaliana, Oryza sativa, Solanum lycopersicum, Sorghum bicolor, Vitis vinifera, Solanum tuberosum, Medicago truncatula and Glycine max) obtained from the public Short Read Archive, digitally profiled gene expression across the entire sample set, and drawn GENs using correspondence analysis (CA) to exploit gene expression similarities. To understand the evolutionary significance of the GENs from multiple species, we linked them according to the orthology of each node (gene) among species. Functional annotation of the genes, together with other gene expression information, will facilitate biological interpretation. We are currently improving the existing gene annotations with natural language processing (NLP) techniques and manual curation. Here we introduce the current status of our analyses and the web database PODC (Plant Omics Data Center; http://bioinf.mind.meiji.ac.jp/podc/), now open to the public, which provides GENs, functional annotations and additional comprehensive omics resources.
KW - Correspondence analysis
KW - Database
KW - Gene expression network
KW - Manual curation
KW - Natural language processing (NLP)
KW - Omics
UR - http://www.scopus.com/inward/record.url?scp=84922659828&partnerID=8YFLogxK
U2 - 10.1093/pcp/pcu188
DO - 10.1093/pcp/pcu188
M3 - Article
C2 - 25505034
AN - SCOPUS:84922659828
SN - 0032-0781
VL - 56
SP - e9
JO - Plant and Cell Physiology
JF - Plant and Cell Physiology
IS - 1
ER -