TY - JOUR
T1 - TriAnnot
T2 - A versatile and high performance pipeline for the automated annotation of plant genomes
AU - Leroy, Philippe
AU - Guilhot, Nicolas
AU - Sakai, Hiroaki
AU - Bernard, Aurélien
AU - Choulet, Frédéric
AU - Theil, Sébastien
AU - Reboux, Sébastien
AU - Amano, Naoki
AU - Flutre, Timothée
AU - Pelegrin, Céline
AU - Ohyanagi, Hajime
AU - Seidel, Michael
AU - Giacomoni, Franck
AU - Reichstadt, Mathieu
AU - Alaux, Michael
AU - Gicquello, Emmanuelle
AU - Legeai, Fabrice
AU - Cerutti, Lorenzo
AU - Numa, Hisataka
AU - Tanaka, Tsuyoshi
AU - Mayer, Klaus
AU - Itoh, Takeshi
AU - Quesneville, Hadi
AU - Feuillet, Catherine
PY - 2012/1/31
Y1 - 2012/1/31
N2 - In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712-CPU computing cluster that can run a 1-Gb sequence annotation in less than 5 days. It is accessible through a web interface for small-scale analyses or through a server for large-scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, among which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene prediction to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not improved for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future.
AB - In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712-CPU computing cluster that can run a 1-Gb sequence annotation in less than 5 days. It is accessible through a web interface for small-scale analyses or through a server for large-scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, among which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene prediction to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not improved for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future.
KW - Cluster
KW - Gene models
KW - Pipeline
KW - Plant genome
KW - Structural and functional annotation
KW - Transposable elements
KW - Wheat
UR - http://www.scopus.com/inward/record.url?scp=84863829703&partnerID=8YFLogxK
U2 - 10.3389/fpls.2012.00005
DO - 10.3389/fpls.2012.00005
M3 - Article
AN - SCOPUS:84863829703
SN - 1664-462X
VL - 3
JO - Frontiers in Plant Science
JF - Frontiers in Plant Science
IS - JAN
M1 - 5
ER -