TY - JOUR
T1 - Data Descriptor: Building two indica rice reference genomes with PacBio long-read and Illumina paired-end sequencing data
AU - Zhang, Jianwei
AU - Chen, Ling Ling
AU - Sun, Shuai
AU - Kudrna, Dave
AU - Copetti, Dario
AU - Li, Weiming
AU - Mu, Ting
AU - Jiao, Wen Biao
AU - Xing, Feng
AU - Lee, Seunghee
AU - Talag, Jayson
AU - Song, Jia Ming
AU - Du, Bogu
AU - Xie, Weibo
AU - Luo, Meizhong
AU - Maldonado, Carlos Ernesto
AU - Goicoechea, Jose Luis
AU - Xiong, Lizhong
AU - Wu, Changyin
AU - Xing, Yongzhong
AU - Zhou, Dao Xiu
AU - Yu, Sibin
AU - Zhao, Yu
AU - Wang, Gongwei
AU - Yu, Yeisoo
AU - Luo, Yijie
AU - Hurtado, Beatriz Elena Padilla
AU - Danowitz, Ann
AU - Wing, Rod A.
AU - Zhang, Qifa
N1 - Generated from Scopus record by KAUST IRTS on 2019-11-20
PY - 2016/9/13
Y1 - 2016/9/13
AB - Over the past 30 years, we have performed many fundamental studies on two Oryza sativa subsp. indica varieties, Zhenshan 97 (ZS97) and Minghui 63 (MH63). To improve the resolution of many of these investigations, we generated two reference-quality reference genome assemblies using the most advanced sequencing technologies. Using PacBio SMRT technology, we produced over 108 (ZS97) and 174 (MH63) Gb of raw sequence data from 166 (ZS97) and 209 (MH63) pools of BAC clones, and generated ∼97 (ZS97) and ∼74 (MH63) Gb of paired-end whole-genome shotgun (WGS) sequence data with Illumina sequencing technology. With these data, we successfully assembled two platinum standard reference genomes that have been publicly released. Here we provide the full sets of raw data used to generate these two reference genome assemblies. These data sets can be used to test new programs for better genome assembly and annotation, aid in the discovery of new insights into genome structure, function, and evolution, and help to provide essential support to biological research in general.
UR - http://www.nature.com/articles/sdata201676
UR - http://www.scopus.com/inward/record.url?scp=84987925169&partnerID=8YFLogxK
U2 - 10.1038/sdata.2016.76
DO - 10.1038/sdata.2016.76
M3 - Article
SN - 2052-4463
VL - 3
JO - Scientific Data
JF - Scientific Data
ER -