De novo assembly of the Tamarindus indica genome as part of the Kingdom of Saudi Arabia Native Genome Project

Dataset

Description

The Kingdom of Saudi Arabia (KSA) Native Genome Project (NGP) aims to generate genomic resources for all plants, animals, and associated microbiome species in the Kingdom. Tamarindus indica was identified by the Ministry of Environment, Water and Agriculture (MEWA) as an endangered native species in the KSA and is among the first 15 plant species to be studied in the NGP. A voucher tree was identified in the Rijal Almaa region, and leaf samples were collected from it. High-molecular-weight (HMW) DNA was extracted from this tissue and sequenced using circular consensus sequencing (CCS) on the PacBio Sequel II platform. The raw sequencing data were assembled with hifiasm, contaminant contigs were removed, and the 15 largest contigs were selected as the primary T. indica assembly. The genome sequence of Sindora glabra was used as a reference guide for primary scaffolding, and T. indica optical maps were used for super-scaffolding. Secondary scaffolding used Hi-C data to produce a chromosome-level assembly of the T. indica genome. Transposable element analysis and a preliminary annotation were performed on the final assembly. This project represents the first step in studying T. indica for the NGP. The final assembly can serve as a foundation for further genetic studies on this species, as a possible reference for other legume species from the subfamily Detarioideae, and as a resource for neo-domestication and reforestation. The pipeline developed for this project can also be used as a template for sequencing and assembling the remaining species in the NGP.
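As a rough illustration of the contig-selection step described above (keeping the N largest contigs after contaminant removal), the following minimal Python sketch ranks FASTA records by length. It is not the pipeline used in this project: the function names `read_fasta` and `largest_contigs` and the file name in the usage note are hypothetical, and the input is assumed to be a plain, uncompressed FASTA file.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {contig_name: sequence} dict."""
    records: Dict[str, str] = {}
    name = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def largest_contigs(records: Dict[str, str], n: int = 15) -> List[Tuple[str, int]]:
    """Return (name, length) for the n longest contigs, longest first."""
    ranked = sorted(records.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(name, len(seq)) for name, seq in ranked[:n]]
```

Assuming a decontaminated assembly saved under a hypothetical name such as `contigs.fasta`, the selection could be run with `with open("contigs.fasta") as fh: top = largest_contigs(read_fasta(fh), n=15)`.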
Date made available: 2022
Publisher: KAUST Research Repository
