Title: Comparison of different sequencing and assembly strategies for a repeat-rich fungal genome, Ophiocordyceps sinensis
Authors: Li Yi, Hsiang Tom, Yang Rui-Heng, Hu Xiao-Di, Wang Ke, Wang Wen-Jing, Wang Xiao-Liang, Jiao Lei, and Yao Yi-Jian*.
Corresponding author:
Journal: J Microbiol Methods
Issue:
Volume: 128
Pages: 1-6
Year: 2016
Impact factor: 2.247
Abstract: Ophiocordyceps sinensis is one of the most expensive medicinal fungi worldwide, and has been used as a traditional Chinese medicine for centuries. In a recent report, the genome of this fungus was found to be expanded by extensive repetitive elements after assembly of Roche 454 (223Mb) and Illumina HiSeq (10.6Gb) sequencing data, producing a genome of 87.7Mb with an N50 scaffold length of 12kb and 6972 predicted genes. To test whether the assembly could be improved by deeper sequencing and to assess the amount of data needed for optimal assembly, genomic sequencing was run several times on genomic DNA extractions of a single ascospore isolate (strain 1229) on an Illumina HiSeq platform (25Gb total data). Assemblies were produced using different data types (raw vs. trimmed) and data amounts, and using three freely available assembly programs (ABySS, SOAP and Velvet). In nearly all cases, trimming the data for low quality base calls did not provide assemblies with higher N50 values compared to the non-trimmed data, and increasing the amount of input data (i.e. sequence reads) did not always lead to higher N50 values. Depending on the assembly program and data type, the maximal N50 was reached with between 50% and 90% of the total read data, equivalent to 100x to 200x coverage. The draft genome assembly was improved over the previously published version, resulting in a 114Mb assembly with a scaffold N50 of 70kb and 9610 predicted genes. Among the predicted genes, 9213 were validated by RNA-Seq analysis in this study, of which 8896 were found to be singletons. Evidence from genome and transcriptome analyses indicated that species assemblies could be improved with defined input material (e.g. haploid mono-ascospore isolate) without the requirement of multiple sequencing technologies, multiple library sizes or data trimming for low quality base calls, and with genome coverages between 100x and 200x.
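
The abstract compares assemblies by N50 and expresses data amounts as fold coverage. The Python sketch below illustrates both quantities under stated assumptions; it is not the authors' pipeline. N50 is computed by its usual definition (the length L such that sequences of length L or longer contain at least half of the assembled bases), and coverage is taken as total sequenced bases divided by genome size. The function names and the toy scaffold lengths are hypothetical, and using the 114Mb assembly size as the genome size is an assumption, since the abstract does not say which size estimate underlies its coverage figures.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths.

    Usual definition: sort lengths in descending order and return the
    length at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fold_coverage(total_bases, genome_size):
    """Approximate sequencing depth as total bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Toy scaffold lengths (hypothetical, not from the paper).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print("toy N50:", n50(toy_scaffolds))

# Figures from the abstract: 25Gb of HiSeq data in total.
# Assumption: the 114Mb assembly size stands in for the genome size.
TOTAL_BASES = 25e9
GENOME_SIZE = 114e6
for fraction in (0.5, 0.9):
    depth = fold_coverage(fraction * TOTAL_BASES, GENOME_SIZE)
    print(f"{fraction:.0%} of reads: about {depth:.0f}x")
```

Under this assumption, 50% and 90% of the 25Gb data set correspond to roughly 110x and 197x, which is in line with the 100x to 200x range stated in the abstract.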