
Draft vs Finished

Microbial genome draft vs. finished assembly evaluation

Motivation.

This project aims to evaluate and compare the genome annotation results obtained with different sequencing library types, sequencing platforms, and amounts of sequence data.

Results.

A total of 134 microbial genomes were assembled at the Quality Draft (QD) stage using 11 different combinations of sequencing technologies. Of these 11 combinations, 6 were each applied to more than 10 genomes and were selected for further analysis, as shown below.

[Chart: comparison of different sequencing machines]
Number of genomes sequenced by combination of technologies. Only combinations with more than 10 projects (shown in red) were included in this analysis.
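The selection step described above (keeping only technology combinations applied to more than 10 genomes) can be sketched as follows. This is a minimal illustration and not the procedure actually used for the analysis: the function name select_combinations, the genome identifiers, and the labels combo_A and combo_B are hypothetical placeholders, and the counts are made up.

```python
from collections import Counter


def select_combinations(genome_to_combo, min_genomes=10):
    """Return the technology combinations used in MORE than `min_genomes`
    genomes, mapped to the number of genomes sequenced with each."""
    counts = Counter(genome_to_combo.values())
    return {combo: n for combo, n in counts.items() if n > min_genomes}


# Made-up placeholder data: 12 genomes with one combination, 3 with another.
example = {f"genome_{i}": "combo_A" for i in range(12)}
example.update({f"other_{i}": "combo_B" for i in range(3)})

selected = select_combinations(example)
print(selected)
```

With the strict "more than 10" threshold, a combination used in exactly 10 genomes would be excluded, which matches the wording of the selection criterion.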

Sequencing technology platforms appear to affect the quality of the QD assembly. We observed a correlation between the number of QD contigs (a metric for the quality of the assembly) and the technologies used. Combinations of technologies generally perform better, while Illumina-only sequencing performs well when long-insert libraries are used. Our data also indicate that the quality of the QD assembly, as measured by the number of contigs, does not depend on features of the organism such as genome size or GC percentage, but it does depend on the number of repeats. While older sequencing technologies (Sanger) were biased against single-copy genes, the newest sequencing technologies appear not to have such biases, although they are affected to a greater extent by the presence of repeats. The new technologies (Illumina) allow the sequencing of practically the full genome, with minimal if any loss of sequence. Unfortunately, the current assembly algorithms produce a large number of contigs, which results in lower annotation quality than expected (i.e., more genes are missed because the sequence is fragmented).
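One way to examine whether the number of contigs tracks a genome feature such as repeat count is a rank correlation. The sketch below computes Spearman's rank correlation in plain Python. It is an assumption about how such a check could be done, not a description of the statistics used in this project, and the numbers in the usage lines are invented solely to show the call.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Invented illustrative values, NOT data from this study.
contig_counts = [20, 35, 50, 80]
repeat_counts = [5, 9, 14, 30]
rho = spearman(contig_counts, repeat_counts)
```

A rank-based measure is used here because contig counts are unlikely to relate linearly to genome features; the same function could be applied to genome size or GC percentage to compare the strength of each association.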