Announcements
- January 23-24, 2012: CSP 2012 PI Workshop, Walnut Creek, CA
- January 19-20, 2012: Microbial Genome Reannotation Workshop, Walnut Creek, CA
Releases
- January 31, 2012: Clostridium sp. BNL1100
- December 1, 2011: Acidovorax avenae avenae ATCC 19860
- December 1, 2011: Alicycliphilus denitrificans K601
- December 1, 2011: Cellulosilyticum lentocellum RHM5, DSM 5427
- December 1, 2011: Delftia sp. Cs1-4
Draft vs Finished
Microbial genome draft vs. finished assembly evaluation
Motivation.
This project aims to evaluate and compare the genome annotation results obtained with different sequencing library types, sequencing platforms, and amounts of sequence data.
Results.
A total of 134 microbial genomes were assembled at the Quality Draft (QD) stage using 11 different combinations of sequencing technologies. Of these 11 combinations, 6 were each applied to more than 10 genomes and were selected for further analysis, as shown below.
Number of genomes sequenced by combination of technologies. Only combinations with more than 10 projects (shown in red) were included in this analysis.
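The selection rule described above (keep only combinations used in more than 10 projects) can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual pipeline: the function name `select_combinations`, the labels `combo_A`, `combo_B`, `combo_C`, and the counts are all hypothetical placeholders rather than the real project data.

```python
from collections import Counter


def select_combinations(project_combinations, min_projects=10):
    """Return the combinations used in MORE than `min_projects` projects.

    `project_combinations` holds one label per sequencing project,
    naming the technology combination that project used.
    """
    counts = Counter(project_combinations)
    return {combo: n for combo, n in counts.items() if n > min_projects}


# Hypothetical placeholder labels and counts, not the real project list.
projects = ["combo_A"] * 12 + ["combo_B"] * 11 + ["combo_C"] * 3
selected = select_combinations(projects)
print(selected)
```

Note that the threshold is strict: a combination used in exactly 10 projects would be excluded, matching the "more than 10" wording.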
Sequencing technology platforms appear to affect the quality of the QD assembly. We observed a correlation between the number of QD contigs (a metric for the quality of the assembly) and the technologies used. Combinations of technologies generally perform better, while Illumina-only sequencing performs well when long-insert libraries are used. Our data also indicate that the quality of the QD assembly (as measured by the number of contigs) does not depend on features of the organism such as genome size or GC percentage, but it does depend on the number of repeats. While older sequencing technologies (Sanger) were biased against single-copy genes, the newest sequencing technologies appear to have no such biases, although they are affected to a greater extent by the presence of repeats. The new technologies (Illumina) allow the sequencing of practically the full genome, with minimal if any loss of sequence. Unfortunately, the current assembly algorithms produce a large number of contigs, which results in lower annotation quality than expected (i.e., more missed genes due to fragmentation of the sequence).
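One way to check whether contig count tracks a genome feature is a rank correlation. The sketch below uses Spearman's coefficient, which is an assumption on our part: the text does not say which statistic was used. All names (`average_ranks`, `pearson`, `spearman`) and all numbers are hypothetical, made-up toy values chosen only to show the calculation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up toy values for five genomes, for illustration only.
contigs = [20, 35, 50, 80, 120]           # QD contig counts
repeats = [5, 9, 14, 30, 41]              # number of repeats
genome_size = [4.1, 2.3, 5.0, 3.2, 4.4]   # genome size in Mb

rho_repeats = spearman(contigs, repeats)
rho_size = spearman(contigs, genome_size)
print(f"contigs vs repeats:     {rho_repeats:.2f}")
print(f"contigs vs genome size: {rho_size:.2f}")
```

In this toy data the contig count rises in step with the repeat count, so the first coefficient is high, while the genome-size coefficient is much weaker. A real analysis would also need a significance test and far more than five genomes.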