Announcements
- July 15-18, 2012: Annual meeting of the Mycological Society of America, New Haven, CT
- June 17-22, 2012: Gordon Conference on Cellular & Molecular Fungal Biology, Holderness, NH
- May 14-18, 2012: MGM workshop at JGI, Walnut Creek, CA
- Apr 2-4, 2012: Biocuration 2012, Washington, DC
Releases
- May 4, 2012: Piloderma croceum F 1598 v1.0
- April 30, 2012: Hyphopichia burtonii NRRL Y-1933 v1.0
- April 30, 2012: Metschnikowia bicuspidata NRRL YB-4993 v1.0
- April 19, 2012: Mixia osmundae IAM 14324 v1.0
- April 19, 2012: Conidiobolus coronatus NRRL28638 v1.0
Comparative Analysis Methods and Tools
Motivation
Genome annotation and analysis require the development and validation of new algorithms and tools. Directions of this development include methods to analyze eukaryotic genome organization (tandem and segmental duplication, and gene-based synteny, including across multiple related genomes), gene structure (intron conservation or loss across genomes), gene gain and loss (detecting possible errors in automated clustering results used for gene family analysis, building whole-genome phylogenetic trees from clustering results, and Pfam domain analysis to detect expanded and lost families), genome evolution, gene expression, genome variation, metabolic pathways, and regulatory elements. A further direction is testing new gene predictors (including those that use RNA-Seq data and synteny-based approaches) against validated gene sets for accuracy and speed, along with annotation pipelines (e.g., MAKER), repeat-finding software, and non-coding RNA finding software. This project aims to (1) develop algorithms and prototypes of new genome analysis methods for publication, and (2) test new gene prediction and genome analysis tools for possible integration into the production annotation process.
Comparative Gene Modeling
Comparative gene modeling aims to improve the initial gene predictions for a set of closely related organisms and to correct missing or incorrectly predicted genes (incorrect splice sites, chimeras, gene fragments, etc.). The idea is that in closely related genomes most orthologs share the same conserved gene structure. The algorithm maps the gene models predicted in every genome onto each individual genome and, for each locus, selects from the potentially many competing models the one that most closely resembles the homologous genes in the other genomes. This procedure may be iterated several times, until no further change in the gene models is observed.
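
The selection loop described above can be sketched roughly as follows. This is an illustrative sketch only, not the JGI implementation: it assumes the competing models have already been mapped to each genome and grouped into orthologous families, and it scores resemblance as the Jaccard overlap of intron positions in a shared alignment coordinate system. The `GeneModel` class, the `similarity` measure, the input layout, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    name: str
    # Intron positions in ortholog-alignment coordinates (hypothetical representation).
    introns: frozenset


def similarity(a, b):
    """Jaccard overlap of intron positions; an assumed stand-in for 'resemblance'."""
    union = a.introns | b.introns
    if not union:
        return 1.0  # two intronless models have identical structure
    return len(a.introns & b.introns) / len(union)


def select_models(candidates, max_iter=10):
    """Pick one model per locus, iterating until the selection stops changing.

    candidates: {family: {genome: [GeneModel, ...]}}; the first model in each
    list is treated as the initial prediction for that locus.
    """
    chosen = {fam: {g: models[0] for g, models in per_genome.items()}
              for fam, per_genome in candidates.items()}
    for _ in range(max_iter):
        changed = False
        for fam, per_genome in candidates.items():
            for genome, models in per_genome.items():
                # Currently selected homologs from the other genomes.
                others = [m for g, m in chosen[fam].items() if g != genome]
                if not others:
                    continue

                def score(m):
                    return sum(similarity(m, o) for o in others)

                best = max(models, key=score)
                # Switch only on a strict improvement so ties cannot oscillate.
                if score(best) > score(chosen[fam][genome]):
                    chosen[fam][genome] = best
                    changed = True
        if not changed:
            break
    return chosen


# Toy input: genome A starts with a fragment, genome C has a competing chimera.
full = frozenset({100, 250, 400})
candidates = {
    "fam1": {
        "A": [GeneModel("a_fragment", frozenset({100})), GeneModel("a_full", full)],
        "B": [GeneModel("b_full", full)],
        "C": [GeneModel("c_full", full),
              GeneModel("c_chimera", frozenset({100, 250, 400, 900}))],
    }
}
chosen = select_models(candidates)
print({genome: model.name for genome, model in chosen["fam1"].items()})
```

In this toy input, the fragment in genome A is replaced by the full-length model because it agrees better with the homologs in B and C, while the chimera in C is never selected. A real pipeline would need a structure-aware comparison (exon boundaries, protein alignment) in place of the simple intron-set overlap assumed here.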
Results
For the basidiomycete Dichomitus squalens, reannotation using comparative modeling is compared with the initial JGI production annotation:
| Metric | JGI Annotation pipeline | Comparative modeling |
|---|---|---|
| Number of predicted gene models | 12,290 | 12,802 |
| with Swissprot hits | 7,356 | 7,900 |
| with non-repeat PFAM domains | 6,010 | 6,353 |
| with EST support | 10,796 | 11,105 |
| with >90% EST support | 9,178 | 9,444 |
| Number of unique PFAM domains | 2,245 | 2,322 |
| Average EST coverage per gene | 93.3% | 93.3% |
| Splice sites supported by ESTs | 102,200 | 104,246 |
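
To make the size of each change easier to read, the absolute and relative differences between the two annotations can be computed directly from the counts in the table. The snippet below is a small convenience sketch; the `counts` dictionary simply restates the table values, and the `gains` helper is a hypothetical name.

```python
# Counts copied from the table: (JGI annotation pipeline, comparative modeling).
counts = {
    "Number of predicted gene models": (12290, 12802),
    "with Swissprot hits": (7356, 7900),
    "with non-repeat PFAM domains": (6010, 6353),
    "with EST support": (10796, 11105),
    "with >90% EST support": (9178, 9444),
    "Number of unique PFAM domains": (2245, 2322),
    "Splice sites supported by ESTs": (102200, 104246),
}


def gains(table):
    """Return {label: (absolute change, percent change relative to the pipeline)}."""
    return {label: (new - old, 100.0 * (new - old) / old)
            for label, (old, new) in table.items()}


for label, (delta, pct) in gains(counts).items():
    print(f"{label}: {delta:+d} ({pct:+.1f}%)")
```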