Emiliania huxleyi

Unless otherwise noted, all files are in FASTA format and compressed with GNU Zip (gzip). To uncompress files: on Mac use StuffIt, on PC use WinZip or WinRAR.

NOTE: Masked regions are represented with lowercase characters; gaps in the assembly are represented with Ns.
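As an alternative to a desktop unzip tool, the gzip-compressed FASTA files can be streamed directly from Python, and the masking convention above (lowercase for masked bases, N for assembly gaps) can be tallied per sequence. This is a minimal sketch, not a JGI-provided tool. The function names are hypothetical. It also assumes a lowercase `n` should be counted as a gap rather than as masked sequence, which the note above does not specify.

```python
import gzip
from typing import Dict, Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header = None
    chunks = []
    # "rt" opens the gzip stream in text mode, so no separate unzip step is needed.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:].strip()
                chunks = []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def masking_summary(seq: str) -> Dict[str, int]:
    """Count masked (lowercase) bases and gap (N/n) bases in one sequence."""
    # Assumption: 'n' and 'N' are both gaps, so lowercase 'n' is not counted as masked.
    gaps = sum(1 for base in seq if base in "Nn")
    masked = sum(1 for base in seq if base.islower() and base != "n")
    return {"length": len(seq), "masked": masked, "gaps": gaps}
```

For example, `for name, seq in read_fasta_gz("Emihu1_masked_scaffolds.fasta.gz"): print(name, masking_summary(seq))` would report, per scaffold, its length, the number of masked bases and the number of gap bases.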

Assembly:

Assembled scaffolds (unmasked): Emihu1_scaffolds.fasta.gz

Assembled scaffolds (masked): Emihu1_masked_scaffolds.fasta.gz

Dot plot of all scaffolds aligned against each other (generated using VISTA): Emihu1_allScaffolds_selfAligned.pdf.gz

Unplaced genomic reads: Emihu1_unplaced_genomic_reads.fasta.gz

Annotation:

Putative diploid alleles: Emihu1_diploidAlleles.list.gz

"Reduced ('haploid') Models" is the set of GeneCatalog models (as of April 4, 2008), reduced using the diploid allele set. For each pair of diploid alleles, the model on the smaller scaffold was dropped:

    Proteins: Emihu1_reduced_proteins.fasta.gz

    Transcripts: Emihu1_reduced_transcripts.fasta.gz

    Genes: Emihu1_reduced_genes.gff.gz
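The reduction rule described above (for each diploid allele pair, drop the model that lies on the smaller scaffold) can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code. It assumes allele pairs are already available as `(model_a, model_b)` tuples of model IDs, and that model-to-scaffold and scaffold-length lookups exist. The layout of `Emihu1_diploidAlleles.list` is not described here, so parsing it is left out. The tie-breaking rule for equal-length scaffolds is a hypothetical choice, since the text does not specify one.

```python
from typing import Dict, Iterable, Set, Tuple


def reduce_to_haploid(
    models: Iterable[str],
    allele_pairs: Iterable[Tuple[str, str]],
    model_scaffold: Dict[str, str],
    scaffold_length: Dict[str, int],
) -> Set[str]:
    """Return the model IDs left after dropping, from each allele pair,
    the model that sits on the smaller scaffold."""
    kept = set(models)
    for model_a, model_b in allele_pairs:
        len_a = scaffold_length[model_scaffold[model_a]]
        len_b = scaffold_length[model_scaffold[model_b]]
        # Hypothetical tie-break: if both scaffolds are the same length, keep model_a.
        dropped = model_b if len_a >= len_b else model_a
        kept.discard(dropped)
    return kept
```

Models that do not appear in any allele pair pass through unchanged, which matches the description of the reduced set as the GeneCatalog minus one member of each pair.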

Functional Annotation of the Reduced set:

        GO: Emihu1_GO.tab.gz

        KEGG: Emihu1_KEGG.tab.gz

        KOG: Emihu1_KOG.tab.gz

"Filtered ('best') Models" is the filtered set of models selected by the JGI Annotation Pipeline as the best gene model at each locus. Diploid alleles are included:

Proteins: Emihu1_best_proteins.fasta.gz

Transcripts: Emihu1_best_transcripts.fasta.gz

Genes: Emihu1_best_genes.gff.gz

"All Models" is the set of all models generated by the JGI Annotation Pipeline and then submitted for filtering. This may include redundant or conflicting models at any locus:

Proteins: Emihu1_all_proteins.fasta.gz

Transcripts: Emihu1_all_transcripts.fasta.gz

Genes: Emihu1_all_genes.gff.gz
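Because the "All Models" set can contain several models at the same locus, one way to find such loci is to group models whose spans overlap on the same scaffold. The sketch below is a minimal example, not part of the JGI release. It assumes each model has already been reduced to `(model_id, scaffold, start, end)` with 1-based inclusive coordinates, as in GFF columns 1, 4 and 5. Extracting those fields from `Emihu1_all_genes.gff.gz` (for example, which attribute carries the model name) is not shown, and strand is ignored by assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Model = Tuple[str, str, int, int]  # (model_id, scaffold, start, end), 1-based inclusive


def overlapping_loci(models: List[Model]) -> List[List[str]]:
    """Group model IDs into clusters whose spans overlap on the same scaffold."""
    by_scaffold: Dict[str, List[Model]] = defaultdict(list)
    for model in models:
        by_scaffold[model[1]].append(model)

    clusters: List[List[str]] = []
    for scaffold in sorted(by_scaffold):
        # Sweep left to right; a model joins the current cluster if it starts
        # at or before the furthest end seen so far in that cluster.
        current: List[str] = []
        current_end = -1
        for model_id, _, start, end in sorted(by_scaffold[scaffold], key=lambda m: (m[2], m[3])):
            if current and start <= current_end:
                current.append(model_id)
                current_end = max(current_end, end)
            else:
                if current:
                    clusters.append(current)
                current = [model_id]
                current_end = end
        if current:
            clusters.append(current)
    return clusters
```

Clusters containing more than one model are candidate redundant or conflicting loci. Comparing them against the "Filtered ('best')" set would show which model the pipeline kept at each one.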

ESTs:

ESTs: Emihu1_ESTs.fasta.gz

EST cluster consensi: Emihu1_EST_cluster_consensi.fasta.gz