Announcements
- January 23-24, 2012
CSP 2012 PI Workshop, Walnut Creek, CA
Releases
- October 13, 2011
Freshwater propionate Anammox bacterial community from bioreactor in Nijmegen, The Netherlands, Freshwater propionate enrichment of Brocadia fulgida (Brocadia contigs)
- September 30, 2011
Marine Bacterioplankton communities from Antarctic, sample from Winter (Winter fosmids Sept 2010 assemblies)
- September 30, 2011
Freshwater microbial communities from Antarctic Deep Lake, sample 13m 0.1um (13m 0.1um 454 only)
- September 30, 2011
Hot spring microbial community from Beowulf Spring, Yellowstone National Park, sample YNP_Beowulf Spring_D (YNP_Beowulf Spring_D)
- September 30, 2011
Soil microbial communities from FACE and OTC sites, Soil microbial communities from sample at FACE Site NTS_007 Nevada Test Site (NTS_007)
Metagenome Binning
ClaMS – compositional classifier for metagenomic sequences
Motivation
Binning, the classification of metagenomic contigs according to their likely taxonomic origin, is an important step in metagenome analysis because it enables analysis of the genetic content of individual populations. Supervised binning based on the oligonucleotide composition of contigs is not biased by the composition of the reference database of genome sequences and has higher accuracy than unsupervised binning. The goal is to develop a tool for supervised compositional binning of metagenome sequences.
Results
ClaMS ("Classifier for Metagenomic Sequences") has been developed. It is a Java application for binning metagenomic contigs using user-specified training sets and initial parameters. Two signatures of oligonucleotide composition, de Bruijn chain (DBC) and oligonucleotide odds ratio (DOR), and three word lengths (dinucleotides to tetranucleotides) can be used. Pre-computed signatures for all finished genomes are included with the ClaMS distribution. Users can define training sets either by selecting a node in the phylogenetic tree in the ClaMS-GUI or by uploading their own FASTA files of sequences. For each sequence to be binned, its signature is computed and compared to the centroid signatures of all training sets; the best match is declared the bin of this sequence if the distance between the signature and the centroid is less than a user-selected cutoff. ClaMS can bin ~20,000 sequences in 3 min on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM, and can be run under any operating system on which the Java Runtime Environment can be installed.
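The centroid-and-cutoff step described above can be illustrated with a minimal Python sketch. This is not ClaMS code and does not use its DBC or DOR signatures: it assumes a plain normalized k-mer frequency vector as the signature and Euclidean distance as the distance measure, and every function name, training sequence, and cutoff value below is hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_signature(seq, k=4):
    """Normalized k-mer frequency vector over ACGT (a stand-in signature, not DBC/DOR)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other characters (e.g. N) are not counted.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def centroid(signatures):
    """Component-wise mean of a list of equal-length signature vectors."""
    n = len(signatures)
    return [sum(col) / n for col in zip(*signatures)]


def euclidean(a, b):
    """Euclidean distance (an assumed metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_bin(seq, centroids, cutoff, k=4):
    """Return the name of the nearest centroid, or None if it is not closer than cutoff."""
    sig = kmer_signature(seq, k)
    best_name, best_dist = min(
        ((name, euclidean(sig, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return best_name if best_dist < cutoff else None


# Toy training sets (hypothetical sequences, far shorter than real contigs).
training = {
    "at_rich": ["ATATATTTAAATATAT", "TTTAAATATATTTAAA"],
    "gc_rich": ["GCGCGGCCGCGCGGCC", "CCGGCGCGCCGGGCGC"],
}
centroids = {
    name: centroid([kmer_signature(s, k=2) for s in seqs])
    for name, seqs in training.items()
}

print(assign_bin("ATATTTAAATAT", centroids, cutoff=0.5, k=2))
```

In this sketch a sequence whose nearest centroid is still farther away than the cutoff is left unbinned (`None`), mirroring the user-selected cutoff mentioned above; a stricter cutoff therefore trades coverage for confidence.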
The ClaMS software is freely available from: http://clams.jgi-psf.org. A paper describing ClaMS has been submitted for publication.