Metagenome Binning

ClaMS – compositional classifier for metagenomic sequences

Motivation

Binning, the classification of metagenomic contigs according to their likely taxonomic origin, is an important step in metagenome analysis because it enables analysis of the genetic content of individual populations. Supervised binning based on the oligonucleotide composition of contigs is not biased by the composition of the reference database of genome sequences and has higher accuracy than unsupervised binning. The goal of this work is to develop a tool for supervised compositional binning of metagenome sequences.

Results

ClaMS ("Classifier for Metagenomic Sequences") has been developed. It is a Java application for binning metagenomic contigs using user-specified training sets and initial parameters. Two signatures of oligonucleotide composition, de Bruijn chain (DBC) and oligonucleotide odds ratio (DOR), and three word lengths (dinucleotides to tetranucleotides) can be used. Pre-computed signatures for all finished genomes are included with the ClaMS distribution. Users can define training sets either by selecting a node in the phylogenetic tree in the ClaMS-GUI or by uploading their own FASTA files of sequences. For each sequence to be binned, its signature is computed and compared to the centroid signatures of all training sets; the best match is declared the bin of this sequence if the distance between the signature and that centroid is less than a user-selected cutoff. ClaMS can bin ~20,000 sequences in 3 min on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM, and it can be run under any operating system on which the Java Runtime Environment can be installed.
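
The centroid-and-cutoff step described above can be illustrated with a minimal Python sketch. This is not the ClaMS code (ClaMS itself is a Java application), and it makes several assumptions: plain normalized k-mer frequencies stand in for the DBC and DOR signatures, Euclidean distance stands in for whatever distance ClaMS uses, and the function names, the toy training sequences, and the cutoff value are all hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_signature(seq, k=4):
    """Normalized k-mer frequencies over all 4**k ACGT words.

    Assumption: a simple stand-in for the DBC/DOR signatures.
    Words containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


def centroid(signatures):
    """Component-wise mean of the signatures in one training set."""
    n = len(signatures)
    return [sum(column) / n for column in zip(*signatures)]


def euclidean(a, b):
    """Euclidean distance (an assumed metric, not confirmed for ClaMS)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_bin(seq, centroids, cutoff, k=4):
    """Return the name of the nearest centroid, or None if it is not
    closer than the user-selected cutoff."""
    sig = kmer_signature(seq, k)
    best_bin, best_dist = min(
        ((name, euclidean(sig, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return best_bin if best_dist < cutoff else None


# Hypothetical toy training sets; real ones would be genome sequences.
training = {
    "at_rich": ["ATATATTTAAATATATAATTTATA", "TTTAAATATATTATAAATTTAATA"],
    "gc_rich": ["GCGCGGCCGCGCCGGCGCGGCCGC", "CCGGCGCGCCGGGCGCGCCGCGGC"],
}
centroids = {
    name: centroid([kmer_signature(s, k=2) for s in seqs])
    for name, seqs in training.items()
}

print(assign_bin("ATATTTAATATAAATATTTA", centroids, cutoff=0.5, k=2))  # at_rich
print(assign_bin("ACGTACGTACGTACGT", centroids, cutoff=0.5, k=2))      # None
```

In this sketch a sequence whose nearest centroid is still farther away than the cutoff is left unassigned (`None`), which mirrors the rule that the best match only becomes the bin when the distance falls below the user-selected cutoff. Lowering the cutoff leaves more sequences unbinned; raising it assigns more sequences but with weaker matches.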

The ClaMS software is freely available from: http://clams.jgi-psf.org. A paper describing ClaMS has been submitted for publication.