Homo sapiens

Rosalind Franklin's X-ray diffraction photograph of DNA, 1953. Photo: courtesy HarperCollins
Chromosome 5 is one of the largest human chromosomes, yet it has one of the lowest gene densities. This is partially explained by numerous gene-poor regions that display a remarkable degree of noncoding and syntenic conservation with non-mammalian vertebrates, suggesting they are functionally constrained. In total, we compiled 177.70 million base pairs of highly accurate finished sequence containing 923 manually curated protein-encoding genes, including the protocadherin and interleukin gene families, and the first complete versions of each of the large chromosome 5-specific internal duplications. These duplications are very recent evolutionary events and likely play a mechanistic role in disease, since deletions of these regions cause debilitating disorders including spinal muscular atrophy (SMA).

The US Department of Energy’s interest in chromosome 5 emerged from a series of pilot studies begun at the Lawrence Berkeley National Laboratory that focused on a cluster of interleukin genes located at human 5q31. These studies of a megabase of chromosome 5 illustrated how finished human sequence could contribute to gene annotation and how multi-mammalian sequence comparisons could lead to the sequence-based identification of noncoding elements possessing gene regulatory activities. The finished sequence of chromosome 5, analyzed both alone and in comparison to orthologous regions in other vertebrate genomes, now provides a chromosome-wide catalog of genes and evolutionarily conserved noncoding sequences. Many of these insights, as well as clues into disease-causing deletions arising from the segmental duplication landscape of chromosome 5, can be appreciated only now that the finished sequence of this chromosome is in hand.

Publication: The DNA sequence and comparative analysis of human chromosome 5. Nature 431, 268-274 (2004)

General project/clone information: JGI Human Genome Project page