The HMP plans to sequence, or collect from associated efforts, a total of 3000 reference genomes isolated from human body sites. Most of these will be sequenced only to a high-quality draft stage. Metadata about current, completed and targeted reference genome projects can be found in the Project Catalog. The reference genomes will aid in taxonomic assignment of 16S rRNA sequence and in functional annotation of metagenomic sequence from microbiome samples.
As reference genomes are released with annotation, they will become available for download here. This page does not list every project in the HMP Project Catalog, only those that have completed sequencing and annotation. Isolates are organized by body site, then by genus and species name. Users can sort within a body site by GenBank project ID. Hover over the download icons to see the file format and file size. The DCC provides the following four file formats: assembly nucleotide FASTA (ASM), protein multi-FASTA (PEP), coding sequence nucleotide multi-FASTA (CDS), and GenBank format (GBK). Sequence and annotation data are also available at NCBI.
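For users who want to check a downloaded file, the following is a minimal Python sketch, not a DCC-provided tool, of reading a FASTA or multi-FASTA file such as a PEP or CDS download and reporting the length of each record. The function names read_fasta and sequence_lengths and the example records are hypothetical, chosen only for illustration; the sketch assumes standard FASTA layout, with a ">" header line followed by one or more sequence lines.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA or multi-FASTA stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_lengths(handle: TextIO) -> Dict[str, int]:
    """Map each record identifier (first word of the header) to its length."""
    lengths = {}
    for header, seq in read_fasta(handle):
        identifier = (header.split() or [""])[0]
        lengths[identifier] = len(seq)
    return lengths


# Made-up records standing in for a downloaded PEP file (assumption).
example = io.StringIO(">prot_1 hypothetical protein\nMKT\nAYI\n>prot_2\nMSE\n")
print(sequence_lengths(example))  # {'prot_1': 6, 'prot_2': 3}
```

To run this on a real download, replace the in-memory example with an opened file, for instance open(path) where path points to the uncompressed file. GBK files use the GenBank flat-file layout rather than FASTA and are not handled by this sketch.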
This page is updated monthly. Contact the DCC if you need access to a previous version of the dataset.
Protocols and Tools