In the first phase of whole-genome shotgun (WGS) sequencing, 764 samples spanning 16 body sites were sequenced. Of these, 690 passed our quality control screens, which included identification of outliers by mean contig and ORF density, human hits, rRNA hits, and size. These 690 assemblies underwent gene prediction and annotation to generate the HMP Gene Index.
Gene sequences and annotations are provided here in GFF3 format and in nucleotide and protein multi-FASTA formats, with one set of files per sample. Functional attributes assigned to each gene prediction include gene name, gene symbol, Gene Ontology (GO) assignments, and Enzyme Commission (EC) numbers.
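As a rough illustration of how the functional attributes might be read from one of the GFF3 files, the sketch below parses the ninth (attributes) column of a single feature line, following the general GFF3 convention of semicolon-separated key=value pairs with comma-separated, URL-encoded values. The function name, the example line, and the attribute keys shown are assumptions made for illustration; the keys actually used in the HMP Gene Index files may differ.

```python
from urllib.parse import unquote


def parse_gff3_attributes(line):
    """Return the column-9 attributes of one GFF3 feature line as a dict of lists."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("a GFF3 feature line has exactly 9 tab-separated columns")
    attributes = {}
    for pair in columns[8].split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # Multiple values for one key are comma-separated; values are URL-encoded.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attributes


# Made-up feature line; the contig name and attribute keys are illustrative only.
example = (
    "contig_1\tprediction\tCDS\t10\t400\t.\t+\t0\t"
    "ID=gene_1;Name=hypothetical%20protein;Ontology_term=GO:0008150,GO:0003674"
)
attrs = parse_gff3_attributes(example)
```

Comment lines (starting with "#") and any embedded FASTA section would need to be skipped before calling this function on the lines of a real file.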
Per-sample annotations were parsed, and counts were generated for each annotation attribute. Samples were then categorized by body site to provide a summary of annotation attributes by site.
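The counting and grouping steps could look roughly like the sketch below. It is not the pipeline used to build the summaries. It assumes one plausible reading of "counts", namely the number of gene predictions in a sample that carry each attribute, and it assumes the annotations have already been parsed into one dictionary per gene. The sample identifiers, attribute keys, and sample-to-site mapping are hypothetical.

```python
from collections import Counter, defaultdict


def count_attributes(genes):
    """Count how many genes in one sample carry each annotation attribute."""
    counts = Counter()
    for attributes in genes:
        for key, values in attributes.items():
            if any(values):  # ignore attributes whose values are all empty
                counts[key] += 1
    return counts


def summarize_by_site(sample_counts, sample_site):
    """Sum per-sample counts over all samples that belong to the same body site."""
    site_counts = defaultdict(Counter)
    for sample, counts in sample_counts.items():
        site_counts[sample_site[sample]].update(counts)
    return dict(site_counts)


# Hypothetical parsed annotations: one list of per-gene attribute dicts per sample.
samples = {
    "sample_A": [
        {"Name": ["x"], "Ontology_term": ["GO:0008150"]},
        {"Name": ["y"]},
    ],
    "sample_B": [{"Name": ["z"], "ec_number": ["1.1.1.1"]}],
    "sample_C": [{"Name": ["w"]}],
}
# Hypothetical mapping from sample to body site.
sample_site = {
    "sample_A": "stool",
    "sample_B": "stool",
    "sample_C": "tongue_dorsum",
}

sample_counts = {name: count_attributes(genes) for name, genes in samples.items()}
by_site = summarize_by_site(sample_counts, sample_site)
```

Keeping the per-sample counts as an intermediate result means the same numbers can be regrouped under a different categorization without re-parsing the annotation files.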
Protocols and Tools