Reference Genome Analysis
A set of analyses was run on 178 annotated microbial reference genomes, as described in A catalog of reference genomes from the human microbiome. Here we present the figures and downloadable datasets that resulted from these analyses. Where possible, the analyses will be rerun periodically as additional annotated reference genomes are submitted to NCBI, and updated datasets will be added.
Novel Gene Survey
Annotated polypeptides from the set of 178 reference genomes were searched against the bacterial and viral divisions of NCBI's nonredundant (nr) protein database and compared to a merged database of TIGRFAM and Pfam hidden Markov models (see A catalog of reference genomes from the human microbiome for more details). This analysis yielded a set of 30,867 polypeptides, of which 29,987 (~97%) were unique. Multi-FASTA files corresponding to these two datasets can be downloaded here:
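As a rough illustration of how the two counts above relate, the following Python sketch reads multi-FASTA records and collapses them to distinct sequences. It assumes that "unique" means distinct by exact amino acid sequence, which this page does not state; the original analysis may have used a different criterion. The function names (parse_fasta, unique_sequences, percent_unique) are hypothetical and are not part of any published pipeline.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a multi-FASTA file."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)


def unique_sequences(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each distinct sequence to the first header it appeared under.

    Assumption: uniqueness is exact, case-insensitive sequence identity.
    """
    seen: Dict[str, str] = {}
    for header, seq in records:
        seen.setdefault(seq.upper(), header)
    return seen


def percent_unique(total: int, unique: int) -> float:
    """Return the share of unique sequences as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * unique / total


if __name__ == "__main__":
    # Tiny made-up example; not real data from the survey.
    demo = [">p1", "MKT", "LLV", ">p2", "mktllv", ">p3", "MAA"]
    records = list(parse_fasta(demo))
    distinct = unique_sequences(records)
    print(len(records), len(distinct))
    # Counts reported on this page: 29,987 unique out of 30,867 (about 97%).
    print(round(percent_unique(30867, 29987)))
```

To run this on a downloaded file, pass an open file handle to parse_fasta, for example `with open(path) as fh: records = list(parse_fasta(fh))`, where path is wherever the multi-FASTA file was saved.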
