Tools & Technology

In addition to data generation, the HMP is invested in development of new tools & technologies for computational analysis. Here we provide information on funded technology development grants, and access to tools utilized by members of the HMP consortium. More information can be found in the menus above and on the NIH Common Fund Site.

Tools

All software, online resources and standard operating protocols used in, or developed as part of, the HMP will be accessible here as they become available.

If you have a protocol or software package that you would like to post on this site, or would like more information on the currently available content, please contact us via the feedback form.

Microbial Reference Genomes

Downloadable Tools
Core Gene Evaluation Script
Screens for core gene sets as an indicator of the completeness of draft genomes. This download includes a Perl script and the required archaeal and bacterial core gene FASTA and cluster files.
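The idea of using core genes as a completeness indicator can be illustrated with a minimal Python sketch. It is not the Perl script distributed here: the function name, the gene identifiers and the assumption that core gene hits have already been reduced to a list of gene names are all hypothetical.

```python
def core_gene_completeness(core_genes, genes_found):
    """Return (fraction of core genes detected, sorted list of missing core genes).

    core_genes  -- identifiers of the expected core gene set (hypothetical names)
    genes_found -- identifiers detected in the draft genome by any search method
    """
    core = set(core_genes)
    if not core:
        raise ValueError("core gene set must not be empty")
    present = core & set(genes_found)
    return len(present) / len(core), sorted(core - present)


# Toy example: three of four core genes are detected in the draft assembly.
fraction, missing = core_gene_completeness(
    ["rpoB", "recA", "gyrB", "dnaK"],
    ["recA", "dnaK", "gyrB", "hypothetical_1"],
)
print(f"completeness: {fraction:.0%}, missing: {missing}")
```

A low fraction suggests that the draft assembly is missing part of the genome; the threshold at which an assembly counts as acceptable is a project decision that this sketch does not encode.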
Online Resources
IMG System
A community resource for comparative analysis and annotation of publicly available genomes in a uniquely integrated context
Pathogen Portal
A set of web-based resources provided by the Bioinformatics Resource Centers (BRCs), focusing on organisms considered potential agents of biowarfare or bioterrorism or causing emerging or re-emerging diseases
RAST Annotation Server
A fully-automated service for annotating bacterial and archaeal genomes, leveraging data and procedures established within the SEED framework to provide high quality gene calling and functional annotation

Sampling, Sequencing, & Analyses of 16S rRNA

Downloadable Tools
DNAclust
DNAclust is a fast clustering algorithm specifically designed for high-stringency clustering of DNA sequences, e.g. for 16S rRNA analyses or removal of duplicates/near duplicates in high-throughput shotgun datasets.
GINKGO
A GUI software package designed for non-statisticians to perform multivariate analysis
InVUE
A toolkit for rapid development of custom software packages for visualization and analysis of large datasets
LEfSe
LDA Effect Size is an algorithm for high-dimensional biomarker discovery and explanation that identifies metagenomic features (genes, pathways, or taxa) characterizing the differences between two or more biological conditions. It can be applied to taxonomic or functional abundance tables derived from metagenomic (WGS) data or 16S OTU/phylotype data.
Metastats
Metastats is a statistical package for comparing metagenomic datasets. It was designed for comparing clinical data from two treatment populations (e.g. sick vs. healthy), each comprising multiple samples, but the software also works with a small number of samples. Metastats identifies features of the samples that "explain" the difference between the treatment populations. The features can be OTUs (e.g. inferred from 16S data), taxonomic groups, or other groupings (genes, functional groups, etc.) for which count data are available. Metastats primarily relies on a non-parametric t-test and reverts to Fisher's exact test for sparse features. Additional tests (presence/absence, odds ratios, etc.) are currently being implemented. Metastats is available as a web service, as standalone R and C code, and as part of the Mothur package.
MicrobiomeUtilities
A set of software utilities for processing and analyzing 16S rRNA genes including generating NAST alignments, chimera checking, and assembling paired 16S rRNA reads according to reference sequence homology
Mothur
A platform-independent software package for describing and comparing microbial communities. Mothur incorporates the functionality of a number of computational tools, calculators & visualization tools into a single program
QIIME
'Quantitative Insights Into Microbial Ecology'. QIIME allows a range of community analyses suitable for microbiome data using traditional and high-throughput sequencing methods
R-package: Hypothesis Testing and Power Calculations for Comparing Metagenomic Samples from HMP
This R-package provides several functions to perform formal hypothesis testing on the species abundance distribution of human microbiome data, and to calculate power and sample size requirements for human microbiome experiments.
R-package: Statistical Object Oriented Data Analysis of RDP-based Taxonomic trees from Human Microbiome Data: Modeling, Visualization, and Two-Group Comparison
This R-package introduces Object Oriented Data Analysis (OODA) methods to analyze Human Microbiome taxonomic trees directly, providing tools to model, compare, and visualize populations of taxonomic tree objects.
Simrank
A rapid and sensitive general-purpose k-mer search tool
speciateIT
A package for speciation of 16S sequences
Unifrac
A suite of tools for the comparison of microbial communities using phylogenetic information. It takes as input a single phylogenetic tree that contains sequences derived from at least two different environmental samples and a file describing which sequences came from which sample
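As an illustration of how a tree plus a sequence-to-sample mapping yields a community distance, the following Python sketch computes the unweighted UniFrac distance between two samples: the branch length leading only to sequences from one sample, divided by the branch length leading to sequences from either sample. It is not the UniFrac code itself; the dictionary-based tree encoding and all names are assumptions made for this example.

```python
def unweighted_unifrac(parent, length, leaf_samples, sample_a, sample_b):
    """Fraction of observed branch length that is unique to one of two samples.

    parent       -- maps each non-root node to its parent node
    length       -- maps each non-root node to the length of the branch above it
    leaf_samples -- maps each leaf (sequence) to the set of samples containing it
    """
    # Propagate sample membership from every leaf up to the root.
    seen = {node: set() for node in length}
    for leaf, samples in leaf_samples.items():
        node = leaf
        while node in length:  # the root has no branch above it
            seen[node] |= samples
            node = parent[node]

    unique = total = 0.0
    for node, samples in seen.items():
        in_a, in_b = sample_a in samples, sample_b in samples
        if in_a or in_b:
            total += length[node]
            if in_a != in_b:  # branch leads to only one of the two samples
                unique += length[node]
    return unique / total if total else 0.0


# Toy tree: root -> A -> (a1, a2) and root -> B -> b1
parent = {"A": "root", "B": "root", "a1": "A", "a2": "A", "b1": "B"}
length = {"A": 1.0, "B": 1.0, "a1": 0.5, "a2": 0.5, "b1": 0.5}
leaf_samples = {"a1": {"gut"}, "a2": {"skin"}, "b1": {"gut"}}
print(unweighted_unifrac(parent, length, leaf_samples, "gut", "skin"))
```

In the toy tree only the branch above node A is shared by both samples, so 2.5 of the 3.5 units of branch length are unique and the distance is about 0.71.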
Online Resources
Fast-Unifrac
Provides a suite of tools for the comparison of microbial communities using phylogenetic information
Greengenes
A 16S rRNA gene database and workbench compatible with ARB
RDP
Provides ribosome-related data and services to the scientific community, including online data analysis and aligned and annotated bacterial and archaeal small-subunit 16S rRNA sequences
SitePainter
SitePainter allows users to visualize the different HMP body sites based on gradients of colors to represent available datasets

Sampling, Sequencing & Analysis of Whole Metagenomic Sequence

Downloadable Tools
BMTagger
NCBI's Best Match Tagger for removing human reads from metagenomics datasets. All HMP metagenomic sequence submitted to NCBI's Sequence Read Archive is being human filtered using BMTagger.
DeconSeq
Automatically detects and efficiently removes any type of sequence contamination from metagenomic datasets, including human or other host sequences. The tool uses a modified version of the BWA-SW aligner and can be applied to longer-read datasets (150+bp read length). DeconSeq is available as both standalone and web-based versions.
DNAclust
DNAclust is a fast clustering algorithm specifically designed for high-stringency clustering of DNA sequences, e.g. for 16S rRNA analyses or removal of duplicates/near duplicates in high-throughput shotgun datasets.
FragGeneScan
A short read gene finder
GINKGO
A GUI software package designed for non-statisticians to perform multivariate analysis
HUMAnN
The HMP Unified Metabolic Analysis Network (HUMAnN) is a pipeline for efficiently and accurately determining the presence/absence and abundance of microbial pathways in a community from metagenomic data (WGS). The pipeline converts sequence reads into coverage and abundance tables summarizing the gene families and pathways in one or more microbial communities.
InVUE
A toolkit for rapid development of custom software packages for visualization and analysis of large datasets
LEfSe
LDA Effect Size is an algorithm for high-dimensional biomarker discovery and explanation that identifies metagenomic features (genes, pathways, or taxa) characterizing the differences between two or more biological conditions. It can be applied to taxonomic or functional abundance tables derived from metagenomic (WGS) data or 16S OTU/phylotype data.
Metamos
MetAmos is a pipeline for metagenomic assembly. It includes a collection of utilities for performing the assembly and for analyzing assembly output.
MetaPhlAn
A computational tool for profiling the composition of microbial communities from metagenomic data (WGS). MetaPhlAn relies on unique clade-specific marker genes identified from reference genomes, allowing very fast computational times, unambiguous taxonomic assignments, and species-level resolution.
Metaphyler
Metaphyler is a software tool for inferring the taxonomic composition of a microbial community from whole-metagenome (WGS) sequencing data. Metaphyler relies on alignments to a curated database of housekeeping genes.
Metapath
Metapath is a statistical package for comparing metagenomic data-sets at the pathway level (using KEGG pathway information). Metapath relies on a graph-theoretic definition of statistical significance in order to identify pathway motifs that differ between samples from two treatment populations.
METAREP
An open source tool to help scientists to view, query, browse, and compare metagenomics annotation data derived from ORFs called on metagenomics reads or assemblies (also available as an Online Resource)
Metastats
Metastats is a statistical package for comparing metagenomic datasets. It was designed for comparing clinical data from two treatment populations (e.g. sick vs. healthy), each comprising multiple samples, but the software also works with a small number of samples. Metastats identifies features of the samples that "explain" the difference between the treatment populations. The features can be OTUs (e.g. inferred from 16S data), taxonomic groups, or other groupings (genes, functional groups, etc.) for which count data are available. Metastats primarily relies on a non-parametric t-test and reverts to Fisher's exact test for sparse features. Additional tests (presence/absence, odds ratios, etc.) are currently being implemented. Metastats is available as a web service, as standalone R and C code, and as part of the Mothur package.
PRINSEQ
A sequence processing tool that can be used to filter, reformat and trim genomic and metagenomic sequence data. It generates summary statistics of the input in graphical and tabular formats that can be used for quality control steps. PRINSEQ is available as both standalone and web-based versions.
Simrank
A rapid and sensitive general-purpose k-mer search tool
TagCleaner
Automatically detects and efficiently removes tag sequences (e.g. WTA or MID tags) from metagenomic datasets. TagCleaner is available as both standalone and web-based versions.
Online Resources
Biocyc
A collection of Pathway/Genome Databases (PGDBs). Each PGDB describes the genome and metabolic pathways of a single organism. The MetaCyc database was used for HMP metabolic reconstruction.
IMG/M
Provides tools for analyzing the functional capability of microbial communities based on their metagenome sequence, in the context of reference isolate genomes included from the Integrated Microbial Genomes (IMG) system
METAREP
A suite of web-based tools to help scientists view, query, browse, and compare metagenomics annotation data derived from ORFs called on metagenomics reads or assemblies (also available as a standalone tool)
MG-RAST
A fully-automated service for annotating metagenome samples, providing annotation of sequence fragments, phylogenetic classification, metabolic reconstructions and comparison tools

Protocols

Microbial Reference Genomes

Reference Genomes Database
HMP single cell MDA 16S rRNA Sanger sequencing SOP
Strain selection guidelines
Guidelines for Reference Genome Strain selection
BEI contamination protocol

HMP Sequencing Center-specific Annotation Protocols

The initial set of 178 Bacterial Reference Genomes described in the 2010 publication, "A Catalog of Reference Genomes from the Human Microbiome", were annotated using individual sequencing center methodologies:

Consensus Annotation Protocols

Subsequent Reference Genomes have been annotated using a consensus protocol for gene calling & functional annotation:
Provisional Reference Genome Assembly Metrics
A set of quality control metrics run on every HMP Reference Genome to ensure accuracy, completeness and continuity of draft and improved assemblies
Bacterial Core Gene Evaluation
Protocol describing use of the Core Gene Evaluation Script to assess completeness of bacterial draft assemblies
Archaeal Core Gene Evaluation
Protocol describing use of the Core Gene Evaluation Script to assess completeness of archaeal draft assemblies

Sampling, Sequencing, & Analyses of 16S rRNA

Manual of Procedures (MOP)
A reference document for current National Institutes of Health (NIH) policies and procedures as they apply to the Human Microbiome Project (HMP) Core Microbiome Sampling study

MOP Updates
Please download the MOP Supplement PDF for updates to product information and links.

Study participant consent forms can be found on the Microbiome Analysis page, under the Sample Collection tab.
Core Microbiome Sampling Protocol
16S Data Flow for HMP Sequencing Centers
Guidelines for the HMP Sequencing Centers for submitting 16S rRNA gene data and metadata to the HMP DACC
HMP 16S 454 protocol
Human Sequence Removal
SFF and Library Metadata File Generation
16S rRNA mothur Curation Pipeline
QIIME Community Profiling SOP

Sampling, Sequencing & Analysis of Whole Metagenomic Sequence

Manual of Procedures (MOP)
A reference document for current National Institutes of Health (NIH) policies and procedures as they apply to the Human Microbiome Project (HMP) Core Microbiome Sampling study

MOP Updates
Please download the MOP Supplement PDF for updates to product information and links.

Study participant consent forms can be found on the Microbiome Analysis page, under the Sample Collection tab.
Core Microbiome Sampling Protocol
Human Sequence Removal
HMP WGS Read Processing
HMP Whole-Metagenome Assembly
Body Site Assembly
Metagenomics Annotation SOP
GO Slim Analysis
Functional Database SOP
HUMAnN SOP
HMP Hybrid Assembly

Other Analysis

Walkthroughs

Walkthroughs are step-by-step tutorials that take users through typical HMP analysis paths, complete with sample datasets, detailed steps, screenshots and example output. They are geared toward educating researchers, particularly those without extensive bioinformatics infrastructure or experience, on using selected tools and resources to reproduce HMP analyses with HMP-generated or personal data as input.

Initial HMP walkthroughs utilize CloVR, a desktop application integrating state-of-the-art genomic tools in a robust, user-friendly, fully automated software package with optional support for cloud computing platforms. CloVR is distributed as a portable virtual machine launched on a desktop or laptop under VMware or VirtualBox.

If you have questions about current walkthroughs, or would like to suggest additional walkthroughs, provide feedback or participate in beta testing of future walkthroughs, please contact us via the feedback form.

I. HMP DACC 16S CloVR walkthrough

CloVR-16S supports 16S ribosomal RNA sequence analysis to study microbial community compositions. It processes short and long sequence reads from Sanger as well as Roche/454 sequencing, including sequence reads generated with the multiplex amplicon 454 pyrosequencing protocol with specifically tagged or barcoded 16S rRNA PCR primers. The CloVR-16S pipeline employs several well-known phylogenetic tools and protocols:

  • QIIME - a Python-based workflow package, allowing for sequence processing and phylogenetic analysis using different methods including the phylogenetic distance metric UniFrac, UCLUST, PyNAST and the RDP Bayesian classifier;
  • UCHIME - a tool for rapid identification of chimeric 16S sequence fragments;
  • Mothur - a C++-based software package for 16S analysis;
  • Metastats and custom R scripts used to generate additional statistical and graphical evaluations.

This walkthrough uses HMP 16S rRNA sequences representing communities extracted from 12 hard-palate and 12 attached-keratinized gingiva oral sites.
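Metastats, used in this pipeline, is described on this page as relying primarily on a non-parametric t-test. The Python sketch below shows one common form of such a test: a two-sample t statistic whose significance is estimated by permuting group labels. It is an illustration under stated assumptions, not the Metastats implementation; the function names, the Welch form of the statistic, the number of permutations and the toy abundances are all choices made for this example.

```python
import math
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Welch two-sample t statistic (each group needs at least two values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_t_test(a, b, n_perm=1000, seed=0):
    """Two-sided p-value estimated by shuffling the group labels n_perm times."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(t_statistic(perm_a, perm_b)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy relative abundances of one taxon in two groups of six samples each.
site_one = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10]
site_two = [0.30, 0.28, 0.35, 0.31, 0.29, 0.33]
print("p-value:", permutation_t_test(site_one, site_two))
```

In practice one such test is run per feature (OTU, taxon or gene family), so the resulting p-values need a multiple-testing correction before they are interpreted.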

II. HMP DACC Metagenomics CloVR walkthrough

The CloVR-Metagenomics protocol supports the analysis of shotgun sequencing data from total metagenomic DNA sequencing projects. This pipeline utilizes a number of well-known tools for analysis of metagenomic data:

  • UCLUST first clusters redundant sequences that show 99% nucleotide identity and removes artificial 454 replicate reads.
  • Representative DNA sequences are searched against the NCBI COG database using BLASTX.
  • Representative DNA sequences are searched against the NCBI RefSeq database of finished prokaryotic genomes using BLASTN.
  • Metastats and CloVR-implemented R scripts are applied for additional statistical and graphical evaluations of the pipeline results.
  • CloVR-Metagenomics generates several output reports including taxonomic and functional abundance tables, statistical comparisons of feature abundances between user-defined populations, and heatmaps with unsupervised clusterings of all samples.

This walkthrough uses HMP WGS reads representing microbial communities extracted from the mid-vagina and vaginal introitus sites.

III. Human Contaminant Screening

This pipeline uses the NCBI BMTagger (Best Match Tagger) tool to identify and remove human reads in metagenomic sequences. For this walkthrough, we use a mock dataset consisting of a 50:50 mix of human contaminant-screened reads from an HMP project and filtered human reads from the 1000 Genomes Project.
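The removal step that follows screening can be sketched in Python as below. The sketch assumes that a screening tool has already produced a set of read identifiers flagged as human; it does not call BMTagger, and the function names and the tiny FASTQ text are hypothetical.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality) from FASTQ text with 4-line records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        # The read ID is the first token of the header, without the leading '@'.
        yield header[1:].split()[0], seq, qual


def remove_flagged_reads(text, flagged_ids):
    """Return (kept records, number removed) after dropping flagged read IDs."""
    kept, removed = [], 0
    for read_id, seq, qual in parse_fastq(text):
        if read_id in flagged_ids:
            removed += 1
        else:
            kept.append((read_id, seq, qual))
    return kept, removed


# Hypothetical four-read dataset in which two reads were flagged as human.
MOCK_FASTQ = """@read1 sample
ACGT
+
IIII
@read2
GGCC
+
IIII
@read3
TTAA
+
IIII
@read4
CCGG
+
IIII
"""
kept, removed = remove_flagged_reads(MOCK_FASTQ, {"read2", "read4"})
print(f"kept {len(kept)} reads, removed {removed}")
```

For a 50:50 mock dataset like the one used in this walkthrough, roughly half of the reads would be expected to end up in the removed count if screening works as intended.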

IV. Metagenomic Assembly

This pipeline is used to generate a "Pretty Good Assembly": a reasonable attempt at reconstructing pieces of the organisms present in the community that are long enough to allow gene finding and other downstream analyses. This version of the pipeline uses SOAPdenovo v.1.04. The HMP Whole-Metagenome Assembly protocol provides a detailed description of the pipeline. For this walkthrough, we use a sample from the HMP Anterior Nares body site.

V. Alignment of Metagenomic Reads to Reference Genomes Using Bowtie

This walkthrough provides a simple example of how to set up and run the Bowtie aligner using the browser-accessible CloVR dashboard, and how to analyze the resulting outputs. We align metagenomic WGS reads from the Anterior Nares body site (sample SRS019215) to a Staphylococcus aureus reference genome.

VI. HUMAnN (HMP Unified Metabolic Analysis Network)

The HUMAnN pipeline is used for efficiently and accurately determining the presence/absence and abundance of microbial pathways in a community from metagenomic data. Sequencing a metagenome typically produces millions of short DNA/RNA reads. HUMAnN takes these reads as inputs and produces gene and pathway summaries as outputs:

  • The abundance of each orthologous gene family in the community.
  • The presence/absence of each pathway in the community.
  • The abundance of each pathway in the community, i.e. how many copies of that pathway are present.

For this walkthrough, we use genes from the Anterior Nares body site (sample SRS019215).
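To make the three output types concrete, the Python sketch below derives pathway presence and abundance from a table of gene family abundances. It is a deliberately simplified illustration and not the HUMAnN algorithm: the coverage threshold, the use of a plain mean as the pathway abundance, and all gene family and pathway names are assumptions made for this example.

```python
def summarize_pathways(gene_abundance, pathways, min_coverage=0.5):
    """Summarize each pathway from the abundances of its member gene families.

    gene_abundance -- maps gene family ID to abundance (missing IDs count as 0)
    pathways       -- maps pathway name to the list of gene family IDs it needs
    min_coverage   -- fraction of member families that must be detected for the
                      pathway to be called present (arbitrary example threshold)
    """
    summary = {}
    for name, families in pathways.items():
        values = [gene_abundance.get(family, 0.0) for family in families]
        detected = sum(1 for value in values if value > 0)
        coverage = detected / len(values) if values else 0.0
        abundance = sum(values) / len(values) if values else 0.0
        summary[name] = {
            "coverage": coverage,
            "present": coverage >= min_coverage,
            "abundance": abundance,
        }
    return summary


# Hypothetical gene family abundances and pathway definitions.
gene_abundance = {"famA": 4.0, "famB": 2.0, "famC": 0.0, "famD": 1.0}
pathways = {
    "pathway_1": ["famA", "famB", "famC"],
    "pathway_2": ["famD", "famE", "famF"],
}
for name, result in summarize_pathways(gene_abundance, pathways).items():
    print(name, result)
```

Here pathway_1 has two of its three gene families detected and is called present, while pathway_2 has only one of three and is called absent.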

VII. Digital Normalization of Metagenomic Reads

This pipeline uses the DigiNorm algorithm to normalize metagenomic reads, substantially reducing the size without any significant impact on the assemblies that will be generated.

This walkthrough uses a sample dataset from the HMP Illumina WGS Reads - Sample SRS018671.
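The core idea of digital normalization, as it is commonly described, is to keep a read only while the median abundance of its k-mers among the reads kept so far is below a coverage cutoff. The Python sketch below follows that idea with an exact in-memory counter; production implementations typically use a memory-efficient probabilistic counting structure instead. The function names, the parameter defaults and the toy reads are assumptions for illustration, not the pipeline's actual settings.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order (empty if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalize(reads, k=20, cutoff=20):
    """Keep a read only if its median k-mer count so far is below the cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate coverage from
        if median(counts[kmer] for kmer in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to the counts
    return kept


# Toy input: one read repeated ten times plus one read seen once.
reads = ["ACGTACGTAC"] * 10 + ["TTGCATTGCA"]
kept = digital_normalize(reads, k=4, cutoff=3)
print(len(reads), "reads in,", len(kept), "reads kept")
```

With these toy settings only two copies of the repeated read and the single copy of the rare read are kept, which shows how high-coverage regions are thinned while low-coverage regions are left intact.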

VIII. Gene Clustering

This pipeline takes gene predictions from metagenomic shotgun sequence data (assemblies or reads), and generates a non-redundant gene set, using USEARCH (Edgar, 2010).
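The notion of a non-redundant gene set can be illustrated with a greedy, centroid-based clustering sketch in Python. It is not USEARCH and does not call it: identity is computed here without alignment, by comparing positions over the length of the shorter sequence, which is a strong simplification. The function names, the 95% threshold and the toy sequences are assumptions for this example.

```python
def identity(a, b):
    """Ungapped identity over the length of the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(genes, threshold=0.95):
    """Map each representative gene ID to the IDs of the genes it represents."""
    clusters = {}
    representatives = []
    # Longest genes first, so that fragments fall into clusters of longer genes.
    ordered = sorted(genes.items(), key=lambda item: (-len(item[1]), item[0]))
    for gene_id, seq in ordered:
        for rep_id, rep_seq in representatives:
            if identity(seq, rep_seq) >= threshold:
                clusters[rep_id].append(gene_id)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            representatives.append((gene_id, seq))
            clusters[gene_id] = [gene_id]
    return clusters


# Hypothetical gene predictions: g2 duplicates g1 and g3 is a fragment of g1.
genes = {
    "g1": "ATGGCTAAAGGTCTGTAA",
    "g2": "ATGGCTAAAGGTCTGTAA",
    "g3": "ATGGCTAAAGGTCTG",
    "g4": "ATGCCCGGGTTTAAATAG",
}
clusters = greedy_cluster(genes)
print("non-redundant set:", sorted(clusters))
```

The keys of the returned dictionary form the non-redundant gene set; in this toy case g1 represents g2 and g3, and g4 stands alone.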

Funded Tools & Technology Research

The HMP roadmap initiative calls for the development of new tools & technologies, informatics capabilities and resources needed for the advancement of the field of metagenomics. The data sets produced by metagenomic sequencing and related components will be very large and complex, requiring novel analytical tools for distilling useful information from vast amounts of sequence data, functional genomic data and subject metadata.

In addition, whole genome sequencing technologies are currently limited to the relatively small class of microbes that can be cultured. To maximize the number of sequences available in the reference set, new techniques must be developed to culture, or otherwise isolate for analysis, organisms that are currently unculturable. In the long term, methods for sequencing individual microbes or otherwise analyzing all of the members of complex populations will substantively advance this field.

HMP funded projects are presented here. More information can be found by clicking on each project. As these technologies are further described and new tools become public, we will make information available on the DACC. Additional details are available on the NIH Common Fund Site.

New Technologies

Project Title | Principal Investigator(s) | Institution(s)
Species-by-Species Dissection of Microbiomes using Phage Display and Flow Sorting | Cliff Han, Andrew Bradbury | Los Alamos National Laboratory
Targeted genomic characterization of uncultured bacteria from the human microbiota | Mircea Podar | UT-Battelle, LLC - Oak Ridge National Laboratory
FISH 'N' Chips: A Microfluidic Processor for Isolating and Analyzing Microbes | Anup K Singh | Sandia Corp-Sandia National Laboratories
Functional Sorting of Microbial Cells From Complex Microbiota | Mitchel Doktycz | UT-Battelle, LLC - Oak Ridge National Laboratory
Multi-Dimensional Separation of Bacteria | G. Scott Worthen | Children's Hospital of Philadelphia
An Integrated lab-on-chip system for genome sequencing of single microbial cells | Yu-Hwa Lo, Kun Zhang | University of California San Diego
SCODA DNA extraction to normalize species representation | Andre Marziali | Boreal Genomics Inc.
Optimization of a microfluidic device for single bacterial cell genomics | David A Relman | Stanford University
Cultivation and Characterization of Microaerobes from the Human Microbiome | Vincent B Young, Thomas Mitchell Schmidt | Michigan State University
Technologies for the discovery of novel human colonic mucosal-associated microbes | Eugene B Chang | University of Chicago
Novel cultivation methods for the domestication of vaginal bacteria | David Fredricks | Fred Hutchinson Cancer Research Center
Confining single cells to enhance and target cultivation of human microbiome | Rustem Ismagilov | University of Chicago
Culturing uncultivatable gut microorganisms | Kim Lewis | Northeastern University
FACS-MABE: a method to sort and enrich the as-yet uncultured bacterial species from the human distal gut | Emma Allen-Vercoe | University of Guelph
Isolation, selection, and polony amplification of single cells in a gel matrix | Ronald Davis | Stanford University
Metagenomic dissection of the gut microbiota | Xiaoxia Lin | University of Michigan at Ann Arbor
Tools for human microbiome studies | John Nelson | General Electric Global Research Center

New Tools