16S rRNA Trimmed Data Set

This file contains ~72 million reads corresponding to deconvoluted, trimmed 16S sequences from SRA study ID SRP002395 (Human Microbiome Project 16S rRNA 454 Clinical Production Phase I). This represents 7518 preparations from 5034 samples. The 16S variable region V3-5 was sequenced for all 5034 samples, with variable regions V1-3 and V6-9 also sequenced for subsets of the samples. Eighteen body sub-sites are represented in this dataset.

This is a gzipped multi-FASTA file of reverse-complemented 454 clear ranges, with the following subsequences removed:

  1. initial "TCAG" (must have been present in the original read),
  2. reverse barcode sequence (must have been present in the original read),
  3. reverse primer sequence (must have been present in the original read),
  4. forward primer sequence (if present within the clear range).
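
The trimming rules above can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the file. It assumes exact string matching (real primers may contain degenerate bases), assumes the key, barcode, and reverse primer appear in that order at the start of the clear range, and assumes the forward primer, when sequenced, sits at the start of the read after reverse-complementing. The function names and the example sequences are hypothetical.

```python
from typing import Optional

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_read(clear_range: str, barcode: str, reverse_primer: str,
              forward_primer: str, key: str = "TCAG") -> Optional[str]:
    """Trim one 454 clear range and return it reverse-complemented.

    Returns None when the key, reverse barcode, or reverse primer is
    missing, since those three elements are required. Exact matching
    and the element order are assumptions of this sketch.
    """
    required_prefix = key + barcode + reverse_primer
    if not clear_range.startswith(required_prefix):
        return None  # a required element is absent: reject the read

    # Drop the required elements, then flip the read's orientation.
    oriented = reverse_complement(clear_range[len(required_prefix):])

    # The forward primer is optional; remove it only if it is present.
    if forward_primer and oriented.startswith(forward_primer):
        oriented = oriented[len(forward_primer):]
    return oriented
```

For example, `trim_read(raw, "AACCA", "CCGTC", "CCTAC")` would return the trimmed sequence for a read carrying that (made-up) barcode and primer pair, and `None` for a read that lacks the leading "TCAG".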

SRA runs containing a total of about 10,000 reads could not be successfully converted to SFF by the sffdump utility in the NCBI SRA SDK and have been excluded from this initial release.
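
Because the release is a single gzipped multi-FASTA file holding tens of millions of reads, it may be convenient to stream it rather than decompress it fully. The following is a minimal sketch using only the Python standard library. The function name `read_fasta_gz` is hypothetical, and the header layout of the actual file is not described here, so the header is returned as an uninterpreted string.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzipped multi-FASTA file.

    The file is read line by line, so memory use stays small even
    for very large inputs. Sequence lines that are wrapped across
    several lines are joined into one string.
    """
    header = None
    chunks = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    # Emit the final record once the file is exhausted.
    if header is not None:
        yield header, "".join(chunks)
```

A read count can then be obtained with `sum(1 for _ in read_fasta_gz(path))`, where `path` points to the downloaded file.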

Ongoing HMP 16S analyses are being performed on a dataset containing reads from both SRA study IDs SRP002395 (Human Microbiome Project 16S rRNA 454 Clinical Production Phase I) and SRP002012 (Human Microbiome Project 454 Clinical Production Pilot, PPS). The dataset on this page currently contains reads from SRP002395 only. We are in the process of readying SRP002012 reads for release on this site.