An effective normalization method for taxonomic analysis of metagenomic sequence count data

 

We propose a novel scaling normalization method to address the challenges of taxonomic analysis of multiple metagenomic communities based on next-generation sequencing data. A key issue is how to make samples from different communities comparable. The proposed normalization takes into account both the abundance proportions of taxonomic units and the under-sampling issue.

 

This README introduces the R code for applying the proposed normalization method and for generating simulated metagenomic data.

 

 

R code: WSS, Simulation

 

Example data (feature count and phenotype info)