TAEC: an R package for Taxonomic Analysis by Elimination and Correction

TAEC is a new homology-based approach for taxonomic analysis that uses the similarity between genomic sequences in addition to the output of an alignment tool.

This approach consists of two main stages: the elimination stage and the correction stage. In the elimination stage, the potential true genomes are identified by removing false genomes whose apparent presence is most likely due to similar genomes in the sample. In the correction stage, the abundances of the genomes remaining after the elimination stage are corrected by using the similarity between genomes in a system of linear equations. The overall workflow of TAEC is shown below.

 

Note:

- The light yellow blocks are implemented by the user, and the light blue blocks are implemented internally by TAEC.

- The bacteria database can be replaced with a virus database or another type of database if needed.

- Similarity matrices are provided for sequence read lengths of 100 bp, 250 bp, 500 bp, and 1000 bp.
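
To make the correction stage described above more concrete, the following is a minimal sketch of correcting abundances with a system of linear equations. It is not TAEC's code or API. It assumes a simple model in which the observed read count for genome i equals the sum over genomes j of similarity[i][j] times the true read count of genome j, where similarity[i][j] is taken to be the fraction of reads from genome j that also align to genome i. The function names, the model, and the toy numbers are all hypothetical, and the sketch is written in plain Python purely for illustration.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation is applied to both sides.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("similarity matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def correct_abundances(similarity, observed):
    """Hypothetical helper: recover true counts from observed counts.

    Assumes observed[i] = sum_j similarity[i][j] * true[j].
    """
    return solve_linear_system(similarity, observed)


# Toy example with two closely related genomes, A and B (made-up numbers).
similarity = [
    [1.0, 0.5],  # counted for A: all of A's reads plus half of B's
    [0.2, 1.0],  # counted for B: a fifth of A's reads plus all of B's
]
observed = [150.0, 120.0]

corrected = correct_abundances(similarity, observed)  # approximately [100, 100]
total = sum(corrected)
relative = [c / total for c in corrected]  # relative abundances, summing to 1
```

In this toy case the raw counts suggest that A is more abundant than B, while the corrected counts are equal. A sketch like this would use one similarity matrix per read length, matching the read length of the sample.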

 

The TAEC R package can be downloaded: (Mac version) and (Linux version)

 

Note:

We have tested the TAEC package on R version 2.14.1 (2011-12-22) and version 2.15.2 (2012-10-26) on Red Hat, and on R version 3.0.2 (2013-09-25) on Ubuntu. No errors associated with the different versions of R occurred on these systems. On Mac OS X, however, we encountered errors associated with different versions of R. For the TAEC package to work properly on OS X, please upgrade R to the current version, 3.0.2 (2013-09-25).

 

Citation: Sohn M, An L, Pookhao N, Li Q. Accurate genome relative abundance estimation for closely related species in a metagenomic sample. BMC Bioinformatics 2014, 15:242.