Talk title: It’s All Relative: Testing Differential Abundance in Compositional Microbiome Data
Motivation: Compositional analysis of differential abundance is based on the log ratio of relative abundance data between pairs of taxa. This analysis aims to detect taxa that initially responded to the condition change, not taxa whose relative abundance changed only because of the compositional constraint, so the null hypothesis is that the ratio of a taxon's relative abundance to that of a reference taxon remains the same. More importantly, this analysis is robust to the experimental bias that arises at every step of the experimental workflow of 16S amplicon or metagenomic sequencing. However, challenges arise in taking the log transformation of the pervasive zero counts and in selecting an appropriate reference taxon. Existing methods that add a pseudo count to zero data and/or normalize count data against some reference do not always control the false discovery rate (FDR).
Methods: We develop a novel method for compositional analysis of differential abundance that properly handles these challenges. The idea is to apply the log-ratio transformation (against a reference taxon) to the underlying relative abundances, which are non-zero, relate them to the traits of interest via a linear regression model (i.e., a logistic regression model overall), and estimate the effect size of each trait on each taxon via generalized estimating equations (GEE). The dependence on the reference taxon is eliminated by a simple shift of each effect size by their median. Other complexities in microbiome data, e.g., over-dispersion, sparsity, and small sample size, are accounted for through permutation-based inference. In addition, the logistic-regression framework allows traits of interest to be either continuous or discrete (binary) and confounding covariates to be adjusted.
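The median-shift step described above can be illustrated with a small sketch. It is not the speaker's implementation: the effect sizes below are made-up numbers rather than GEE estimates, and the function names `effects_against_reference` and `median_center` are hypothetical. The sketch assumes that changing the reference taxon shifts every log-ratio effect size by the same constant, and that most taxa are null, so the median sits at the null value.

```python
import numpy as np

def effects_against_reference(beta, ref):
    """Effect sizes on the log-ratio scale, log(pi_j / pi_ref).

    Under the assumed model, the effect of a trait on the log ratio of
    taxon j against the reference is beta_j - beta_ref.
    """
    beta = np.asarray(beta, dtype=float)
    return beta - beta[ref]

def median_center(effects):
    """Shift all effect sizes by their median (hypothetical helper)."""
    effects = np.asarray(effects, dtype=float)
    return effects - np.median(effects)

# Made-up per-taxon effects: most taxa are null (0), two are non-null.
beta = np.array([0.0, 0.0, 1.2, 0.0, -0.8, 0.0, 0.0])

# Use a null taxon (index 0) and a non-null taxon (index 2) as reference.
centered_a = median_center(effects_against_reference(beta, ref=0))
centered_b = median_center(effects_against_reference(beta, ref=2))

# After median centering, both reference choices give the same values.
same = np.allclose(centered_a, centered_b)
```

Because a change of reference adds the same constant to every effect size and to their median, subtracting the median cancels that constant, which is why the two centered vectors agree in this toy setting.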
Results: Our simulations indicate that our method always preserved FDR and had much improved sensitivity over existing methods. In contrast, ANCOM often had inflated FDR; ANCOM-BC largely controlled FDR but still had modest inflation occasionally; ALDEx2 generally had low sensitivity. The flexibility of our method for a variety of microbiome studies is illustrated by the analysis of data from two microbiome studies.
Assoc. Prof. Hu obtained a BS in Statistics from Peking University (School of Mathematical Sciences) in 2005 and a PhD in Biostatistics from the University of North Carolina at Chapel Hill (School of Public Health) in 2011. Prof. Hu then joined the Department of Biostatistics and Bioinformatics at Emory University (Rollins School of Public Health), first as the endowed Rollins Assistant Professor from 2011 to 2017 and, since 2017, as a tenured Associate Professor. Prof. Hu's research before tenure focused on statistical genetics, i.e., developing statistical methods for analyzing large-scale genetics data in epidemiological and clinical studies such as genome-wide association studies (GWAS). Prof. Hu's current research focuses on developing statistical methods and software packages for the analysis of high-throughput microbiome data, and on applying them to studies of various diseases such as HIV susceptibility, preterm birth, pediatric cancers, colorectal cancers, gynecological cancers, and Alzheimer’s disease.