Installation

install.packages("devtools")
devtools::install_github("CharlieCarpenter/tidyMicro")
library(tidyMicro); library(magrittr)

tidyMicro

Reading in OTU Counts

The first step is to read in your table(s) of operational taxonomic unit (OTU) counts and the clinical data for your subjects. The tidyMicro package contains 4 OTU tables from premature infants who required mechanical ventilation and had tracheal aspirates samples collected at 7, 14, and 21 days of age (+/- 48 hours). Each infant had bronchopulmonary dysplasia severity classified as mild, moderate, or severe. The OTU tables come from the phylum, class, order, and family level classifications. The corresponding clinical data is also included. The data are published here.

## Loading OTU tables
data(phy, package = "tidyMicro")
data(cla, package = "tidyMicro")
data(ord, package = "tidyMicro")
data(fam, package = "tidyMicro")

## Loading meta data to merge with OTU table
data(clin, package = "tidyMicro")

The clinical data must have a column of unique sequencing libraries named “Lib” that match the column names of your OTU table(s).

The OTU tables should be formatted as we see below: 1. The first column contains the OTU names with taxonomic rank delineated by “/” 2. The following column names are the unique sequencing libraries 3. The rest of the table is populated with sequencing counts.

	OTU_Name	T0003	T0004	T0007	T0008	T0011
2	Bacteria	0	1	10	0	14323
3	Acidobacteria/Acidobacteria	0	0	0	0	0
4	Actinobacteria/Acidimicrobiia	0	0	0	0	0
5	Actinobacteria/Actinobacteria	6	48	0	6	1
6	Actinobacteria/Coriobacteriia	5	0	0	0	2
7	Actinobacteria/Nitriliruptoria	0	0	0	0	0

tidy_micro

Once the OTU table is formatted in this way the tidy_micro function is fairly simple to use. There are three possible ways to merge your OTU table(s) and clinical data into a “micro_set”.

Provide a single OTU table and the name separately
Provide an unnamed list of your OTU tables and provide the OTU names separately
1. Names provided will be the “Table” column in your final micro_set
  1. Order of names and OTU tables should match up
Provide a named list of your OTU tables
1. Names of OTU tables will be the “Table” column in your final micro_set


## 1. Single OTU table
micro.set <- tidy_micro(otu_tabs = cla,        ## OTU Table
                        tab_names = "Class",   ## OTU Names (Ranks)
                        clinical = clin)       ## Clinical Data

## 2. Unnamed List
otu_tabs <- list(phy, cla, ord, fam)
tab_names <- c("Phylum", "Class", "Order", "Family")

micro.set <- tidy_micro(otu_tabs = otu_tabs,   ## OTU Table
                        tab_names = tab_names, ## OTU Names (Ranks)
                        clinical = clin)       ## Clinical Data

## 3. Named List
otu_tabs <- list(Phylum = phy, Class = cla, Order = ord, Family = fam)

micro.set <- tidy_micro(otu_tabs = otu_tabs,  ## OTU Table
                        clinical = clin)      ## Clinical Data

The default for this function is to keep only the libraries found in each of the OTU tables’ column names and the sequencing library column of your clinical data. You will receive a warning if some libraries are excluded during the merge. This option can be turned off by setting the option complete_clinical = FALSE.

Below are the first 6 rows and 12 columns of the data frame we created with tidy_micro. The first 4 columns are the OTU table names, the subject/library names, the taxa names, and the sequencing depths (Total). Columns 4-8 are calculated from the given OTU table: bin is a binary variable for presence or absence of taxa in that subject/library, cts is the original count given, clr is centered log-ratio transformed counts, and ra is the relative abundance, \(RA = \frac{Taxa\ Count}{Seq\ Depth}\), of the taxa. A small amount (1/sequencing depth) is added to each library’s taxa count before the clr transformation in order to avoid \(log(0) = -\infty\). The following columns contain the supplied clinical data.

Table	Lib	Taxa	Total	bin	cts	clr	ra	study_id	weight	gender	gestational_age
Phylum	T0003	Bacteria	47490	0	0	-10.8545090	0.0000000	0127Y	0.7	M	25
Phylum	T0003	Acidobacteria	47490	0	0	-10.8545090	0.0000000	0127Y	0.7	M	25
Phylum	T0003	Actinobacteria	47490	1	11	2.3116626	0.0231628	0127Y	0.7	M	25
Phylum	T0003	Bacteroidetes	47490	1	57	3.9568171	0.1200253	0127Y	0.7	M	25
Phylum	T0003	Candidate-division-SR1	47490	0	0	-10.8545090	0.0000000	0127Y	0.7	M	25
Phylum	T0003	Candidate-division-TM7	47490	1	1	-0.0862135	0.0021057	0127Y	0.7	M	25

All of the following functions rely on this format of the data set, so this must be the first step for any use of this pipeline.

Data Exploration

For simplicity, we will continue using only the sequencing information from day 7.

micro.set %<>% filter(day == 7)

Several standard plots useful for data exploration are available within tidy.micro.

Taxa Summaries

The function taxa_summary will supply a table of useful descriptive statistics. It will output a table containing group counts, the percent of subjects with \(RA=0\), the RA mean, RA median, RA standard deviation, RA IQR, and several RA percentiles. You can control the taxa information summarized using the obj argument. You can stratify by categorical variables using the ... argument. If the table argument is left to NULL, taxa_summary will summarize all tables within your micro_set.

taxa_summary(micro.set, table = "Phylum") %>% 
  knitr::kable()

Table	Taxa	n	Percent_0	Mean	SD	Median	IQR	Percentile_5th	Percentile_10th	Percentile_25th	Percentile_75th	Percentile_90th	Percentile_95th
Phylum	Acidobacteria	24	100.00	0.0000000	0.0000000	0.0000000	0.0000000	0.00	0.00	0.00	0.00	0.00	0.00
Phylum	Actinobacteria	24	0.00	0.3341367	1.0115688	0.0257123	0.0765905	0.00	0.00	0.01	0.09	0.45	1.65
Phylum	Bacteria	24	33.33	4.4180972	20.2317256	0.0048883	0.0160987	0.00	0.00	0.00	0.02	0.07	5.64
Phylum	Bacteroidetes	24	0.00	0.1990559	0.1534557	0.1731676	0.1453758	0.01	0.02	0.10	0.24	0.43	0.48
Phylum	Candidate-division-SR1	24	91.67	0.0001979	0.0007042	0.0000000	0.0000000	0.00	0.00	0.00	0.00	0.00	0.00
Phylum	Candidate-division-TM7	24	41.67	0.0040695	0.0058673	0.0029175	0.0042201	0.00	0.00	0.00	0.00	0.01	0.02
Phylum	Cyanobacteria	24	87.50	0.0006640	0.0020834	0.0000000	0.0000000	0.00	0.00	0.00	0.00	0.00	0.00
Phylum	Deinococcus-Thermus	24	100.00	0.0000000	0.0000000	0.0000000	0.0000000	0.00	0.00	0.00	0.00	0.00	0.00
Phylum	Firmicutes	24	0.00	70.9380828	39.5968844	96.6334919	56.0562921	0.42	2.49	43.34	99.39	99.67	99.71
Phylum	Fusobacteria	24	4.17	0.0311693	0.0276454	0.0194823	0.0274130	0.01	0.01	0.01	0.04	0.07	0.08
Phylum	Proteobacteria	24	0.00	5.1187023	20.4554034	0.1416012	0.1792101	0.01	0.02	0.04	0.22	0.88	16.65
Phylum	Spirochaetae	24	87.50	0.0008745	0.0026564	0.0000000	0.0000000	0.00	0.00	0.00	0.00	0.00	0.01
Phylum	Synergistetes	24	100.00	0.0000000	0.0000000	0.0000000	0.0000000	0.00	0.00	0.00	0.00	0.00	0.00
Phylum	Tenericutes	24	4.17	18.9420583	34.3059209	0.0268098	17.2474957	0.00	0.00	0.01	17.26	83.10	92.16
Phylum	Unclassified	24	20.83	0.0128916	0.0177126	0.0061030	0.0094733	0.00	0.00	0.00	0.01	0.04	0.05

PCA and PCoA Plots

Principal Components

micro_pca will calculate principle components on the centered log ratio transformation of the taxa counts using the prcomp function from the stats package. Scaling the taxa counts to a unit variance is the default option, and recommended, but this can be changed using scaled = F. The components are then plotted using the ggbiplot function.

We need to specify the OTU table we’d like to work with using the table argument. This is consistent throughout the pipeline. Different tables can contain different the taxonomic ranks for analysis as it does in this walk through (e.g. phylum, family, or genus level data), or the different tables could also reflect other important differences such as sampling site (e.g. skin, nasal, gut, etc).

micro.set %>% micro_pca(table = "Family",   ## Taxonomic table of interest
                       grp_var = bpd1, ## A factor variable for colors
                       legend_title = "BPD Severity")

PCA Plot. Principle components of CLR transformed Family level OTU table.

Principal Coordinates

Principle components calculated from a dissimilarity matrix are called “principle coordinates”. If we supply a dissimilarity or distance matrix, e.g. a beta diversity, micro_pca will output a principal coordinate plot. Principle coordinate analysis (PCoA) and principle component analysis (PCA) are both excellent exploratory plots, and are the same process mathematically. However, PCoA is more appropriate than PCA when data are missing or when there are fewer subjects than there are dimensions of the feature space, as is often the case in microbiome (or any omics) analysis.

bray_beta <- micro.set %>%  ## Calculating dissimilarity
  beta_div(table = "Family")
  
micro.set %>% 
  micro_pca(dist = bray_beta, ## Beta diversity
            grp_var = bpd1, ## A factor variable for colors
            legend_title = "BPD Severity")      
#> PCoA plot created

PCoA Plot. Principle coordinates of Family level OTU table Bray-Curtis dissimilarity.

Longitudinal PCA/PCoA

The following methods were established in microbiome data by K. Williamson and is based off of the Tucker3 loss function. These are tools for dealing with repeated measures within the ordination framework.

long_micro <- tidy_micro(otu_tabs = otu_tabs,  ## OTU tables
                         clinical = clin)      ## clinical Data
#> Contains 74 libraries from OTU files.
#> Summary of sequencing depth:
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    8851   24938   33314   36650   43590   97408

Three Mode PCA

Three Mode PCA is a way of creating Principle component plots while controlling for correlations between observations. We can interpret them the same as we would any PCA plot. Unfortunately we aren’t seeing much separation between our groups.

The function requires your micro_set to have a column that indicates your subject names and a column that indicates your time points. The function also requires a grouping variable.

long_micro %>% 
  three_mode(table = "Family", group = bpd1, subject = study_id, 
             time_var = day, main = "ThreeMode PCA", 
             subtitle = "3 Time Points", legend_title = "BPD Severity")
#> Warning: Subjects are not consistent across time points.
#> Only complete cases will be used.
#> Found 3 time points and 15 subjects with complete cases.
#> Rational ORTHONORMALIZED start
#> Tucker3 function value at start is  9.09494701772928e-13
#> Tucker3 function value is 9.09494701772928e-13 after 3 iterations
#> Fit percentage is 100 %
#> Procedure used 0.03 seconds

Three Mode PCA Plot. Principle components controlled for repeaated measures.

Notice the warning message that pops up. The function will automatically pull out the subjects that are in every time point and only make plots with their information.

3D PCA

These 3D plots collapse over a chosen dimension of variation (e.g. time, subject, or taxa) and perform principle component / coordinate analyses on the remaining two dimensions.

In order to make the function work we need to supply a subject, time, and the number of time points as in the three_mode function.

long_micro %>%
  pca_3d(table = "Family", time_var = day, subject = study_id, 
         modes = "AC", type = "PCoA")
#> Warning: Subjects are not consistent across time points.
#> Only complete cases will be used.
#> Found 3 time points and 15 subjects with complete cases.
#> Tucker3 function value at start is  10716.5212840935
#> Tucker3 function value is 0 after 2 iterations
#> Fit percentage is 100 %
#> Procedure used 0.01 seconds

3D Time PCoA Plot. Plotting principle coordinants collapsing over time axis.

You can change which axis to collapse over by changing the “modes” to either “BA” or “CB”.

long_micro %>%
  pca_3d(table = "Family", time_var = day, subject = study_id, 
         modes = "CB", type = "PCoA")
#> Warning: Subjects are not consistent across time points.
#> Only complete cases will be used.
#> Found 3 time points and 15 subjects with complete cases.
#> Tucker3 function value at start is  10716.5212840935
#> Tucker3 function value is 0 after 2 iterations
#> Fit percentage is 100 %
#> Procedure used 0.01 seconds

3D Subject PCoA Plot. Plotting principle coordinants collapsing over subject axis.

Bar Charts

Stacked bar charts of taxa RA are very standard visualizations for microbiome research. We can create bar charts of the raw RA stratified by categorical variable(s) of interest. As before, we need to specify the OTU table we want to plot.

ra_bars(micro.set,         ## Dataset
        table = "Phylum",  ## Table we want
        bpd1,              ## Variable of interest
        ylab = "% RA", 
        xlab = "BPD", 
        main = "Stacked Bar Charts")

Stack Bar Chart. Stacked bar charts of taxa relative abundance by BPD severity.

As you can see, several of these phylum are very low abundance within the cohort, not to mention how quickly our plot legend would overflow if we tried to plot all of our taxa orders or families. Several of our functions include options to aggregate low taxa counts together in order to clean up these plots. 1. Specify how many taxa will be named and used in the stacked bar charts using top_taxa. This option will take \(X\) taxa with the highest average RA and aggregate all others into an “Other” category. 2. Use the RA option to aggregate all taxa with RA below your cutoff into an “Other” category.

Only one of these can be used at a time. If a particular taxa is of interest you can use specific_taxa to pull any taxa out of the “Other” category if it doesn’t meet the bar set by either top_taxa or RA.

ra_bars(micro.set,          ## Dataset
         table = "Phylum",  ## Table we want
         bpd1,              ## Variable of interest
         top_taxa = 3,     
         RA = 0,
         specific_taxa = c("Actinobacteria", "Bacteroidetes"),
         ylab = "% RA", xlab = "BPD", main = "Stacked Bar Charts")

‘Taxa-Filtered’ Stack Bar Chart. Stacked bar charts of taxa relative abundance using aggregated counts.

ra_bars is flexible enough to include multiple factors in the ... argument.


ra_bars(micro.set,           ## Dataset
        table = "Phylum",    ## Table we want
        bpd1, gender,        ## Variables of interest
        top_taxa = 6,
        ylab = "% RA", xlab = "BPD by Sex", 
        main = "Stacked Bar Charts") +
   theme(axis.text.x = element_text(angle = 45, hjust = 1))

Multiple Factor Stack Bar Chart. Stacked bar charts of taxa relative abundance by multiple factors.

This function can also create subject level bar charts by using “Lib” as your grouping variable.

ra_bars(micro.set,          ## Dataset
        table = "Phylum",   ## Table we want
        Lib,                ## Variable of interest
        top_taxa = 6,
        ylab = "% RA", xlab = "Library", main = "Stacked Bar Charts") +
   theme(axis.text.x = element_text(angle = 90, hjust = 1))

Subject Stack Bar Chart. Subject level stacked bar charts of taxa relative abundance.

Or just create a single stacked bar chart for the entire cohort by ignoring the ... argument.

ra_bars(micro.set,            ## Dataset
        table = "Phylum",     ## Table we want
        top_taxa = 6, 
        ylab = "% RA", main = "Stacked Bar Charts")

Cohort Stack Bar Chart. Overall microbiome community taxa relative abundance.

Note that you can manipulate the output plots by adding on geoms as we did in ra_bars. Most plotting functions in this pipeline will return a ggplot that you can manipulate through additional geom just like any other ggplot.

ra_bars(micro.set,            ## Dataset
        table = "Phylum",     ## Table we want
        top_taxa = 6,
        main = "Manipulated Stacked Bar Charts") +
  
  ## Additional geoms
  theme_dark() + 
  coord_flip() + 
  theme(legend.title = element_text(color = "blue", size = 20),
        legend.text = element_text(color = "red"))

Manipulated Stack Bar Chart. Stacked bar charts of taxa relative abundance by multiple factors.

Box Plots

We can make box plots of taxa relative abundance stratified by some categorical variable using the function taxa_boxplot.

staph <- "Firmicutes/Bacilli/Bacillales/Staphylococcaceae"

taxa_boxplot(micro.set,      ## Our dataset
             taxa = staph,   ## Taxa we are interested in
             bpd1,           ## Variable of interest
             xlab = "BPD", 
             ylab = "Relative Abundance", 
             main = "Box Plot")

Single Variable Box Plots. Box plots of Staphylococcaceae relative abundance by BPD severity.

We can plot other taxa information besides the relative abundance (such as the raw counts) by specifying the y argument. This function also allows for multiple variables of stratification, and will give you the interaction of all categorical variables given. There is an apparent trend here, but be aware that raw counts are rarely the information of interest for analysis. Relative abundance or the centered log ratio is almost always more meaningful.

taxa_boxplot(micro.set,        ## Our dataset
             taxa = staph,     ## Taxa we are interested in
             y = clr,          ## Making Boxplot of CLR
             bpd1, gender,     ## Variables of interest
             ylab = "Staphylococcaceae CLR",
             main = "Box plot", subtitle = "Subtitle") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Multi-Variable Box Plots. Box plots of Staphylococcaceae relative abundance by BPD severity and sex.

Correlation Plots

Correlations between taxa RA and continuous variables can be useful descriptives during data exploration. The functions cor_heatmap and cor_rocky_mtn will calculate correlation between the selected variable(s) and taxa information. The taxa information (counts, CLR, or RA) can be controlled through the y argument, and the type of correlation can be controlled through the cor_type argument. We recommend using either CLR transformed counts with Spearman’s rank based correlation or taxa RA with Kendall’s rank based correlation. Since the large number of 0 counts often present will create many ties Kendall’s correlation more appropriate for taxa RA.

Heat Maps

Correlations for each taxa within the selected table are plotted in a heat map for one or several continuous variables. This style of non-symmetric heat map is sometimes called a “lasagna plot.”

micro.set %>% cor_heatmap(table = "Class", weight, gestational_age)

Lasagna Plot. Spearman orrelations between CLR taxa counts infant weight, gestational age (weeks).

Rocky Mountain Plots

The rocky mountain plot shows the correlation for each taxa within the selected table displayed along the horizontal axis. They will be color coded by the phylum they belong to (characters before the first “/” in the taxa name), and taxa with a correlation magnitude greater than or equal to a specified cutoff (cor_label) will be labeled. This function uses the same options for cor_type and y as cor_heatmap.

micro.set %>% cor_rocky_mtn(table = "Class",
                            weight,
                            cor_label = 0.3)

Correlation Rocky Mountain Plot. Rocky mountain plot of Spearman’s correlation betwen CLR taxa counts and class level taxa.

Alpha Diversities

alpha_div calculates Good’s Coverage and the following alpha diversities through rarefied bootstrap samples and attaches them as columns to the given micro_set:

Richness: \(S_{obs}\) and Chao1
Evenness: Shannon’s and Simpsons E
Diversity: Shannon’s H and Simpsons D

It will either calculate the alpha diversity based on the specified table and attach it within each table of your micro_set, or calculate the alpha diversity of each table if no table is specified. Either way the alpha diversities will be attached as if they were a part of your clinical data. If your micro_set contains OTU tables of different taxonomic ranks, we recommend only calculating and analyzing the alpha diversity of the lowest rank.

This step can be computationally intensive. Lowering the number of bootstrap iterations with the iter argument can speed this up. We recommend removing subjects with poor sequencing as well (for all analyses). There are two options for removing subjects with poor sequencing:

min_depth: Remove libraries with sequencing depths (Total) below min_depth.
min_goods: Remove libraries with Good’s Coverage below min_goods.

micro_alpha <- alpha_div(micro.set, 
                         table = "Family",  ## Table of interest
                         min_depth = 5000,  ## Requires a Seq Depth of 5000 to be included
                         min_goods = 80)    ## Requires a Good's coverage of %80 
#> Warning in alpha_div(micro.set, table = "Family", min_depth = 5000, min_goods = 80): The minimum Library size is 8851.
#> No libraries dropped based on sequencing depth or Good's coverage.

Regress on Alpha Diversity

Once alpha diversities are calculated, standard regression can be used to analyse alpha diversities. micro_alpha_reg is a simple wrapper function that will run linear regression on each alpha diversity with your specified covariate pattern within the specified rank. Covariates included in the function will be added together in your covariate structure. For instance, if we include Group, Age, Sex, the function will fit \[\hat{alpha} \sim Group + Age + Sex.\] You can also include an interaction by typing Group * Age.

The function will output a summary table for every model containing a column for the alpha diversity measure, the model coefficients, the \(\beta\) estimates, the standard errors of that estimate, the test statistic, the p-value, and a 95% confidence interval for the \(\beta\) estimates.

micro_alpha %>% 
  micro_alpha_reg(table = "Family", bpd1, gender) %>% 
  knitr::kable()

#>    Alpha_Div         Coef  Beta std.error     t.stat p.value              CI_95
#> 1      Goods  (Intercept) 99.98      0.01 10736.8720  0.0000 (99.9604, 99.9992)
#> 2      Goods bpd1Moderate  0.00      0.01     0.0102  0.9920  (-0.0222, 0.0224)
#> 3      Goods   bpd1Severe -0.01      0.01    -0.7824  0.4431  (-0.0321, 0.0146)
#> 4      Goods      genderM  0.01      0.01     0.7455  0.4646   (-0.0104, 0.022)
#> 5       Sobs  (Intercept) 17.66      2.80     6.3000  0.0000  (11.811, 23.5039)
#> 6       Sobs bpd1Moderate -1.91      3.21    -0.5944  0.5589  (-8.6098, 4.7911)
#> 7       Sobs   bpd1Severe -0.26      3.37    -0.0771  0.9393   (-7.2915, 6.772)
#> 8       Sobs      genderM -1.12      2.33    -0.4798  0.6366   (-5.9886, 3.749)
#> 9      Chao1  (Intercept) 22.43      3.40     6.5938  0.0000 (15.3315, 29.5204)
#> 10     Chao1 bpd1Moderate -1.27      3.90    -0.3268  0.7472  (-9.4047, 6.8569)
#> 11     Chao1   bpd1Severe  0.15      4.09     0.0377  0.9703  (-8.3784, 8.6872)
#> 12     Chao1      genderM -1.83      2.83    -0.6465  0.5253  (-7.7393, 4.0769)
#> 13  ShannonE  (Intercept)  0.10      0.06     1.6759  0.1093  (-0.0239, 0.2197)
#> 14  ShannonE bpd1Moderate  0.01      0.07     0.1388  0.8910  (-0.1303, 0.1489)
#> 15  ShannonE   bpd1Severe  0.04      0.07     0.6153  0.5453  (-0.1033, 0.1897)
#> 16  ShannonE      genderM -0.06      0.05    -1.1632  0.2584   (-0.158, 0.0449)
#> 17  ShannonH  (Intercept)  0.41      0.24     1.6985  0.1049  (-0.0925, 0.9037)
#> 18  ShannonH bpd1Moderate  0.03      0.27     0.1002  0.9211  (-0.5435, 0.5983)
#> 19  ShannonH   bpd1Severe  0.18      0.29     0.6154  0.5452  (-0.4224, 0.7759)
#> 20  ShannonH      genderM -0.24      0.20    -1.1997  0.2443  (-0.6534, 0.1763)
#> 21  SimpsonD  (Intercept)  1.17      0.22     5.3648  0.0000   (0.7144, 1.6234)
#> 22  SimpsonD bpd1Moderate  0.13      0.25     0.5036  0.6200  (-0.3951, 0.6466)
#> 23  SimpsonD   bpd1Severe  0.27      0.26     1.0363  0.3124  (-0.2751, 0.8182)
#> 24  SimpsonD      genderM -0.22      0.18    -1.2253  0.2347  (-0.6008, 0.1562)

Beta Diversities

beta_div is a simple wrapper around the vegdist function. It requires you to specify the OTU table and the method of beta diversity you want calculated.

## Beta Diversity
bray <- beta_div(micro.set, table = "Class", method = "bray")

Heat Map

The beta_heatmap function can be used to create a heat map of your calculated beta diversities ordered by some categorical variable of interest.

bray %>% 
  beta_heatmap(micro.set, bpd1)

Beta Diversity Heat Map. Heat map of Bray-Curtis beta diversity grouped by BPD severity.

PERMANOVA

The vegan package contains a useful function called adonis2 for running a PERMANOVA test. PERMANOVA is a permutation based ANOVA using a distance matrix as your response. micro_PERMANOVA is simply a wrapper function to make working within your micro_set easy. The output is analogous to that of a standard ANOVA. Please note that the F statistic given is a “pseudo-F” statistic from a “pseudo-F test” since it is based off of a permutation distribution not a true F distribution.

micro_PERMANOVA(micro.set, ## micro_set to pull covariates from
                bray,      ## Beta diversity matrix (or any distance matrix)
                method = "bray",   ## method used to calculate the beta diversity
                bpd1, mom_ethncty_2)  ## Covariates
#> Permutation test for adonis under reduced model
#> Terms added sequentially (first to last)
#> Permutation: free
#> Number of permutations: 999
#> 
#> vegan::adonis2(formula = f, data = micro_set, permutations = nperm, method = method)
#>               Df SumOfSqs      R2      F Pr(>F)  
#> bpd1           2   0.2157 0.04301 0.5041  0.842  
#> mom_ethncty_2  1   0.5207 0.10381 2.4335  0.059 .
#> Residual      20   4.2797 0.85318                
#> Total         23   5.0162 1.00000                
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Taxa Models and Plots

Filtering on a subset

There are several reasons you might want to aggregate rare taxa counts into an “Other” category. Some plots and figures (e.g. corHeatmap, corRockyMtn, ) will look nicer without all of the extra names cluttering the legend, some taxa might not be fully classified, or their models might be unstable and therefore a waste of computation time. We can use the otu_filter function to apply filtering requirements to each table within the micro_set. There are three functions built in to filter out taxa with low counts.

prev_cutoff is a prevalence cutoff where \(X%\) of subjects must have this taxa present or it will be included in the “Other” category.
ra_cutoff is a relative abundance (RA) cutoff where at least one subject must have a RA above the cutoff or the taxa will be included in the “Other” category.
exclude_taxa can be used to specify any taxa that you would like to make sure are included in the “Other” category. For instance, our OTU tables have taxa that are named “Unclassified” that we’d like to exclude. This can be a character string of any length.
1. Please note that the taxa specified with exclude_taxa will be filtered out of every OTU table given. To avoid this, you can make a subset of the data before filtering.

## Taxa names "Bacteria" are essentially unclassified
exclude_taxa <- c("Unclassified", "Bacteria")

micro.filt <- micro.set %>% 
  otu_filter(prev_cutoff = 1,               ## Prevalence cutoff
             ra_cutoff = 0.1,               ## Relative abundance cutoff
             exclude_taxa = exclude_taxa)   ## Uninteresting taxa
#> Filter for Class counts
#> Found 'Unclassified' category in input data.
#> Created new 'Other' category.
#> Found 'Bacteria' category in input data.
#> Found 34 OTUs.
#> Collapsed 2 OTUs into 'Other' in OTU table.
#> Converted 61662 counts to 'Other' otu category.
#> Remaining OTUs: 33 (Including 'Other').
#> Prevalence cutoff: 1% (i.e., at least 1% of libaries must be represented to keep OTU)
#> Found 33 OTUs.
#> Found 'Other' category in input data.
#> Collapsed 12 OTUs into 'Other' in OTU table.
#> Converted 0 counts to 'Other' in otu category.
#> Remaining OTUs: 21 (Including 'Other').
#> Relative abundance cutoff: 0.1% (i.e., at least one library must have RA > 0.1% to keep OTU).
#> Found 21 OTUs.
#> Found 'Other' category in input data.
#> Collapsed 12 OTUs into 'Other' in OTU table.
#> Converted 337 counts to 'Other' otu category.
#> Remaining OTUs: 9 (Including 'Other').
#> Filter for Family counts
#> Found 'Unclassified' category in input data.
#> Created new 'Other' category.
#> Found 'Bacteria' category in input data.
#> Found 116 OTUs.
#> Collapsed 2 OTUs into 'Other' in OTU table.
#> Converted 61662 counts to 'Other' otu category.
#> Remaining OTUs: 115 (Including 'Other').
#> Prevalence cutoff: 1% (i.e., at least 1% of libaries must be represented to keep OTU)
#> Found 115 OTUs.
#> Found 'Other' category in input data.
#> Collapsed 45 OTUs into 'Other' in OTU table.
#> Converted 0 counts to 'Other' in otu category.
#> Remaining OTUs: 70 (Including 'Other').
#> Relative abundance cutoff: 0.1% (i.e., at least one library must have RA > 0.1% to keep OTU).
#> Found 70 OTUs.
#> Found 'Other' category in input data.
#> Collapsed 50 OTUs into 'Other' in OTU table.
#> Converted 751 counts to 'Other' otu category.
#> Remaining OTUs: 20 (Including 'Other').
#> Filter for Order counts
#> Found 'Unclassified' category in input data.
#> Created new 'Other' category.
#> Found 'Bacteria' category in input data.
#> Found 62 OTUs.
#> Collapsed 2 OTUs into 'Other' in OTU table.
#> Converted 61662 counts to 'Other' otu category.
#> Remaining OTUs: 61 (Including 'Other').
#> Prevalence cutoff: 1% (i.e., at least 1% of libaries must be represented to keep OTU)
#> Found 61 OTUs.
#> Found 'Other' category in input data.
#> Collapsed 21 OTUs into 'Other' in OTU table.
#> Converted 0 counts to 'Other' in otu category.
#> Remaining OTUs: 40 (Including 'Other').
#> Relative abundance cutoff: 0.1% (i.e., at least one library must have RA > 0.1% to keep OTU).
#> Found 40 OTUs.
#> Found 'Other' category in input data.
#> Collapsed 24 OTUs into 'Other' in OTU table.
#> Converted 525 counts to 'Other' otu category.
#> Remaining OTUs: 16 (Including 'Other').
#> Filter for Phylum counts
#> Found 'Unclassified' category in input data.
#> Created new 'Other' category.
#> Found 'Bacteria' category in input data.
#> Found 15 OTUs.
#> Collapsed 2 OTUs into 'Other' in OTU table.
#> Converted 61662 counts to 'Other' otu category.
#> Remaining OTUs: 14 (Including 'Other').
#> Prevalence cutoff: 1% (i.e., at least 1% of libaries must be represented to keep OTU)
#> Found 14 OTUs.
#> Found 'Other' category in input data.
#> Collapsed 3 OTUs into 'Other' in OTU table.
#> Converted 0 counts to 'Other' in otu category.
#> Remaining OTUs: 11 (Including 'Other').
#> Relative abundance cutoff: 0.1% (i.e., at least one library must have RA > 0.1% to keep OTU).
#> Found 11 OTUs.
#> Found 'Other' category in input data.
#> Collapsed 4 OTUs into 'Other' in OTU table.
#> Converted 50 counts to 'Other' otu category.
#> Remaining OTUs: 7 (Including 'Other').

Important Note on Filtering

Please note that we did not filter before calculating our diversity measures! If we aggregate rare taxa into a single category, this will bias all of our diversity measures. Our Sobs and Choa1 in particular will be greatly biased. If you do not wish to calculate alpha or beta diversities, these filtering options are also available in the initial tidy_micro read in step.

## Named List
otu_tabs <- list(Phylum = phy, Class = cla, Ord = ord, Family = fam)

tidy.filt <- tidy_micro(otu_tabs = otu_tabs,  ## OTU Table
                        clinical = clin,      ## Clinical Data
                        prev_cutoff = 1,      ## Prevalence cutoff
                        ra_cutoff = 1,        ## Relative abundance cutoff
                        exclude_taxa = exclude_taxa)   ## Uninteresting taxa

Modeling

Negative Binomial Models

It is a standard practice to model the relative abundance of each taxa using a negative binomial distribution. The function nb_mods will fit negative binomial models for each taxa within a specified table using the observed counts and the total sequencing counts as an offset. It uses glm.nb from the MASS package to fit these models; the profile likelihood confidence intervals are also calculated through the MASS package using the confint function.

As in micro_alpha_reg and bb_mods, nb_mods will take each variable you specify as a new term to add into the model. For instance, if we include Group, Age, Sex the function will run the model \[log(\hat{cts}) \sim \beta_0 + \beta_1 Group + \beta_2 Age + \beta_3 Sex + log(Total).\] The offset of log(Total) will be included automatically and is recommended, however it is possible to remove this term with Offset = FALSE

nb_fam <- micro.filt %>%       ## micro_set
  nb_mods(table = "Family",    ## Rank of taxa we want to model
          bpd1)                ## The covariate in our model
#> 
#> 18 taxa converged
#> 2 taxa did not converge

As in micro_alpha_reg and bb_mods, you can also include interaction terms (such as Age*Sex) as you can with any other model.

## If we wanted the covariates to be bpd1+gender+bpd1*gender we just need input bpd1*gender.

nb_int <- micro.filt %>% 
  nb_mods(table = "Class", bpd1*gender)

Notice that the function will tell you how many different models converged and how many did not. To explore the convergent models we can access the Convergent_Summary from nb_mods. The convergent summary table gives the taxa name, the model coefficients, the estimated \(\beta s\), profile likelihood confidence limits, the Z-score, the p-value from a Wald test, an FDR adjusted p-value, and finally an Anova test from a likelihood-ratio test. Below is the summary information from the first two convergent taxa.

nb_fam$Convergent_Summary %>% .[1:6,] %>% knitr::kable()

Taxa	Coef	Beta	CI	Z	P_val	FDR_Pval	LRT
Actinobacteria/Actinobacteria/Actinomycetales/Actinomycetaceae	(Intercept)	-8.2921218	(-9.5266, -6.4186)	-11.1387	0.0000000	0.0000	NA
Actinobacteria/Actinobacteria/Actinomycetales/Actinomycetaceae	bpd1Moderate	-0.4842894	(-2.5126, 1.1272)	-0.5550	0.5789104	0.7105	0.5330622
Actinobacteria/Actinobacteria/Actinomycetales/Actinomycetaceae	bpd1Severe	-0.9958385	(-3.0842, 0.7387)	-1.0918	0.2749094	0.4241	0.5330622
Actinobacteria/Actinobacteria/Corynebacteriales/Corynebacteriaceae	(Intercept)	-5.3792759	(-7.6366, 1.9534)	-2.9617	0.0030599	0.0083	NA
Actinobacteria/Actinobacteria/Corynebacteriales/Corynebacteriaceae	bpd1Moderate	-2.2474168	(-9.7445, 1.7989)	-1.0590	0.2896069	0.4344	0.0000000
Actinobacteria/Actinobacteria/Corynebacteriales/Corynebacteriaceae	bpd1Severe	0.1151231	(-7.4138, 4.5979)	0.0527	0.9579410	0.9599	0.0000000

nb_mods also provides an Estimate_Summary table that contains a table of model summaries for convergent taxa that is more readily exportable into publications. It gives the taxa name, the model coefficients, the exponentiated beta coefficients for the rate ratio, a Wald confidence interval, the Z-score, and FDR_Pvalue. For interaction terms, the rate ratios are calculated taking their main effects into account. That is, rate ratios are the exponentiation sum of interaction and main effect \(\beta\) coefficients, and confidence intervals are for the exponentiation sum of the \(\beta\) coefficients. The FDR p-values are for the individual \(\beta\) coefficients. Below is the summary information from the first two convergent taxa.

nb_fam$Estimate_Summary %>% .[1:4,] %>% knitr::kable()

Taxa	Coef	RR	CI_95	Z	FDR_Pval
Actinobacteria/Actinobacteria/Actinomycetales/Actinomycetaceae	bpd1Moderate	0.6161	(0.1114, 3.4078)	-0.5550	0.7105
Actinobacteria/Actinobacteria/Actinomycetales/Actinomycetaceae	bpd1Severe	0.3694	(0.0618, 2.2075)	-1.0918	0.4241
Actinobacteria/Actinobacteria/Corynebacteriales/Corynebacteriaceae	bpd1Moderate	0.1057	(0.0016, 6.7678)	-1.0590	0.4344
Actinobacteria/Actinobacteria/Corynebacteriales/Corynebacteriaceae	bpd1Severe	1.1220	(0.0156, 80.9383)	0.0527	0.9599

For the non-convergent taxa we can explore their summary information within the RA_Summary from nb_mods. This is a table of summary measures for all taxa (not just the non-convergent taxa) that is stratified by the categorical variables in the models. It includes the counts (n), percent of subjects with counts of 0, the average, median, standard deviation, IQR, percentiles of taxa RA, and finally an indicator for whether or not this taxa’s negative binomial model converged.

Beta Binomial Models

Another standard practice is to model taxa abundance using a beta binomial distribution. The function bb_mods will fit beta binomial model to each taxa within a specified table using the vglm function from the VGAM package. Confidence intervals are fit using the confintvglm function. The current default for confidence intervals around individual \(\beta\) parameters are Wald intervals, although this can be change to profile likelihood confidence intervals using CI_method = "profile". Profile likelihoods are much more computationally intensive.

bb_fam <- micro.filt %>%       ## micro_set
  bb_mods(table = "Phylum",    ## Table we want to model
          bpd1)                ## The covariate in our model

As in micro_alpha_reg and nb_mods, bb_mods will take each variable you specify as a new term to add into the model. For instance, if we include Group, Age, Sex the function will run the model \[logit(\hat{RA}) \sim \beta_0 + \beta_1 Group + \beta_2 Age + \beta_3 Sex\]

Again, as in micro_alpha_reg and nb_mods, you can also include interaction terms (such as Age*Sex) as you can with any other model.

## If we wanted the covariates to be bpd1+gender+bpd1*gender we just need input bpd1*gender.

bb_int <- micro.filt %>% 
  bb_mods(table = "Class", bpd1*gender)

Plotting

The model output from nb_mods and bb_mods can easily cause some information overload. tidyMicro includes several useful functions to visualize the results of all convergent models. They function the same way for both model types, so we will only show examples from our nb_mods output.

Stacked Bar Charts

We have created functions that will calculate the estimated RA of all taxa based on the convergent models. This gives us the ability to visualize stacked bar charts of taxa RA while controlling for other variables in the model. For instance, if our model is \[log(\hat{cts}) \sim \beta_0 + \beta_1 Group + \beta_2 Age + \beta_3 Sex + log(Total),\] we can visualize the estimated differences in RA among different groups while holding Age and Sex constant.

The functions nb_mods and bb_mods will create these stacked bar charts based on the output of from nb_mods and bb_mods, respectively. They requires the name of a covariate in your model and can create plots based on main effects or interactions. If a continuous variable is one of the supplied covariates there are two options for visualization available through the quant_style argument. You can either visualize two points as two separate bars (quant_style = "discrete") or visualize the continuous change (quant_style = "continuous"). By default the functions will use the first and third quartiles of the continuous variable as your endpoints, but this can be changed through the range option. nb_bars and bb_bars also give you the ability to aggregate estimated RA into and “Other” category just like the ra_bars function.

nb_fam %>% nb_bars(bpd1,                 ## Covariate of interest
                   top_taxa = 5,         ## How many named taxa we want
                   xlab = "Group", 
                   xaxis = c("1","2","3")) ## Labels

Negative Binomial Bar Charts. Stacked bar charts of negative binomial estimated taxa RA by BPD severity.

Object Flexibility

Including every taxonomic level in your legend often looks cluttered. The nb_bars and bb_bars functions use the Model_Coef data frame from nb_mods and bb_mods output to create their bar charts. You can manipulate the taxa names here to avoid confusion further up in the pipeline.

nb_fam$Model_Coef$Taxa %<>% 
  stringr::str_split("/") %>%  ## Splitting by "/" in taxa names
  lapply(function(x) x[length(x)]) %>% ## selecting piece after the last "/"
  unlist ## Unlisting to put back into data frame

## Reordering to put "Other" at the bottom of the legend
## our functions usually do this automatically, but we'll need to do 
## this externally since we are messing with the output
non.other <- nb_fam$Model_Coef %>% 
  filter(Taxa != "Other") %>% 
  arrange(Taxa) %>% 
  distinct(Taxa) %>%
  pull(Taxa)

nb_fam$Model_Coef$Taxa <- factor(nb_fam$Model_Coef$Taxa,
                                 levels = c(non.other, "Other"))

nb_fam %>% nb_bars(bpd1,                 ## Covariate of interest
                   top_taxa = 5,         ## How many named taxa we want
                   xlab = "BPD Severity", main = "Cleaner Legend") ## Labels

Rocky Mountain Plots

The function micro_rocky_mtn displays magnitude of the log FDR adjusted p-values for each of the taxa in nb_mods as vertical bars next to each other along the x-axis. The direction of the bars will be determined by the direction of the estimated relationship. The taxa will be color coded by the phylum they belong to, and taxa that have FRD adjusted p-values for the specified covariate below your desired significance cutoff will be labeled. This significance cutoff is 0.05 by default and can be changed through the alpha argument. You can also turn off the labels with sig_text = FALSE.

## Order level models
nb_ord <- micro.filt %>%       ## micro_set
  nb_mods(table = "Order",     ## Rank of taxa we want to model
          bpd1)                ## The covariate in our model
#> 
#> 15 taxa converged
#> 1 taxa did not converge

nb_ord %>%
  micro_rocky_mtn(bpd1, ## Covariate of interest
                  xlab = "Taxa", main = "Rocky Mountain Plot",
                  subtitle = "Direction of bar indicates direction of relationship", facet_labels = c("Moderate", "Severe"))

Negative Bionomial Rocky Mountian Plot. Rocky mountain plot created from log FDR adjusted p-values of beta coefficients. Log p-values from beta estimate > 0 are multiplied by -1.

Forest Plots

micro_forest will create forest plots for a specified covariate from each taxa’s nb_mod or bb_mod. Forest plots display the estimated \(\beta\) coefficients with their 95% confidence intervals.

## Class level models
nb_cla <- micro.filt %>%       ## micro_set
  nb_mods(table = "Class",     ## Rank of taxa we want to model
          bpd1)                ## The covariate in our model
#> 
#> 9 taxa converged
#> 0 taxa did not converge

nb_cla %>%
  micro_forest(bpd1, ## Covariate of interest
               main = "Forest Plot for BPD Severity")

Forest Plot. A forest plot showing the beta estimates and 95% confidence intervals of selected covariate from each taxa model.

Heat Maps

We can also create heat maps of the estimate coefficients from each taxa model to give us a visual of each estimate using micro_heatmap. This might be a helpful tool in assessing patterns in estimated relationships, or for detecting taxa that are strongly associated with our coefficients. We can also specify the number of taxa we wish to show with the top_taxa, as we can with nb_bars and ra_bars. Here top_taxa is selecting the taxa with the largest estimated \(beta\) coefficients. Coefficients with significant FDR adjusted p-values will be be marked in the center. The significance cutoff can be controlled with the alpha argument. The size and shape of this mark can be controlled through the dot_size and dot_shape arguments, respectively.

nb_cla %>% micro_heatmap(top_taxa = 7)

Heat Map of Parameter Estimates. Heat map (or lasagna plot) created from beta estimates of taxa models. ’*’ indicates FDR adjusted p-values < 0.05.

Rank Sum Tests

Rank based tests are fairly common in microbiome analysis. We recommend fitting negative binomial models or beta binomial models when possible as they allow you to control for possible confounders, get meaningful estimates, test a more meaningful hypothesis, and often provide much more power. That said, small sample sizes and zero-inflation can prevent negative binomial and beta binomial models from converging. If you have too small a sample size for any parametric testing or if you’d like to run a test on a non-convergent model, the function micro_rank_sum can be used. The function will run a rank sum test on a categorical variable (Wilcoxon for 2 levels and Kruskal-Wallis for more than 2). If you supply the output from nb_mods or bb_mods, it will run a rank sum test on every non-convergent taxa, otherwise it will run the test on every taxa in the specified table. By default micro_rank_sum will use taxa RA as the response variable, and this is recommended for microbiome rank sum tests. However you can control this using the y argument.

## Kurskal-Wallis on every taxa
micro.filt %>% 
  micro_rank_sum(table = "Family", grp_var = bpd1) %>% 
  knitr::kable()

Taxa	P_Val	FDR_Pval
Actinobacteria/Actinobacteria/Actinomycetales/Actinomycetaceae	0.8818913	0.9122864
Actinobacteria/Actinobacteria/Corynebacteriales/Corynebacteriaceae	0.4524783	0.9122864
Bacteroidetes/Bacteroidia/Bacteroidales/Porphyromonadaceae	0.8854684	0.9122864
Bacteroidetes/Bacteroidia/Bacteroidales/Prevotellaceae	0.5807650	0.9122864
Firmicutes/Bacilli	0.1098463	0.9122864
Firmicutes/Bacilli/Bacillales	0.6074569	0.9122864
Firmicutes/Bacilli/Bacillales/Family-XI-Incertae-Sedis	0.5234559	0.9122864
Firmicutes/Bacilli/Bacillales/Staphylococcaceae	0.6499180	0.9122864
Firmicutes/Bacilli/Lactobacillales/Enterococcaceae	0.6774313	0.9122864
Firmicutes/Bacilli/Lactobacillales/Streptococcaceae	0.6476244	0.9122864
Firmicutes/Negativicutes/Selenomonadales/Veillonellaceae	0.3760036	0.9122864
Fusobacteria/Fusobacteriia/Fusobacteriales/Leptotrichiaceae	0.3587149	0.9122864
Proteobacteria/Betaproteobacteria/Burkholderiales/Comamonadaceae	0.5460038	0.9122864
Proteobacteria/Betaproteobacteria/Neisseriales/Neisseriaceae	0.3051526	0.9122864
Proteobacteria/Gammaproteobacteria	0.8503532	0.9122864
Proteobacteria/Gammaproteobacteria/Enterobacteriales/Enterobacteriaceae	0.6853820	0.9122864
Proteobacteria/Gammaproteobacteria/Pasteurellales/Pasteurellaceae	0.5191910	0.9122864
Proteobacteria/Gammaproteobacteria/Xanthomonadales/Xanthomonadaceae	0.9122864	0.9122864
Tenericutes/Mollicutes/Mycoplasmatales/Mycoplasmataceae	0.3836284	0.9122864
Other	0.7809474	0.9122864

## Kruskal-Wallis on non-convergent taxa
micro.filt %>% 
  micro_rank_sum(table = "Family", grp_var = bpd1, mod = nb_fam) %>% 
  knitr::kable()

Taxa	P_Val	FDR_Pval
Firmicutes/Bacilli	0.1098463	0.2196926
Firmicutes/Bacilli/Lactobacillales/Enterococcaceae	0.6774313	0.6774313

Chi-Squared Tests

An alternative to rank sum tests on taxa RA is running a Chi-Squared test on the presence / absence of the taxa using micro_chisq. The same option to run the test on all taxa or only non-convergent taxa as in micro_rank_sum exists. The test will use presence / absence (bin) as its response variable.

## Chisq on every taxa
micro.filt %>% 
  micro_chisq(table = "Family", grp_var = bpd1, simulate.p.value = T) %>% 
  knitr::kable()

Taxa	Chi_squared	P_Val	FDR_Pval
Actinobacteria/Actinobacteria/Actinomycetales/Actinomycetaceae	2.8663102	0.2463768	0.9253706
Actinobacteria/Actinobacteria/Corynebacteriales/Corynebacteriaceae	2.2649351	0.3463268	0.9253706
Bacteroidetes/Bacteroidia/Bacteroidales/Porphyromonadaceae	0.8344498	0.8085957	0.9410679
Bacteroidetes/Bacteroidia/Bacteroidales/Prevotellaceae	1.0000000	NA	NA
Firmicutes/Bacilli	6.5023569	0.0414793	0.6221889
Firmicutes/Bacilli/Bacillales	0.8545455	0.8155922	0.9410679
Firmicutes/Bacilli/Bacillales/Family-XI-Incertae-Sedis	1.9292929	0.4467766	0.9253706
Firmicutes/Bacilli/Bacillales/Staphylococcaceae	1.0000000	NA	NA
Firmicutes/Bacilli/Lactobacillales/Enterococcaceae	0.9119769	0.7921039	0.9410679
Firmicutes/Bacilli/Lactobacillales/Streptococcaceae	1.0000000	NA	NA
Firmicutes/Negativicutes/Selenomonadales/Veillonellaceae	1.0000000	NA	NA
Fusobacteria/Fusobacteriia/Fusobacteriales/Leptotrichiaceae	1.7391304	0.5552224	0.9253706
Proteobacteria/Betaproteobacteria/Burkholderiales/Comamonadaceae	1.9376987	0.4227886	0.9253706
Proteobacteria/Betaproteobacteria/Neisseriales/Neisseriaceae	3.6363636	0.2093953	0.9253706
Proteobacteria/Gammaproteobacteria	0.1283422	1.0000000	1.0000000
Proteobacteria/Gammaproteobacteria/Enterobacteriales/Enterobacteriaceae	1.0181818	0.6601699	0.9410679
Proteobacteria/Gammaproteobacteria/Pasteurellales/Pasteurellaceae	2.2649351	0.3603198	0.9253706
Proteobacteria/Gammaproteobacteria/Xanthomonadales/Xanthomonadaceae	0.0673401	1.0000000	1.0000000
Tenericutes/Mollicutes/Mycoplasmatales/Mycoplasmataceae	1.7391304	0.5412294	0.9253706
Other	1.0000000	NA	NA

## Chisq on non-convergent taxa
micro.filt %>% 
  micro_chisq(table = "Family", grp_var = bpd1, mod = nb_fam) %>% 
  knitr::kable()

Taxa	Chi_squared	P_Val	FDR_Pval
Firmicutes/Bacilli	6.5023569	0.0387285	0.0774571
Firmicutes/Bacilli/Lactobacillales/Enterococcaceae	0.9119769	0.6338212	0.6338212

Session Info

sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] magrittr_1.5    tidyMicro_1.43  forcats_0.5.0   stringr_1.4.0  
#>  [5] dplyr_0.8.5     purrr_0.3.3     readr_1.3.1     tidyr_1.0.2    
#>  [9] tibble_2.1.3    ggplot2_3.3.0   tidyverse_1.3.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] httr_1.4.1           VGAM_1.1-2           jsonlite_1.6.1      
#>  [4] splines_3.6.1        carData_3.0-3        modelr_0.1.6        
#>  [7] assertthat_0.2.1     highr_0.8            stats4_3.6.1        
#> [10] cellranger_1.1.0     yaml_2.2.1           ggrepel_0.8.2       
#> [13] pillar_1.4.3         backports_1.1.5      lattice_0.20-40     
#> [16] glue_1.3.2           digest_0.6.25        rvest_0.3.5         
#> [19] colorspace_1.4-1     htmltools_0.4.0      Matrix_1.2-18       
#> [22] plyr_1.8.6           pkgconfig_2.0.3      broom_0.5.5         
#> [25] haven_2.2.0          ThreeWay_1.1.3       scales_1.1.0        
#> [28] openxlsx_4.1.4       rio_0.5.16           mgcv_1.8-31         
#> [31] generics_0.0.2       farver_2.0.3         car_3.0-7           
#> [34] ellipsis_0.3.0       withr_2.1.2          cli_2.0.2           
#> [37] crayon_1.3.4         readxl_1.3.1         evaluate_0.14       
#> [40] fs_1.3.2             fansi_0.4.1          nlme_3.1-145        
#> [43] MASS_7.3-51.5        foreign_0.8-76       xml2_1.2.5          
#> [46] vegan_2.5-6          data.table_1.12.8    tools_3.6.1         
#> [49] hms_0.5.3            lifecycle_0.2.0      munsell_0.5.0       
#> [52] reprex_0.3.0         zip_2.0.4            cluster_2.1.0       
#> [55] ade4_1.7-15          compiler_3.6.1       rlang_0.4.5         
#> [58] grid_3.6.1           rstudioapi_0.11      labeling_0.3        
#> [61] rmarkdown_2.1        gtable_0.3.0         abind_1.4-5         
#> [64] curl_4.3             DBI_1.1.0            R6_2.4.1            
#> [67] lubridate_1.7.4      knitr_1.28           latex2exp_0.4.0     
#> [70] permute_0.9-5        stringi_1.4.6        parallel_3.6.1      
#> [73] Rcpp_1.0.4           vctrs_0.2.4          scatterplot3d_0.3-41
#> [76] dbplyr_1.4.2         tidyselect_1.0.0     xfun_0.12

The tidyMicro Pipeline

Installation

tidyMicro

Reading in OTU Counts

tidy_micro

Data Exploration

Taxa Summaries

PCA and PCoA Plots

Principal Components

Principal Coordinates

Longitudinal PCA/PCoA

Three Mode PCA

3D PCA

Bar Charts

Box Plots

Correlation Plots

Heat Maps

Rocky Mountain Plots

Alpha Diversities

Regress on Alpha Diversity

Beta Diversities

Heat Map

PERMANOVA

Taxa Models and Plots

Filtering on a subset

Important Note on Filtering

Modeling

Negative Binomial Models

Beta Binomial Models

Plotting

Stacked Bar Charts

Object Flexibility

Rocky Mountain Plots

Forest Plots

Heat Maps

Rank Sum Tests

Chi-Squared Tests

Session Info