I would like to know if my approach to calculate the average of the relative abundance of any taxon is correct !!!
If I want to know if, to calculate the relative abundance (percent) of each family (or any Taxon) in a phyloseq object (GlobalPattern) will be correct like:
data("GlobalPatterns")
T <- GlobalPatterns %>%
tax_glom(., "Family") %>%
transform_sample_counts(function(x)100* x / sum(x)) %>% psmelt() %>%
arrange(OTU) %>% rename(OTUsID = OTU) %>%
select(OTUsID, Family, Sample, Abundance) %>%
spread(Sample, Abundance)
T$Mean <- rowMeans(T[, c(3:ncol(T))])
FAM <- T[, c("Family", "Mean" ) ]
#order data frame
FAM <- FAM[order(dplyr::desc(FAM$Mean)),]
rownames(FAM) <- NULL
head(FAM)
Family Mean
1 Bacteroidaceae 7.490944
2 Ruminococcaceae 6.038956
3 Lachnospiraceae 5.758200
4 Flavobacteriaceae 5.016402
5 Desulfobulbaceae 3.341026
6 ACK-M1 3.242808
in this case the Bacteroidaceae were the most abundant family in all the samples of GlobalPattern (26 samples and 19216 OTUs), it was present in 7.49% in average in 26 samples !!!! It's correct to make the T$Mean <- rowMeans(T[, c(3:ncol(T))]) to calculate the average any given Taxon ?
Bacteroidaceae has the highest abundance, if all samples were pooled together. However, it has the highest abundance in only 2 samples. Nevertheless, there is no other taxon having a higher abundance in an average sample.
Let's use dplyr verbs for all the steps to have a more descriptive and consistent code:
library(tidyverse)
library(phyloseq)
#> Creating a generic function for 'nrow' from package 'base' in package 'biomformat'
#> Creating a generic function for 'ncol' from package 'base' in package 'biomformat'
#> Creating a generic function for 'rownames' from package 'base' in package 'biomformat'
#> Creating a generic function for 'colnames' from package 'base' in package 'biomformat'
data(GlobalPatterns)
data <-
GlobalPatterns %>%
tax_glom("Family") %>%
transform_sample_counts(function(x)100* x / sum(x)) %>%
psmelt() %>%
as_tibble()
# highest abundance: all samples pooled together
data %>%
group_by(Family) %>%
summarise(Abundance = mean(Abundance)) %>%
arrange(-Abundance)
#> # A tibble: 334 × 2
#> Family Abundance
#> <chr> <dbl>
#> 1 Bacteroidaceae 7.49
#> 2 Ruminococcaceae 6.04
#> 3 Lachnospiraceae 5.76
#> 4 Flavobacteriaceae 5.02
#> 5 Desulfobulbaceae 3.34
#> 6 ACK-M1 3.24
#> 7 Streptococcaceae 2.77
#> 8 Nostocaceae 2.62
#> 9 Enterobacteriaceae 2.55
#> 10 Spartobacteriaceae 2.45
#> # … with 324 more rows
# sanity check: is total abundance of each sample 100%?
data %>%
group_by(Sample) %>%
summarise(Abundance = sum(Abundance)) %>%
pull(Abundance) %>%
`==`(100) %>%
all()
#> [1] TRUE
# get most abundant family for each sample individually
data %>%
group_by(Sample) %>%
arrange(-Abundance) %>%
slice(1) %>%
select(Family) %>%
ungroup() %>%
count(Family, name = "n_samples") %>%
arrange(-n_samples)
#> Adding missing grouping variables: `Sample`
#> # A tibble: 18 × 2
#> Family n_samples
#> <chr> <int>
#> 1 Desulfobulbaceae 3
#> 2 Bacteroidaceae 2
#> 3 Crenotrichaceae 2
#> 4 Flavobacteriaceae 2
#> 5 Lachnospiraceae 2
#> 6 Ruminococcaceae 2
#> 7 Streptococcaceae 2
#> 8 ACK-M1 1
#> 9 Enterobacteriaceae 1
#> 10 Moraxellaceae 1
#> 11 Neisseriaceae 1
#> 12 Nostocaceae 1
#> 13 Solibacteraceae 1
#> 14 Spartobacteriaceae 1
#> 15 Sphingomonadaceae 1
#> 16 Synechococcaceae 1
#> 17 Veillonellaceae 1
#> 18 Verrucomicrobiaceae 1
Created on 2022-06-10 by the reprex package (v2.0.0)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.