Only the first ten environmental variables are analysed in constrained ordination analyses (RDA, CCA, CAP)
I am analysing microbial communities by constrained ordination (RDA, CCA and CAP), using tables of environmental variables (soil properties).
The first block had 29 samples and 43 environmental variables. I used this code:
```r
library(vegan)  # decorana, rda, cca, capscale, ordistep, varpart (also attaches permute for how())

#Prokaryotes
#area
setwd("~/Cadmium/Cd_2022/Cd_R_2022")
Area.prok.spe <- read.delim ('Cadmium_Prok_otutab_area.txt', row.names = 1)
Area.prok.spe <- t(Area.prok.spe)
Area.prok.env <- read.delim ('Area_chem_prok.txt', row.names = 1)
# DCA
DCA <- decorana (log1p (Area.prok.spe))
DCA
# DCA1<3 => linear
#RDA
rda.area.prok <- rda (Area.prok.spe ~ ., data = Area.prok.env)
rda.area.prok
anova (rda.area.prok)
plot(rda.area.prok, type="text", xlim = c(- 5, 5), ylim = c(-10,10))
# No residual component
ordistep(rda(Area.prok.spe ~ 1, data = Area.prok.env), scope=formula(rda.area.prok), direction="forward", pstep=1000)
ordistep.prok.A <- ordistep(rda(Area.prok.spe ~ 1, data = Area.prok.env), scope=formula(rda.area.prok), direction = "both", Pin = 0.05, Pout = 0.1, permutations = how(nperm = 999), steps = 50, trace = TRUE)
# look at the significant variables
ordistep.prok.A$anova
plot(ordistep.prok.A, type="text")
#Now we can calculate variations explained by individual fractions (using varpart function):
varp <- varpart (Area.prok.spe, ~ Feox, ~ Cat, ~ Cd, ~ Cdt, data = Area.prok.env)
varp
plot (varp, digits = 2, Xnames = c('Feox', 'Cat', 'Cd(CaCl2)', 'Cdt'), bg = c('navy', 'tomato', 'yellow', 'green'), cutoff = -1)
#CCA
Cd_cca_area <- cca(Area.prok.spe ~ ., Area.prok.env)
Cd_cca_area
anova.cca(Cd_cca_area, step=1000)
plot(Cd_cca_area, type="text")
# scope must be the full model; formula() of a data frame would give "first column ~ other columns"
ordistep.prok.A2 <- ordistep(cca(Area.prok.spe ~ 1, data=Area.prok.env), scope=formula(Cd_cca_area), direction="forward", pstep=1000)
plot(ordistep.prok.A2, type="text")
ordistep.prok.A2$anova
varp.cca <- varpart (Area.prok.spe, ~ Cdt, ~ K, ~ Be, ~ Cox, data = Area.prok.env)
varp.cca
plot (varp.cca, digits = 2, Xnames = c('Cdt', 'K (CaCl2)', 'Be(CaCl2)', 'Cox'), bg = c('navy', 'tomato', 'yellow', 'green'), cutoff = -1)
#CAP
Cd_cap_area <- capscale(Area.prok.spe ~ ., Area.prok.env, dist="bray")
Cd_cap_area
anova(Cd_cap_area)
plot(Cd_cap_area, type="text")
# the null model uses the same Bray-Curtis distance as the full model
ordistep.prok.A3 <- ordistep(capscale(Area.prok.spe ~ 1, data=Area.prok.env, dist="bray"), scope=formula(Cd_cap_area), direction="forward", pstep=1000)
ordistep.prok.A3$anova
plot(ordistep.prok.A3, type="text")
#anova(Cd_cap_area, by="axis", step=1000)
#anova(Cd_cap_area, by="terms", step=1000)
plot(capscale(Area.prok.spe ~ ., Area.prok.env, dist="bray"), type="text")
```
It worked fine, except that I always got 28 constrained axes and 0 unconstrained axes, so the ANOVA was not possible because there were no residuals.
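The fixed count of 28 constrained axes follows from the sample size. The data are centred before the constraints are fitted, so with n = 29 samples the constraints can span at most n − 1 = 28 dimensions, however many variables are supplied, and nothing is left over for the residual. The following minimal base-R sketch shows this cap. It uses random numbers as hypothetical stand-ins for the 43 soil variables and for one response column, not the real data:

```r
set.seed(1)
n_samples <- 29
n_vars <- 43

# Hypothetical stand-in for the environmental table: 29 samples x 43 variables
env <- matrix(rnorm(n_samples * n_vars), nrow = n_samples)

# Centre each column, as constrained ordination effectively does
env_centred <- scale(env, center = TRUE, scale = FALSE)

# The rank of the centred predictors caps the number of constrained axes
# (expected: n_samples - 1 = 28)
max_constrained <- qr(env_centred)$rank
print(max_constrained)

# A linear model on the same predictors uses up every degree of freedom
y <- rnorm(n_samples)
fit_first <- lm(y ~ env)
print(fit_first$df.residual)  # expected: 0
```

With 0 residual degrees of freedom there is no unexplained variation left to permute, which is why the permutation ANOVA cannot be run on such a model.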
Then I used the very same code to analyse the second block of data, with 31 samples and 55 environmental variables:
```r
#Prokaryotes
#Profile A
setwd("~/Cadmium/Cd_2022/Cd_R_2022")
A.prok.spe <- read.delim ('Cadmium_Prok_otutab_A.txt', row.names = 1)
A.prok.spe <- t(A.prok.spe)
A.prok.env <- read.delim ('Cd_chem_A3.txt', row.names = 1)
# DCA
DCA <- decorana (log1p (A.prok.spe))
DCA
# DCA1<3 => linear
#RDA
rda.all <- rda (A.prok.spe ~ ., data = A.prok.env)
rda.all
anova (rda.all, step=1000)
ordistep.prok.A <- ordistep(rda(A.prok.spe ~ 1, data = A.prok.env), scope=formula(rda.all), direction="forward", pstep=1000)
# look at the significant variables
ordistep.prok.A$anova
plot(ordistep.prok.A, type="text")
#Now we can calculate variations explained by individual fractions (using varpart function):
# in this table the column is named Fe.ox (see the model formula in the output below)
varp <- varpart (A.prok.spe, ~ Fe.ox, ~ Cat, ~ Cd, ~ Cdt, data = A.prok.env)
varp
plot (varp, digits = 2, Xnames = c('Feox', 'Cat', 'Cd(CaCl2)', 'Cdt'), bg = c('navy', 'tomato', 'yellow', 'green'), cutoff = -1)
plot(rda.all, type="text")
#CCA
Cd_cca_prokA <- cca(A.prok.spe ~ ., A.prok.env)
Cd_cca_prokA
anova.cca(Cd_cca_prokA, step=1000)
# scope must be the full model; formula() of a data frame would give "first column ~ other columns"
ordistep.prok.A2 <- ordistep(cca(A.prok.spe ~ 1, data=A.prok.env), scope=formula(Cd_cca_prokA), direction="forward", pstep=1000)
plot(ordistep.prok.A2, type="text")
plot(Cd_cca_prokA, type="text")
ordistep.prok.A2$anova
varp.cca <- varpart (A.prok.spe, ~ Nit, ~ Crt, ~ VWC, ~ Cu + Cut, data = A.prok.env)
varp.cca
plot (varp.cca, digits = 2, Xnames = c('Nit', 'Crt', 'VWC', 'Cut + Cu(CaCl2)'), bg = c('navy', 'tomato', 'yellow', 'green'), cutoff = -1)
#CAP
Cd_cap_A <- capscale(A.prok.spe ~ ., A.prok.env, dist="bray")
Cd_cap_A
anova(Cd_cap_A)
ordistep.prok.A3 <- ordistep(capscale(A.prok.spe ~ 1, data=A.prok.env, dist="bray"), scope=formula(Cd_cap_A), direction="forward", pstep=1000)
ordistep.prok.A3$anova
plot(ordistep.prok.A3, type="text")
plot(Cd_cap_A, type="text")
anova(Cd_cap_A, by="axis", step=1000)
anova_cap_A_terms <- anova(Cd_cap_A, by="terms", step=1000)
```
When I ran `anova(Cd_cap_A, by="terms", step=1000)` or `anova(Cd_cca_prokA, by="terms", step=1000)`, it returned results for only the first 10 variables:

```
> anova(Cd_cca_prokA, by="terms", step=1000)
Permutation test for cca under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

Model: cca(formula = A.prok.spe ~ Alt + Ast + Bat + Cat + Cdt + Cot + Crt + Cut + Fet + Kt + Mgt + Mnt + Nat + Nit + Pbt + St + Sit + Tit + Znt + pH..H2O. + m.Cox. + Consumption.Cox + Cox + pH..BaCl2. + CEC + BS + Al.ox + Fe.ox + Mn.ox + Cd.ox + Al + Ba + Be + Cd + Cu + Fe + K + Mg + Mn + Na + Ni + Pb + Zn + DOC + PD + BD + TP + CP + SP + NP + VWC + GWC + RWC + VWHC + RWHC, data = A.prok.env)
         Df ChiSquare      F Pr(>F)
Alt       1   0.11697 0.9187  0.659
Ast       1   0.11982 0.9411  0.557
Bat       1   0.14516 1.1401  0.170
Cat       1   0.22292 1.7508  0.023 *
Cdt       1   0.12798 1.0052  0.398
Cot       1   0.18957 1.4889  0.019 *
Crt       1   0.10791 0.8476  0.848
Cut       1   0.10150 0.7972  0.889
Fet       1   0.13003 1.0213  0.390
Kt        1   0.10646 0.8361  0.887
Residual 20   2.54640
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
I started to suspect that only the first 10 environmental variables were used and the other 45 were simply discarded!
What happened? I did exactly the same as with the first block. How can I force R to use all the environmental variables?
Please help me.
Thank you.
You do not give a reproducible example, but if you really have 31 sampling units (observations) and 55 predictors, you are overfitting your data. You cannot have more predictors than observations; or rather, you can, but 30 random variables (n − 1, because the data are centred) will completely explain 31 observations. The same thing probably explains your "only the first 10 variables": these were enough to predict your observations exactly, and the later ones were dropped (we call that "aliasing"). In summary: the number of predictor variables cannot be higher than the number of observations. You need either more data or fewer predictors.
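The aliasing described above can be reproduced with base R's `lm()`, which treats a saturated design the same way. Once the predictors span the data, later columns are reported as aliased (coefficient `NA`) instead of being fitted. The following minimal sketch assumes 31 observations and 55 purely random predictors, named `V1` to `V55` by `as.data.frame()`. These are hypothetical stand-ins, not the soil data:

```r
set.seed(42)
n_obs <- 31
n_pred <- 55

# Hypothetical data: a random response and 55 random predictors V1..V55
dat <- as.data.frame(matrix(rnorm(n_obs * n_pred), nrow = n_obs))
dat$y <- rnorm(n_obs)

fit <- lm(y ~ ., data = dat)
coefs <- coef(fit)

# Predictors that entered the model before the fit became exact
kept <- setdiff(names(coefs)[!is.na(coefs)], "(Intercept)")
# Predictors dropped as aliased (coefficient NA)
aliased <- names(coefs)[is.na(coefs)]

# Proportion of variance "explained" by pure noise
r2 <- 1 - sum(residuals(fit)^2) / sum((dat$y - mean(dat$y))^2)

# expected: 30 kept (n_obs - 1), 25 aliased, 0 residual df, R2 of 1
cat("kept:", length(kept), " aliased:", length(aliased),
    " residual df:", fit$df.residual, " R2:", round(r2, 6), "\n")
```

Which predictors survive depends on the column order. `lm()` takes columns as they come and drops later ones once the model is saturated. Reordering the table would therefore change which variables are kept, but not the limit of at most n − 1 fitted predictors.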