在约束排序分析（RDA、CCA、CAP）中只分析了前十个环境变量

Question

I am analysing microbial communities by constrained ordination (RDA, CCA and CAP) using the tables with environmental variables (soil properties).我正在使用具有环境变量（土壤特性）的表通过约束排序（RDA、CCA 和 CAP）分析微生物群落。

First block was 29 samples and 43 environmental variables.第一个块是 29 个样本和 43 个环境变量。 I used that code:我使用了该代码：

#Prokaryotes
#area

setwd("~/Cadmium/Cd_2022/Cd_R_2022")
Area.prok.spe <- read.delim ('Cadmium_Prok_otutab_area.txt', row.names = 1)
Area.prok.spe <- t(Area.prok.spe)
Area.prok.env <- read.delim ('Area_chem_prok.txt', row.names = 1)

# DCA
DCA <- decorana (log1p (Area.prok.spe))
DCA
# DCA1<3 => linear

#RDA
rda.area.prok <- rda (Area.prok.spe ~ ., data = Area.prok.env)
rda.area.prok
anova (rda.area.prok)
plot(rda.area.prok, type="text", xlim = c(- 5, 5), ylim = c(-10,10))
#No residual component(
ordistep(rda(Area.prok.spe ~ 1, data = Area.prok.env), scope=formula(rda.area.prok), direction="forward", pstep=1000)
ordistep.prok.A <- ordistep(rda(Area.prok.spe ~ 1, data = Area.prok.env), scope=formula(rda.area.prok), direction = "both", Pin = 0.05, Pout = 0.1, permutations = how(nperm = 999), steps = 50, trace = TRUE)
# look at the significant variables
ordistep.prok.A$anova
plot(ordistep.prok.A, type="text")
#Now we can calculate variations explained by individual fractions (using varpart function): 
varp <- varpart (Area.prok.spe,  ~ Feox, ~ Cat, ~ Cd, ~ Cdt, data = Area.prok.env)
varp
plot (varp, digits = 2, Xnames = c('Feox', 'Cat', 'Cd(CaCl2)', 'Cdt'), bg = c('navy', 'tomato', 'yellow', 'green'), cutoff = -1)

#CCA
Cd_cca_area <- cca(Area.prok.spe ~ ., Area.prok.env)
Cd_cca_area
anova.cca(Cd_cca_area, step=1000)
plot(Cd_cca_area, type="text")
ordistep.prok.A2 <- ordistep(cca(Area.prok.spe ~ 1, data=Area.prok.env), scope=formula(Area.prok.env), direction="forward", pstep=1000)
plot(ordistep.prok.A2, type="text")
ordistep.prok.A2$anova
varp.cca <- varpart (Area.prok.spe,  ~ Cdt, ~ K, ~ Be, ~ Cox, data = Area.prok.env)
varp.cca
plot (varp.cca, digits = 2, Xnames = c('Cdt', 'K (CaCl2)', 'Be(CaCl2)', 'Cox'), bg = c('navy', 'tomato', 'yellow', 'green'), cutoff = -1)


#CAP
cap_area <- capscale(Area.prok.spe ~ ., Area.prok.env, dist="bray")
Cd_cap_area
anova(Cd_cap_area)
plot(cap_area, type="text")
ordistep.prok.A3 <- ordistep(capscale(Area.prok.spe ~ 1, data=Area.prok.env), scope=formula(Cd_cap_area), direction="forward", pstep=1000)
ordistep.prok.A3$anova
plot(ordistep.prok.A3, type="text")
#anova(Cd_cap_area, by="axis", step=1000)
#anova(Cd_cap_area, by="terms", step=1000)
plot(capscale(Area.prok.spe ~ ., Area.prok.env, dist="bray"), type="text")

It worked fine, except that I always had 28 constrained axes and 0 unconstrained, so the ANOVA was not possible due to no residuals.它工作得很好，除了我总是有 28 个受约束的轴和 0 个不受约束的轴，所以由于没有残差，方差分析是不可能的。

Then I used the very same code to analyse the second block of the data with 31 samples and 55 environmental variables.然后我使用相同的代码来分析包含 31 个样本和 55 个环境变量的数据的第二块。

#Prokaryotes
#Profile A

setwd("~/Cadmium/Cd_2022/Cd_R_2022")
A.prok.spe <- read.delim ('Cadmium_Prok_otutab_A.txt', row.names = 1)
A.prok.spe <- t(A.prok.spe)
A.prok.env <- read.delim ('Cd_chem_A3.txt', row.names = 1)

# DCA
DCA <- decorana (log1p (A.prok.spe))
DCA
# DCA1<3 => linear

#RDA
rda.all <- rda (A.prok.spe ~ ., data = A.prok.env)
rda.all
anova (rda.all, step=1000)


ordistep.prok.A <- ordistep(rda(A.prok.spe ~ 1, data = A.prok.env), scope=formula(rda.all), direction="forward", pstep=1000)
# look at the significant variables
ordistep.prok.A$anova
plot(ordistep.prok.A, type="text")
#Now we can calculate variations explained by individual fractions (using varpart function): 
varp <- varpart (A.prok.spe,  ~ Feox, ~ Cat, ~ Cd, ~ Cdt, data = A.prok.env)
varp
plot (varp, digits = 2, Xnames = c('Feox', 'Cat', 'Cd(CaCl2)', 'Cdt'), bg = c('navy', 'tomato', 'yellow', 'green'), cutoff = -1)
plot(rda.all, type="text")

#CCA
Cd_cca_prokA <- cca(A.prok.spe ~ ., A.prok.env)
Cd_cca_prokA
anova.cca(Cd_cca_prokA, step=1000)
ordistep.prok.A2 <- ordistep(cca(A.prok.spe ~ 1, data=A.prok.env), scope=formula(A.prok.env), direction="forward", pstep=1000)
plot(ordistep.prok.A2, type="text")
plot(Cd_cca_prokA, type="text")
ordistep.prok.A2$anova
varp.cca <- varpart (A.prok.spe,  ~ Nit, ~ Crt, ~ VWC, ~ Cu + Cut, data = A.prok.env)
varp.cca
plot (varp.cca, digits = 2, Xnames = c('Nit', 'Crt', 'VWC', 'Cut + Cu(CaCl2)'), bg = c('navy', 'tomato', 'yellow', 'green'), cutoff = -1)

#CAP
Cd_cap_A <- capscale(A.prok.spe ~ ., A.prok.env, dist="bray")
Cd_cap_A
anova(Cd_cap_A)
ordistep.prok.A3 <- ordistep(capscale(A.prok.spe ~ 1, data=A.prok.env, dist="bray"), scope=formula(Cd_cap_A), direction="forward", pstep=1000)
ordistep.prok.A3$anova
plot(ordistep.prok.A3, type="text")
plot(Cd_cap_A, type="text")
anova(Cd_cap_A, by="axis", step=1000)
anova_cap_A_terms <- anova(Cd_cap_A, by="terms", step=1000)

I noticed that when plotting results, eg plot(Cd_cca_prokA), it showed vectors for only first 10 variables.我注意到在绘制结果时，例如 plot(Cd_cca_prokA)，它只显示了前 10 个变量的向量。 In the first block, when I run plot(Cd_cca_area) it showed all the variables as the vectors on the biplot.在第一个块中，当我运行 plot(Cd_cca_area) 时，它会将所有变量显示为双标图上的向量。
When I run anova(Cd_cap_A, by="terms", step=1000) or anova(Cd_cca_prokA, by="terms", step=1000) it returned results only for the first 10 variables.当我运行anova(Cd_cap_A, by="terms", step=1000)或anova(Cd_cca_prokA, by="terms", step=1000)它只返回前 10 个变量的结果。

> anova(Cd_cca_prokA, by="terms", step=1000)
Permutation test for cca under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999

Model: cca(formula = A.prok.spe ~ Alt + Ast + Bat + Cat + Cdt + Cot + Crt + Cut + Fet + Kt + Mgt + Mnt + Nat + Nit + Pbt + St + Sit + Tit + Znt + pH..H2O. + m.Cox. + Consumption.Cox + Cox + pH..BaCl2. + CEC + BS + Al.ox + Fe.ox + Mn.ox + Cd.ox + Al + Ba + Be + Cd + Cu + Fe + K + Mg + Mn + Na + Ni + Pb + Zn + DOC + PD + BD + TP + CP + SP + NP + VWC + GWC + RWC + VWHC + RWHC, data = A.prok.env)
         Df ChiSquare      F Pr(>F)  
Alt       1   0.11697 0.9187  0.659  
Ast       1   0.11982 0.9411  0.557  
Bat       1   0.14516 1.1401  0.170  
Cat       1   0.22292 1.7508  0.023 *
Cdt       1   0.12798 1.0052  0.398  
Cot       1   0.18957 1.4889  0.019 *
Crt       1   0.10791 0.8476  0.848  
Cut       1   0.10150 0.7972  0.889  
Fet       1   0.13003 1.0213  0.390  
Kt        1   0.10646 0.8361  0.887  
Residual 20   2.54640                
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Finally, the results of all 3 analyses were: 10 constrained and 20 unconstrained axes.最后，所有 3 个分析的结果是：10 个约束轴和 20 个无约束轴。

I started to suspect that only the first 10 environmental variables were used and the other 45 were just discarded!我开始怀疑只使用了前 10 个环境变量，而其他 45 个被丢弃了！

What happened?发生了什么？ I did all the same that with the first block.我对第一个街区做了同样的事情。 How to force R to use all the environmental variables?如何强制 R 使用所有环境变量？

Please help me.请帮我。

Thank you.谢谢你。

Answer 1

You do not give a reproducible case, but if you really have 31 sampling units (observations) and 55 predictors, you are overfitting your data.您没有给出可重现的案例，但如果您确实有 31 个采样单元（观察值）和 55 个预测变量，那么您的数据就会过拟合。 You cannot have more predictors than observations – or you can, but 31 random variables will completely explain 31 observations.你不能有比观察更多的预测变量——或者你可以，但是 31 个随机变量将完全解释 31 个观察。 Probably the problem is the same with your "only 10 first variables": these were enough to predict exactly your observations and the later ones were dumped (we call that "aliasing").问题可能与您的“只有 10 个第一个变量”相同：这些足以准确预测您的观察结果，而后来的观察结果被丢弃（我们称之为“混叠”）。 As a summary: the number of predictor variables cannot be higher than the number of observations.总结一下：预测变量的数量不能高于观察的数量。 You need either more data or you need to reduce the number of your predictors.您需要更多数据，或者需要减少预测变量的数量。

在约束排序分析（RDA、CCA、CAP）中只分析了前十个环境变量

问题描述

1 个解决方案

解决方案1
0 2022-09-03 20:48:37

在约束排序分析（RDA、CCA、CAP）中只分析了前十个环境变量

问题描述

1 个解决方案

解决方案1 0 2022-09-03 20:48:37

解决方案1
0 2022-09-03 20:48:37