stepcAIC - Error in eval(predvars, data, env) : object 'Color1' not found
I want to select the optimal random-effects structure for my mixed-effects model (fitted with lmer() from lme4). I found the function stepcAIC() from the package cAIC4, which is supposed to compare models and select the one with the smallest (conditional) AIC in a stepwise fashion. Although using it looks very simple, I get an error.
After fitting my model, I ran the following call:
stepcAIC(model_full, direction="backward")
First, it takes forever to run. Second, I get an error message. I tried explicitly specifying the dataset:
stepcAIC(model_full, direction="backward", data=data_correct)
I also tried updating R to the newest version and running it again, but that didn't help.
Does anyone have experience with this function and can tell me what I did wrong?
The error I get is this:
Error in eval(predvars, data, env) : object 'Color1' not found
I have a variable named "Color", but not "Color1". Perhaps "Color1" is a name taken from the table of effects, but then why would it take a name from the summary table and look for it in the data frame?
I also get warnings:
In if (!hasInt(resForThisGroup)) res[[i]] <- res[[i]][-j] : the condition has length > 1 and only the first element will be used
Here is a [link](https://drive.google.com/open?id=1jIJn2rzK3SwpKMfKGDhseYcOxinuwpue) to download data_correct and model_full.
This is how I created model_full:
model_full <- lmer(
  log_RT ~ Polarity + Delay + Truth_value + Type + Color + Order +
    Polarity:Delay + Polarity:Truth_value + Polarity:Order +
    Polarity:Type + Polarity:Color + Delay:Truth_value +
    Truth_value:Delay:Polarity +
    (1 + Polarity * Color + Delay + Delay:Polarity + Truth_value | Subject),
  data = data_correct,
  control = lmerControl(optimizer = "bobyqa"),
  REML = FALSE
)
This is the output of model_full:
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: log_RT ~ Polarity + Delay + Truth_value + Type + Color + Order +
Polarity:Delay + Polarity:Truth_value + Polarity:Order +
Polarity:Type + Polarity:Color + Delay:Truth_value + Truth_value:Delay:Polarity +
(1 + Polarity * Color + Delay + Delay:Polarity + Truth_value | Subject)
Data: data_correct
Control: lmerControl(optimizer = "bobyqa")
     AIC      BIC   logLik deviance df.resid
 16556.6  16896.2  -8235.3  16470.6    19838
Scaled residuals:
    Min      1Q  Median      3Q     Max
-3.9078 -0.6585 -0.1065  0.5654  6.5045
Random effects:
 Groups   Name             Variance  Std.Dev. Corr
 Subject  (Intercept)      0.0652479 0.25544
          Polarity1        0.0045472 0.06743   0.51
          Color1           0.0030415 0.05515   0.15  0.13
          Delay1           0.0005240 0.02289   0.22 -0.05 -0.02
          Truth_value1     0.0022027 0.04693   0.00  0.48  0.23  0.00
          Polarity1:Color1 0.0003927 0.01982   0.04 -0.33  0.57 -0.50 -0.12
          Polarity1:Delay1 0.0001981 0.01408   0.61  0.07  0.06  0.55  0.06 -0.04
 Residual                  0.1304137 0.36113
Number of obs: 19881, groups:  Subject, 38
Fixed effects:
                                Estimate Std. Error        df t value Pr(>|t|)
(Intercept)                    6.572e+00  4.152e-02 3.800e+01 158.301  < 2e-16 ***
Polarity1                      1.234e-01  1.124e-02 3.797e+01  10.985 2.38e-13 ***
Delay1                        -6.476e-02  4.512e-03 3.817e+01 -14.352  < 2e-16 ***
Truth_value1                   5.266e-02  8.034e-03 3.805e+01   6.556 9.83e-08 ***
Type1                          7.531e-03  2.562e-03 1.962e+04   2.939 0.003292 **
Color1                         2.512e-02  9.308e-03 3.756e+01   2.698 0.010379 *
Order1                        -3.524e-02  8.981e-03 3.794e+01  -3.924 0.000354 ***
Polarity1:Delay1              -2.244e-02  3.433e-03 3.834e+01  -6.538 1.00e-07 ***
Polarity1:Truth_value1        -5.728e-02  2.563e-03 1.963e+04 -22.347  < 2e-16 ***
Polarity1:Order1              -1.250e-02  3.547e-03 3.823e+01  -3.525 0.001119 **
Polarity1:Type1               -7.107e-03  2.562e-03 1.962e+04  -2.774 0.005544 **
Polarity1:Color1               4.012e-03  4.114e-03 3.790e+01   0.975 0.335639
Delay1:Truth_value1            5.301e-03  2.563e-03 1.963e+04   2.068 0.038629 *
Polarity1:Delay1:Truth_value1  9.625e-03  2.563e-03 1.963e+04   3.755 0.000174 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Only sort-of an answer; will delete later if appropriate.)
I can't replicate your problem because your data set is too big for the machine I'm working on at the moment; when I try to run stepcAIC(model_full, direction="backward") I get:
The cAIC of the initial model can not be calculated.
which is explained by the message from cAIC(model_full):
Error: cannot allocate vector of size 2.9 Gb
This is perhaps not surprising, as the model is moderately large (~20K observations, 28 random-effects covariance parameters). (Digging into the code, we can see that it is trying to construct a dense identity matrix with dimensions equal to the number of observations; in this case n * n * 8 bytes is nearly 3 Gb ...)
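As a rough check on that figure, the size of a dense n-by-n matrix of doubles can be worked out directly. This is only a back-of-the-envelope sketch: n is taken from "Number of obs" in the model summary, and it assumes that R's error message reports sizes in units of 2^30 bytes.

```r
# Memory needed for a dense n x n matrix of doubles (8 bytes per entry)
n <- 19881               # number of observations in model_full
bytes <- n^2 * 8         # total bytes for the dense matrix
gib <- bytes / 2^30      # assumption: R's "Gb" means 2^30 bytes
round(gib, 1)            # about 2.9, in line with the error message
```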
Computing cAIC is really only necessary if you want to select models on the basis of individual-level predictions; if you want to select on the basis of population-level predictions, AIC should be acceptable (and is computationally much cheaper). The simplest selection procedure is based on p-values (I don't like it because I don't think modeling decisions should be based on significance testing, but lots of people use it).
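To make the distinction between the two kinds of prediction concrete, here is a minimal sketch using predict() from lme4. It uses the sleepstudy data shipped with lme4 as a stand-in for data_correct (an assumption made only so the example is self-contained), and it does not compute cAIC; it only shows what "population-level" and "individual-level" predictions look like.

```r
library(lme4)

# Small built-in data set, used here purely for illustration
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = FALSE)

# Population-level predictions: fixed effects only, random effects set to zero
pred_pop <- predict(fit, re.form = NA)

# Individual-level predictions: fixed effects plus each subject's random effects
pred_ind <- predict(fit, re.form = NULL)

# Ordinary (marginal) AIC, the value reported in the model summary
AIC(fit)

# The two kinds of prediction differ from subject to subject
head(cbind(sleepstudy[, c("Subject", "Days")], pred_pop, pred_ind))
```

At a given value of Days, pred_pop is the same for every subject, while pred_ind varies by subject.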
The step() function in lmerTest will do p-value based backward selection:
system.time(ss <- step(model_full,reduce.fixed=FALSE))
takes about 4.5 minutes on my old laptop. The result (abbreviated) is that it tests the effect of dropping Truth_value, Polarity:Color, and Polarity:Delay from the random effects, and concludes that it shouldn't drop any of them.
Backward reduced random-effect table:
                     Eliminated npar  logLik   AIC     LRT Df Pr(>Chisq)
<none>                            43 -8235.3 16557
T_i(1+P*C+D+D:P+T_|S          0   36 -8366.3 16804 261.915  7  < 2.2e-16 ***
P:Ci(1+P*C+D+D:P+T|S          0   36 -8257.1 16586  43.693  7  2.451e-07 ***
P:Di(1+P*C+D+D:P+T|S          0   36 -8245.0 16562  19.507  7   0.006739 **
---
From ?step.lmerModLmerTest:
... a column '"Eliminated"' indicating the order in which terms are eliminated from the model with zero ('0') indicating that the term is not eliminated from the model.
In this case the step() function has tried to drop all of the highest-order random-effects terms (the two-way interactions plus the main effect of Truth_value, which isn't involved in an interaction), and found that it doesn't want to drop any of them. Here the p-value criterion (all terms have p<0.05) and the AIC criterion (all reduced models have an AIC larger than that of the original model) agree with each other.
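The agreement between the two criteria can be checked by hand from the npar and logLik columns of the table above. The following is a small sketch, not part of lmerTest: the helper aic() is a hypothetical name introduced here, and because the log-likelihoods are copied from the rounded printed table, the results match the table only approximately.

```r
# Full model, values copied from the <none> row of the table
full <- list(npar = 43, logLik = -8235.3)

# Reduced models, one per random-effects term that step() tried to drop
reduced <- data.frame(
  dropped = c("Truth_value", "Polarity:Color", "Polarity:Delay"),
  npar    = c(36, 36, 36),
  logLik  = c(-8366.3, -8257.1, -8245.0)
)

# Hypothetical helper: AIC = -2 * logLik + 2 * (number of parameters)
aic <- function(logLik, npar) -2 * logLik + 2 * npar

aic_full    <- aic(full$logLik, full$npar)
reduced$AIC <- aic(reduced$logLik, reduced$npar)

# Likelihood-ratio test of each reduced model against the full model
reduced$LRT <- 2 * (full$logLik - reduced$logLik)
reduced$Df  <- full$npar - reduced$npar
reduced$p   <- pchisq(reduced$LRT, df = reduced$Df, lower.tail = FALSE)

# A term is kept if dropping it gives a worse AIC, or a significant LRT
reduced$keep_by_AIC <- reduced$AIC > aic_full
reduced$keep_by_p   <- reduced$p < 0.05
reduced
```

Each dropped term removes 7 parameters (one variance and six covariances from the 7 x 7 random-effects covariance matrix), which is why every test has 7 degrees of freedom.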