I want to select the optimal random-effects structure for my mixed-effects model (fitted with `lmer()` from `lme4`). I found the function `stepcAIC()` from the package `cAIC4`, which is supposed to compare models and select the one with the smallest conditional AIC (cAIC) in a stepwise fashion. Although it looks very simple to use, I get an error.
After fitting my model, I ran:
stepcAIC(model_full, direction="backward")
First, it takes forever to run. Second, it fails with an error message. I also tried specifying the data set explicitly:
stepcAIC(model_full, direction="backward", data=data_correct)
I also updated R to the newest version and ran it again, but that didn't help.
Has anyone used this function successfully and can tell me what I'm doing wrong?
The error I get is this:
Error in eval(predvars, data, env) : object 'Color1' not found
I have a variable named `Color`, but not `Color1`. Perhaps `Color1` is a name taken from the table of effects, but then why would the function take a name from the summary table and look for it in the data frame?
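The `1` suffix most likely comes from how R names design-matrix columns rather than from the printed summary itself. A minimal base-R sketch, assuming `Color` is a two-level factor coded with sum-to-zero contrasts (`contr.sum`; the question doesn't say which contrasts were used, and the level names `blue`/`red` are made up), reproduces the name: when the contrast matrix has no column names, `model.matrix()` appends the column index to the factor name. If `stepcAIC()` builds its reduced formulas from these column names (an assumption, not checked against the cAIC4 source), it would end up looking for a `Color1` column that does not exist in the data.

```r
# Hypothetical toy data; the level names are made up for illustration
d <- data.frame(Color = factor(c("blue", "red", "blue", "red")))

# Assumed sum-to-zero coding: contr.sum(2) has no column names
contrasts(d$Color) <- contr.sum(2)

# model.matrix() names the contrast column <factor name><column index>
mm <- model.matrix(~ Color, data = d)
print(colnames(mm))          # [1] "(Intercept)" "Color1"

# "Color1" is a design-matrix column, not a variable in the data frame
print("Color1" %in% names(d))
```

With treatment contrasts and a level literally named `1`, the column would also be called `Color1`, so the sketch only shows one plausible origin of the name.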
I also get warnings:
In if (!hasInt(resForThisGroup)) res[[i]] <- res[[i]][-j] : the condition has length > 1 and only the first element will be used
Here is a [link](https://drive.google.com/open?id=1jIJn2rzK3SwpKMfKGDhseYcOxinuwpue) to download `data_correct` and `model_full`.
This is how I created `model_full`:
model_full <- lmer(data=data_correct, log_RT~Polarity+Delay+Truth_value+Type+Color+Order + Polarity:Delay + Polarity:Truth_value + Polarity:Order + Polarity:Type+ Polarity:Color + Delay:Truth_value+ Truth_value:Delay:Polarity + (1+Polarity*Color+Delay+Delay:Polarity+Truth_value|Subject), control=lmerControl(optimizer="bobyqa"), REML=FALSE)
This is the summary of `model_full`:
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: log_RT ~ Polarity + Delay + Truth_value + Type + Color + Order +
Polarity:Delay + Polarity:Truth_value + Polarity:Order +
Polarity:Type + Polarity:Color + Delay:Truth_value + Truth_value:Delay:Polarity +
(1 + Polarity * Color + Delay + Delay:Polarity + Truth_value | Subject)
Data: data_correct
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
16556.6 16896.2 -8235.3 16470.6 19838
Scaled residuals:
Min 1Q Median 3Q Max
-3.9078 -0.6585 -0.1065 0.5654 6.5045
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 0.0652479 0.25544
Polarity1 0.0045472 0.06743 0.51
Color1 0.0030415 0.05515 0.15 0.13
Delay1 0.0005240 0.02289 0.22 -0.05 -0.02
Truth_value1 0.0022027 0.04693 0.00 0.48 0.23 0.00
Polarity1:Color1 0.0003927 0.01982 0.04 -0.33 0.57 -0.50 -0.12
Polarity1:Delay1 0.0001981 0.01408 0.61 0.07 0.06 0.55 0.06 -0.04
Residual 0.1304137 0.36113
Number of obs: 19881, groups: Subject, 38
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 6.572e+00 4.152e-02 3.800e+01 158.301 < 2e-16 ***
Polarity1 1.234e-01 1.124e-02 3.797e+01 10.985 2.38e-13 ***
Delay1 -6.476e-02 4.512e-03 3.817e+01 -14.352 < 2e-16 ***
Truth_value1 5.266e-02 8.034e-03 3.805e+01 6.556 9.83e-08 ***
Type1 7.531e-03 2.562e-03 1.962e+04 2.939 0.003292 **
Color1 2.512e-02 9.308e-03 3.756e+01 2.698 0.010379 *
Order1 -3.524e-02 8.981e-03 3.794e+01 -3.924 0.000354 ***
Polarity1:Delay1 -2.244e-02 3.433e-03 3.834e+01 -6.538 1.00e-07 ***
Polarity1:Truth_value1 -5.728e-02 2.563e-03 1.963e+04 -22.347 < 2e-16 ***
Polarity1:Order1 -1.250e-02 3.547e-03 3.823e+01 -3.525 0.001119 **
Polarity1:Type1 -7.107e-03 2.562e-03 1.962e+04 -2.774 0.005544 **
Polarity1:Color1 4.012e-03 4.114e-03 3.790e+01 0.975 0.335639
Delay1:Truth_value1 5.301e-03 2.563e-03 1.963e+04 2.068 0.038629 *
Polarity1:Delay1:Truth_value1 9.625e-03 2.563e-03 1.963e+04 3.755 0.000174 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Only sort-of an answer; will delete later if appropriate.)
I can't replicate your problem because your data set is too big for the machine I'm working on at the moment; when I try to run `stepcAIC(model_full, direction="backward")` I get:
The cAIC of the initial model can not be calculated.
which is explained by the message from `cAIC(model_full)`:
Error: cannot allocate vector of size 2.9 Gb
This is perhaps not surprising, as the model is moderately large (~20K observations, 28 random-effect covariance parameters). (Digging into the code, we can see that it is trying to construct a dense identity matrix with dimensions equal to the number of observations; in this case `n * n * 8` bytes is nearly 3 Gb ...)
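Both numbers can be checked in a few lines of base R. This sketch takes n = 19881 observations and the 14 fixed effects from the model summary, and assumes 8-byte doubles stored densely (no sparse representation). R reports allocation sizes in units of 1024^3 bytes, which is where "2.9 Gb" comes from. The 7 random-effect terms per subject give a 7 x 7 unstructured covariance matrix with 7 * 8 / 2 = 28 parameters; adding the fixed effects and the residual variance gives the `npar` of 43 reported for the full model in the `step()` table.

```r
n <- 19881                      # number of observations (from the model summary)
bytes <- n * n * 8              # dense n x n matrix of 8-byte doubles
gib <- bytes / 1024^3           # R reports sizes in units of 1024^3 bytes
cat(sprintf("Dense %d x %d matrix: %.1f Gb\n", n, n, gib))

q <- 7                          # random-effect terms per Subject (intercept + 6 slopes)
n_cov <- q * (q + 1) / 2        # variances + correlations of an unstructured q x q matrix
n_fixed <- 14                   # fixed-effect coefficients in the summary
npar <- n_fixed + n_cov + 1     # + 1 residual variance
cat(sprintf("Covariance parameters: %d, total: %d\n", n_cov, npar))
```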
Computing cAIC is really only necessary if you want to select models on the basis of individual-level predictions; if you want to select on the basis of population-level predictions, AIC should be acceptable (and is computationally much cheaper). The simplest selection procedure is based on p-values (I don't like it because I don't think modeling decisions should be based on significance testing, but lots of people use it).
The `step()` function in `lmerTest` will do p-value-based backward selection:
system.time(ss <- step(model_full,reduce.fixed=FALSE))
takes about 4.5 minutes on my old laptop. The result (abbreviated) is that it tests the effect of dropping `Truth_value`, `Polarity:Color`, and `Polarity:Delay` from the random effects, and concludes that it shouldn't drop any of them.
Backward reduced random-effect table:
Eliminated npar logLik AIC LRT Df Pr(>Chisq)
<none> 43 -8235.3 16557
T_i(1+P*C+D+D:P+T_|S 0 36 -8366.3 16804 261.915 7 < 2.2e-16 ***
P:Ci(1+P*C+D+D:P+T|S 0 36 -8257.1 16586 43.693 7 2.451e-07 ***
P:Di(1+P*C+D+D:P+T|S 0 36 -8245.0 16562 19.507 7 0.006739 **
---
From `?step.lmerModLmerTest`:
> ... a column "Eliminated" indicating the order in which terms are eliminated from the model with zero ("0") indicating that the term is not eliminated from the model.
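The columns of the random-effect table are related in a simple way, which makes it easy to see what `step()` is doing. A small sketch using the rounded values copied from the table: each reduced model drops one random-effect term, i.e. its variance and its 6 correlations, hence 43 - 36 = 7 degrees of freedom; the likelihood-ratio statistic is twice the log-likelihood difference, referred to a chi-squared distribution; and AIC = -2 logLik + 2 npar. Because the logLik values are rounded to one decimal, the recomputed LRT values differ slightly from the printed ones.

```r
# Values copied from the backward reduced random-effect table (rounded)
full <- list(logLik = -8235.3, npar = 43)
reduced <- data.frame(
  term   = c("Truth_value", "Polarity:Color", "Polarity:Delay"),
  logLik = c(-8366.3, -8257.1, -8245.0),
  npar   = c(36, 36, 36)
)

# Likelihood-ratio test: 2 * (logLik_full - logLik_reduced) ~ chi^2(df)
reduced$LRT <- 2 * (full$logLik - reduced$logLik)
reduced$Df  <- full$npar - reduced$npar
reduced$p   <- pchisq(reduced$LRT, df = reduced$Df, lower.tail = FALSE)

# AIC = -2 * logLik + 2 * npar; a reduced model with larger AIC means "keep the term"
aic_full <- -2 * full$logLik + 2 * full$npar
reduced$AIC  <- -2 * reduced$logLik + 2 * reduced$npar
reduced$keep <- reduced$AIC > aic_full

print(reduced)
cat("AIC of full model:", aic_full, "\n")
```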
In this case the `step()` function has tried to drop each of the highest-order terms (the two-way interactions plus the main effect of `Truth_value`, which isn't involved in an interaction), and found that it doesn't want to drop any of them. Here the p-value criterion (all terms have p < 0.05) and the AIC criterion (all reduced models have a larger AIC than the original model) agree with each other.
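Which terms count as "highest-order" follows from the usual marginality rule: a term is a candidate for removal only if no other term in the random-effects formula contains all of its variables. The sketch below applies that rule to the random-effects part of `model_full` using base R's `terms()`. It illustrates the rule and is not lmerTest's internal code; it recovers the same three candidates that appear in the table.

```r
# Random-effects formula of model_full (the part left of "| Subject")
tt <- terms(~ Polarity * Color + Delay + Delay:Polarity + Truth_value)
fac <- attr(tt, "factors")       # rows = variables, columns = terms; > 0 if the variable is in the term
labs <- attr(tt, "term.labels")

# A term is "highest-order" if no other term contains all of its variables
is_top <- vapply(seq_along(labs), function(i) {
  in_i <- fac[, i] > 0
  contained <- vapply(setdiff(seq_along(labs), i),
                      function(j) all(fac[in_i, j] > 0), logical(1))
  !any(contained)
}, logical(1))

print(labs[is_top])
```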