stepcAIC - Error in eval(predvars, data, env) : object 'Color1' not found

Question

I want to select the optimal random-effects structure for my mixed-effects model (fitted with lmer() from lme4). I found the function stepcAIC() from the package cAIC4, which is supposed to compare models and select the one with the smallest conditional AIC (cAIC) in a stepwise fashion. Although its usage looks very simple, I get an error.

After fitting my model, I ran the following function:

stepcAIC(model_full, direction="backward")

First, it takes forever to run. Second, I get an error message. I tried explicitly specifying the dataset:

stepcAIC(model_full, direction="backward", data=data_correct)

I also updated R to the newest version and ran it again, but that didn't help.

Has anyone used this function successfully and can tell me what I did wrong?

The error I get is this:

Error in eval(predvars, data, env) : object 'Color1' not found

I have a variable named "Color", but not "Color1". Perhaps "Color1" is a name taken from the table of the effects, but then why would it use the name from the summary table and search for it in the data frame?
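A quick check on toy data suggests that names like "Color1" are column names of the model matrix rather than variables in the data frame. This is only a sketch: the factor below is hypothetical, and the sum-to-zero coding is an assumption about how Color is coded in my data.

```r
# Toy data; 'Color' here is a hypothetical two-level factor, not data_correct.
d <- data.frame(Color = factor(c("red", "blue", "red", "blue")))

# Assumed coding: sum-to-zero contrasts (these carry no column names).
contrasts(d$Color) <- contr.sum(2)

# The design matrix names the contrast column <factor name><index>.
X <- model.matrix(~ Color, data = d)
colnames(X)

# No such column exists in the data frame itself.
"Color1" %in% names(d)
```

If that is what happens, a formula rebuilt from the names in the random-effects table would refer to Color1, which eval() cannot find in the data.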

I also get warnings:

In if (!hasInt(resForThisGroup)) res[[i]] <- res[[i]][-j] : the condition has length > 1 and only the first element will be used

Here is a [link](https://drive.google.com/open?id=1jIJn2rzK3SwpKMfKGDhseYcOxinuwpue) to download data_correct and model_full.

This is how I created model_full :

model_full <- lmer(
  log_RT ~ Polarity + Delay + Truth_value + Type + Color + Order +
    Polarity:Delay + Polarity:Truth_value + Polarity:Order +
    Polarity:Type + Polarity:Color + Delay:Truth_value +
    Truth_value:Delay:Polarity +
    (1 + Polarity * Color + Delay + Delay:Polarity + Truth_value | Subject),
  data = data_correct,
  control = lmerControl(optimizer = "bobyqa"),
  REML = FALSE
)

This is the output of model_full :

Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: log_RT ~ Polarity + Delay + Truth_value + Type + Color + Order +  
    Polarity:Delay + Polarity:Truth_value + Polarity:Order +  
    Polarity:Type + Polarity:Color + Delay:Truth_value + Truth_value:Delay:Polarity +  
    (1 + Polarity * Color + Delay + Delay:Polarity + Truth_value | Subject)
   Data: data_correct
Control: lmerControl(optimizer = "bobyqa")

     AIC      BIC   logLik deviance df.resid 
 16556.6  16896.2  -8235.3  16470.6    19838 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9078 -0.6585 -0.1065  0.5654  6.5045 

Random effects:
 Groups   Name             Variance  Std.Dev. Corr                               
 Subject  (Intercept)      0.0652479 0.25544                                     
          Polarity1        0.0045472 0.06743   0.51                              
          Color1           0.0030415 0.05515   0.15  0.13                        
          Delay1           0.0005240 0.02289   0.22 -0.05 -0.02                  
          Truth_value1     0.0022027 0.04693   0.00  0.48  0.23  0.00            
          Polarity1:Color1 0.0003927 0.01982   0.04 -0.33  0.57 -0.50 -0.12      
          Polarity1:Delay1 0.0001981 0.01408   0.61  0.07  0.06  0.55  0.06 -0.04
 Residual                  0.1304137 0.36113                                     
Number of obs: 19881, groups:  Subject, 38

Fixed effects:
                                Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)                    6.572e+00  4.152e-02  3.800e+01 158.301  < 2e-16 ***
Polarity1                      1.234e-01  1.124e-02  3.797e+01  10.985 2.38e-13 ***
Delay1                        -6.476e-02  4.512e-03  3.817e+01 -14.352  < 2e-16 ***
Truth_value1                   5.266e-02  8.034e-03  3.805e+01   6.556 9.83e-08 ***
Type1                          7.531e-03  2.562e-03  1.962e+04   2.939 0.003292 ** 
Color1                         2.512e-02  9.308e-03  3.756e+01   2.698 0.010379 *  
Order1                        -3.524e-02  8.981e-03  3.794e+01  -3.924 0.000354 ***
Polarity1:Delay1              -2.244e-02  3.433e-03  3.834e+01  -6.538 1.00e-07 ***
Polarity1:Truth_value1        -5.728e-02  2.563e-03  1.963e+04 -22.347  < 2e-16 ***
Polarity1:Order1              -1.250e-02  3.547e-03  3.823e+01  -3.525 0.001119 ** 
Polarity1:Type1               -7.107e-03  2.562e-03  1.962e+04  -2.774 0.005544 ** 
Polarity1:Color1               4.012e-03  4.114e-03  3.790e+01   0.975 0.335639    
Delay1:Truth_value1            5.301e-03  2.563e-03  1.963e+04   2.068 0.038629 *  
Polarity1:Delay1:Truth_value1  9.625e-03  2.563e-03  1.963e+04   3.755 0.000174 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Answer 1

(Only sort-of an answer; will delete later if appropriate.)

I can't replicate your problem because your data set is too big for the machine I'm working on at the moment; when I try to run stepcAIC(model_full, direction="backward") I get:

The cAIC of the initial model can not be calculated.

which is explained by the message from cAIC(model_full) :

Error: cannot allocate vector of size 2.9 Gb

This is perhaps not surprising, as the model is moderately large (~20K observations, 28 parameters). (Digging into the code, we can see that the model is trying to construct a dense identity matrix with dimensions equal to the number of observations - in this case n * n * 8 bytes is nearly 3 Gb ...)
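The arithmetic behind that allocation can be checked directly. This is just a back-of-the-envelope sketch; the only input is the number of observations reported in the model summary.

```r
n <- 19881          # number of observations in model_full
bytes <- n^2 * 8    # a dense n x n matrix of doubles, 8 bytes per entry
bytes / 2^30        # size in GiB, roughly 2.9

# Sanity check at a scale that fits comfortably in memory:
small <- diag(1000)
as.numeric(object.size(small))  # a little over 1000^2 * 8 bytes
```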

Computing cAIC is really only necessary if you want to select models on the basis of individual-level predictions; if you want to select on the basis of population-level predictions, AIC should be acceptable (and is computationally much cheaper). The simplest selection procedure is based on p-values (I don't like it because I don't think modeling decisions should be based on significance testing, but lots of people use it).
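As a minimal sketch of an AIC-based comparison of two random-effects structures, here is an example using the sleepstudy data that ships with lme4 as a stand-in for data_correct (the two models are illustrative, not the ones from the question). Both models are fitted with REML=FALSE, as model_full was.

```r
library(lme4)

# sleepstudy ships with lme4; it stands in for data_correct here.
m_slope <- lmer(Reaction ~ Days + (1 + Days | Subject),
                data = sleepstudy, REML = FALSE)
m_int   <- lmer(Reaction ~ Days + (1 | Subject),
                data = sleepstudy, REML = FALSE)

# Marginal AIC for both fits; the smaller value is preferred.
aic_tab <- AIC(m_slope, m_int)
print(aic_tab)
```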

The step() function in lmerTest will do p-value based backward selection:

system.time(ss <- step(model_full,reduce.fixed=FALSE))

This takes about 4.5 minutes on my old laptop. The result (abbreviated) is that it tests the effect of dropping Truth_value, Polarity:Color, and Polarity:Delay from the random effects, and concludes that none of them should be dropped.

Backward reduced random-effect table:

                     Eliminated npar  logLik   AIC     LRT Df Pr(>Chisq)    
<none>                            43 -8235.3 16557                          
T_i(1+P*C+D+D:P+T_|S          0   36 -8366.3 16804 261.915  7  < 2.2e-16 ***
P:Ci(1+P*C+D+D:P+T|S          0   36 -8257.1 16586  43.693  7  2.451e-07 ***
P:Di(1+P*C+D+D:P+T|S          0   36 -8245.0 16562  19.507  7   0.006739 ** 
---

From ?step.lmerModLmerTest:

... a column '"Eliminated"' indicating the order in which terms are eliminated from the model with zero ('0') indicating that the term is not eliminated from the model.

In this case the step() function has tried to drop each of the highest-order random-effect terms (the two-way interactions plus the main effect of Truth_value, which isn't involved in an interaction) and found that it doesn't want to drop any of them. Here the p-value criterion (all terms have p<0.05) and the AIC criterion (all reduced models have a larger AIC than the original model) agree with each other.
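For reference, here is a small sketch of the same workflow on the sleepstudy data from lme4 (a stand-in, since the original data are large): run the backward elimination on the random effects only, inspect the elimination table, and pull out the model that survives. I'm calling lmerTest::step() explicitly to avoid any confusion with stats::step().

```r
library(lmerTest)  # also attaches lme4, which provides sleepstudy

m <- lmer(Reaction ~ Days + (1 + Days | Subject),
          data = sleepstudy, REML = FALSE)

# Backward elimination of random-effect terms only.
ss <- lmerTest::step(m, reduce.fixed = FALSE)

ss$random               # elimination table for the random effects
final <- get_model(ss)  # model that remains after the elimination
formula(final)
```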

solution1
1 ACCEPTED 2019-09-04 19:18:44
