Mismatched results for singular fits across different R/lme4 versions

Question

I am trying to match the random-effect estimates from R version 3.5.3 (lme4 1.1-18-1) with those from R version 4.1.1 (lme4 1.1-27.1). However, when the fit is singular, the random effects differ slightly between these two versions. I'm fine with the singularity warnings, but it is puzzling that different versions of R/lme4 produce slightly different results.

The following scripts were run under R version 3.5.3 (lme4 1.1-18-1) and R version 4.1.1 (lme4 1.1-27.1), using the Arabidopsis dataset from lme4.

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] minqa_1.2.4     MASS_7.3-51.1   compiler_3.5.3  Matrix_1.2-15  
 [5] tools_3.5.3     Rcpp_1.0.1      splines_3.5.3   nlme_3.1-137   
 [9] grid_3.5.3      nloptr_1.2.1    lme4_1.1-18-1   lattice_0.20-38
> library(lme4)
Loading required package: Matrix
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
 Groups   Name        Std.Dev.       
 reg:popu (Intercept)  7.744768797534
 reg      (Intercept) 10.629179104291
 Residual             39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> fit2@theta
[1] 0.150979711638631 0.000000000000000 0.189968995915902
[4] 0.260818869156072
> VarCorr(fit2)
 Groups              Name        Std.Dev.       
 reg:popu:amd:status (Intercept)  5.841181759473
 reg:popu:amd        (Intercept)  0.000000000000
 reg:popu            (Intercept)  7.349619506926
 reg                 (Intercept) 10.090696322743
 Residual                        38.688521100461
> ##########
> #Example3#
> ##########
> devfun353 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> save.image('myEnvironment353.Rdata')

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] minqa_1.2.4        MASS_7.3-54        compiler_4.1.1     minque_2.0.0       Matrix_1.3-4      
 [6] tools_4.1.1        Rcpp_1.0.7         tinytex_0.34       splines_4.1.1      nlme_3.1-152      
[11] grid_4.1.1         xfun_0.27          nloptr_1.2.2.2     boot_1.3-28        lme4_1.1-27.1     
[16] ADDutil_2.2.1.9005 lattice_0.20-44   
> library(lme4)
Loading required package: Matrix
Warning message:
package ‘lme4’ was built under R version 4.1.2 
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
 Groups   Name        Std.Dev.       
 reg:popu (Intercept)  7.744768797534
 reg      (Intercept) 10.629179104291
 Residual             39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
boundary (singular) fit: see ?isSingular
> fit2@theta
[1] 0.150979743348540 0.000000000000000 0.189969036985684 0.260818797487214
> VarCorr(fit2)
 Groups              Name        Std.Dev.       
 reg:popu:amd:status (Intercept)  5.841182965248
 reg:popu:amd        (Intercept)  0.000000000000
 reg:popu            (Intercept)  7.349621069388
 reg                 (Intercept) 10.090693513643
 Residual                        38.688520961140
> ##########
> #Example3#
> ##########
> devfun411 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> load('myEnvironment353.Rdata')
> devfun353 <- lme4:::mkdevfun(environment(devfun353))
> minqa::bobyqa(c(1,1,1,1),devfun353,0,control = list(iprint=2))
npt = 6 , n =  4 
rhobeg =  0.2 , rhoend =  2e-07 
start par. =  1 1 1 1 fn =  6443.44054431489 
rho:    0.020 eval:  11 fn:      6393.61 par: 0.00000 0.621363 0.744867 0.823498 
rho:   0.0020 eval:  38 fn:      6361.97 par:0.156855  0.00000 0.190090 0.234676 
rho:  0.00020 eval:  49 fn:      6361.94 par:0.150719  0.00000 0.190593 0.249106 
rho:  2.0e-05 eval:  67 fn:      6361.94 par:0.150988  0.00000 0.189943 0.260821 
rho:  2.0e-06 eval:  74 fn:      6361.94 par:0.150980  0.00000 0.189965 0.260811 
rho:  2.0e-07 eval:  82 fn:      6361.94 par:0.150980  0.00000 0.189969 0.260819 
At return
eval:  90 fn:      6361.9381 par: 0.150980  0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898 
objective: 6361.93810274656 
number of function evaluations: 90 
> minqa::bobyqa(c(1,1,1,1),devfun411,0,control = list(iprint=2))
npt = 6 , n =  4 
rhobeg =  0.2 , rhoend =  2e-07 
start par. =  1 1 1 1 fn =  6443.44054431489 
rho:    0.020 eval:  11 fn:      6393.61 par: 0.00000 0.621363 0.744867 0.823498 
rho:   0.0020 eval:  38 fn:      6361.97 par:0.156855  0.00000 0.190090 0.234676 
rho:  0.00020 eval:  49 fn:      6361.94 par:0.150719  0.00000 0.190593 0.249106 
rho:  2.0e-05 eval:  67 fn:      6361.94 par:0.150988  0.00000 0.189943 0.260821 
rho:  2.0e-06 eval:  74 fn:      6361.94 par:0.150980  0.00000 0.189965 0.260811 
rho:  2.0e-07 eval:  82 fn:      6361.94 par:0.150980  0.00000 0.189969 0.260819 
At return
eval:  90 fn:      6361.9381 par: 0.150980  0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898 
objective: 6361.93810274656 
number of function evaluations: 90

When the model is simpler, there is no singularity warning and the results match (see Example 1 in both scripts). When the model is relatively complex, there is a singularity warning and the results are slightly off (see Example 2 in both scripts). The difference is <1e-5 in this case, but I have observed differences of up to 1e-4 before. Can anyone shed some light on why the results are slightly different? And is it even possible to match the results to at least 1e-8?
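To put numbers on "slightly off", here is a minimal base-R sketch that compares the Example 2 standard deviations printed by the two sessions. The values are copied by hand from the two VarCorr(fit2) outputs; the short element names and the 1e-8 / 1e-5 thresholds are illustrative choices, not anything lme4 defines. Note that all.equal() tests the *mean* relative difference across elements, so it is a looser check than requiring every element to agree.

```r
# Std. deviations from VarCorr(fit2), copied from the two sessions above
# (names abbreviate the grouping factors: status = reg:popu:amd:status, etc.)
sd_353 <- c(status = 5.841181759473, amd = 0, popu = 7.349619506926,
            reg = 10.090696322743, resid = 38.688521100461)
sd_411 <- c(status = 5.841182965248, amd = 0, popu = 7.349621069388,
            reg = 10.090693513643, resid = 38.688520961140)

# Largest absolute difference between the two versions
max_abs_diff <- max(abs(sd_353 - sd_411))
print(max_abs_diff)

# all.equal() returns TRUE, or a string giving the mean relative difference
print(all.equal(sd_353, sd_411, tolerance = 1e-8))

# Do the two versions agree to a (mean relative) tolerance of 1e-8? Of 1e-5?
matches_1e8 <- isTRUE(all.equal(sd_353, sd_411, tolerance = 1e-8))
matches_1e5 <- isTRUE(all.equal(sd_353, sd_411, tolerance = 1e-5))
print(c(matches_1e8 = matches_1e8, matches_1e5 = matches_1e5))
```

The absolute differences are on the order of 1e-6, consistent with the "<1e-5" observation, but too large to pass a 1e-8 check.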

Not sure if this is useful, but I also extracted devfun from 3.5.3 and ran it in 4.1.1, and the results match (see Example 3). In addition, when I read the iteration history from BOBYQA, the $\theta$ of the term that leads to the singularity warning oscillates between 0 and small numbers (around 1e-7 to 1e-9).
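A component oscillating between exactly 0 and values around 1e-7 to 1e-9 does not change whether the fit is flagged, because a singularity check compares theta against a tolerance rather than against exact zero. The sketch below uses a hypothetical helper, near_singular(), written under the assumption that it behaves like lme4's isSingular() with its default tol = 1e-4: flag the fit if any theta component whose lower bound is 0 falls below tol. It is an illustration, not lme4's own code. All four terms here are scalar random intercepts, so each lower bound is assumed to be 0.

```r
# Hypothetical helper approximating a tolerance-based singularity check
# (assumption: modeled on isSingular()'s default tol = 1e-4)
near_singular <- function(theta, lower, tol = 1e-4) {
  # only components bounded below by 0 can sit on the boundary
  on_boundary <- lower == 0
  any(theta[on_boundary] < tol)
}

lower <- c(0, 0, 0, 0)  # four scalar random intercepts

# theta from the 3.5.3 fit, and the same vector with the zero replaced
# by a tiny positive value like those seen in the BOBYQA trace
theta_exact0 <- c(0.150979711638631, 0,    0.189968995915902, 0.260818869156072)
theta_tiny   <- c(0.150979711638631, 1e-8, 0.189968995915902, 0.260818869156072)

print(near_singular(theta_exact0, lower))
print(near_singular(theta_tiny, lower))
```

Both vectors are flagged, so the warning itself is the same whichever of these two points the optimizer stops at.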

This post discusses similar topics; it also shows that a singular fit leads to slightly different estimates. There is no obvious change in the lme4 NEWS that would cause the difference. This FAQ and ?isSingular give a great explanation of singularity warnings, but do not address the mismatch directly.

TL;DR: Sometimes, when there is a singularity warning (which I am OK with), the random effects are slightly different under different R/lme4 versions. Why is this happening, and how can I address it?

Answer 1

This is a hard problem to solve in general, and even a fairly hard problem to solve in specific cases.

I think the difference arose between versions 1.1.27.1 and 1.1.28, probably from this NEWS item:

construction of interacting factors (e.g. when f1:f2 or f1/f2 occur in random effects terms) is now more efficient for partially crossed designs (doesn't try to create all combinations of f1 and f2) (GH #635 and #636)

My guess is that this changes the ordering of the components in the Z matrix, which in turn means that the results of various linear algebra operations are not identical. For example, floating-point arithmetic is not associative: while binary addition is commutative (a + b == b + a), left-to-right evaluation of a sum may not give the same result as right-to-left evaluation ((a+b) + c != a + (b+c))...
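The floating-point point can be checked directly in R. The constants below are illustrative and have nothing to do with the model; they just show that reordering the same additions can change the last bits of the result, which is the kind of effect a reordered Z matrix could have on the deviance computations.

```r
# Addition of doubles is commutative ...
commutes <- (0.1 + 0.2) == (0.2 + 0.1)

# ... but not associative: the grouping changes where rounding happens
left_first  <- (0.1 + 0.2) + 0.3
right_first <- 0.1 + (0.2 + 0.3)
associates  <- left_first == right_first
print(c(commutes = commutes, associates = associates))
print(left_first - right_first)  # about one unit in the last place

# A more extreme case: the 1 is absorbed if it is added to 1e16 first
big <- 1e16
grouped_left  <- (1 + big) - big
grouped_right <- 1 + (big - big)
print(c(grouped_left = grouped_left, grouped_right = grouped_right))
```

The explicit parentheses matter here: base R's sum() may accumulate in extended precision, which would hide the effect.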

My attempt at reproducing the problem uses the same version of R ("under development 2022-02-25 r81818") and compares only lme4 package versions 1.1.18.1 and 1.1.28.9000 (development); upstream packages such as Rcpp, RcppEigen, and Matrix are the same versions in both cases. (I had to backport a few changes from the development version of lme4 to 1.1.18.1 to get it to install under the most recent version of R, but I don't think any of those modifications would affect numerical results.)

I did the comparison by installing different versions of the lme4 package before running the code in a fresh R session. My results differed between versions 1.1.18.1 and 1.1.28 less than yours did (both fits were singular, and the relative differences in the theta estimates were on the order of 2e-7: still greater than your desired 1e-8 tolerance, but much smaller than 1e-4).

The results from 1.1.18.1 and 1.1.27.1 were identical.

Q1: Why are your results more different between versions than mine?
- in general/anecdotally, numerical results on Windows are slightly less stable and differ more from those on other platforms
- there are more differences between your two test platforms than between mine: R version, upstream packages (Matrix / Rcpp / RcppEigen / minqa), and possibly the compiler versions and settings used to build everything [any of which could make a difference]
Q2: How should one deal with this kind of problem?
- as a minor frame challenge: why (other than not understanding what's going on, which is a perfectly legitimate reason to be concerned) does this worry you? The differences in the results are far smaller than the magnitude of statistical uncertainty, and differences this large are also likely to occur across different platforms (OS/compiler version/etc.), even for otherwise identical environments (versions of R, lme4, and other packages).
- you could revert to version 1.1.27.1 for now...
- I do take the differences between 1.1.27.1 and 1.1.28 as a bug, of sorts; at the very least, it's an undocumented change in the package. If it were sufficiently high-priority, I could investigate the code changes described above and see whether there is a way to fix the problems they addressed without breaking backward compatibility (in theory this should be possible, but it could be annoyingly difficult...)

## R CMD INSTALL ~/R/misc/lme4
library(lme4)
packageVersion("lme4")
## 1.1.18.1
fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
dput(getME(fit2, "theta"))
t1 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072
)

Run under 1.1.28.9000 (fresh R session; re-run the package-loading/lmer code above):

## R CMD INSTALL ~/R/pkgs/lme4git/lme4
packageVersion("lme4")
## [1] ‘1.1.28.9000’
dput(getME(fit2, "theta"))
t2 <- c(`reg:popu:amd:status.(Intercept)` = 0.15097974334854, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189969036985684, `reg.(Intercept)` = 0.260818797487214
)

(t1-t2)/((t1+t2)/2)
## reg:popu:amd:status.(Intercept)        reg:popu:amd.(Intercept)
##                   -2.100276e-07                             NaN
##            reg:popu.(Intercept)                 reg.(Intercept)
##                  -2.161920e-07                    2.747841e-07

The second element is NaN because both versions give singular fits, so the comparison computes 0/0, which is NaN.
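If the goal is a single pass/fail check at a target tolerance (the question asks about 1e-8), one way around the 0/0 case is to fall back to the absolute difference wherever both estimates are zero. A minimal sketch using a hypothetical helper, rel_diff_safe(), and the t1/t2 values from above (names dropped for brevity):

```r
# Hypothetical helper: element-wise relative difference, falling back to
# the absolute difference where the mean of the two values is 0
# (e.g. a singular component estimated as exactly 0 in both versions)
rel_diff_safe <- function(a, b) {
  m <- (a + b) / 2
  ifelse(m == 0, abs(a - b), (a - b) / m)
}

t1 <- c(0.150979711638631, 0, 0.189968995915902, 0.260818869156072)
t2 <- c(0.15097974334854,  0, 0.189969036985684, 0.260818797487214)

d <- rel_diff_safe(t1, t2)
print(d)

within_1e8 <- all(abs(d) < 1e-8)  # the tolerance asked about in the question
within_1e6 <- all(abs(d) < 1e-6)
print(c(within_1e8 = within_1e8, within_1e6 = within_1e6))
```

With these values the check fails at 1e-8 and passes at 1e-6, matching the ~2e-7 relative differences shown above.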

Run under 1.1.27.1 (fresh R session; re-run the package-loading/lmer code above):

## remotes::install_version("lme4", "1.1-27.1")

t3 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072)

identical(t1, t3) ## TRUE

Answer 1: score 3, accepted 2022-03-03 22:02:47