Mismatching results for singular fit with different R/lme4 versions
I am trying to match the random-effects estimates from R version 3.5.3 (lme4 1.1-18-1) to those from R version 4.1.1 (lme4 1.1-27.1). However, there is a small difference in the random effects between these two versions when the fit is singular. I'm fine with singularity warnings, but it is puzzling that different versions of R/lme4 produce slightly different results.
The following scripts were run under R version 3.5.3 (lme4 1.1-18-1) and R version 4.1.1 (lme4 1.1-27.1), using the Arabidopsis dataset from lme4.
> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] minqa_1.2.4 MASS_7.3-51.1 compiler_3.5.3 Matrix_1.2-15
[5] tools_3.5.3 Rcpp_1.0.1 splines_3.5.3 nlme_3.1-137
[9] grid_3.5.3 nloptr_1.2.1 lme4_1.1-18-1 lattice_0.20-38
> library(lme4)
Loading required package: Matrix
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
Groups Name Std.Dev.
reg:popu (Intercept) 7.744768797534
reg (Intercept) 10.629179104291
Residual 39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> fit2@theta
[1] 0.150979711638631 0.000000000000000 0.189968995915902
[4] 0.260818869156072
> VarCorr(fit2)
Groups Name Std.Dev.
reg:popu:amd:status (Intercept) 5.841181759473
reg:popu:amd (Intercept) 0.000000000000
reg:popu (Intercept) 7.349619506926
reg (Intercept) 10.090696322743
Residual 38.688521100461
> ##########
> #Example3#
> ##########
> devfun353 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> save.image('myEnvironment353.Rdata')
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] minqa_1.2.4 MASS_7.3-54 compiler_4.1.1 minque_2.0.0 Matrix_1.3-4
[6] tools_4.1.1 Rcpp_1.0.7 tinytex_0.34 splines_4.1.1 nlme_3.1-152
[11] grid_4.1.1 xfun_0.27 nloptr_1.2.2.2 boot_1.3-28 lme4_1.1-27.1
[16] ADDutil_2.2.1.9005 lattice_0.20-44
> library(lme4)
Loading required package: Matrix
Warning message:
package ‘lme4’ was built under R version 4.1.2
> options(digits = 15)
> ##########
> #Example1#
> ##########
> fit1 <- lmer(total.fruits~(1|reg)+(1|reg:popu),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
> VarCorr(fit1)
Groups Name Std.Dev.
reg:popu (Intercept) 7.744768797534
reg (Intercept) 10.629179104291
Residual 39.028818969641
> ##########
> #Example2#
> ##########
> fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
boundary (singular) fit: see ?isSingular
> fit2@theta
[1] 0.150979743348540 0.000000000000000 0.189969036985684 0.260818797487214
> VarCorr(fit2)
Groups Name Std.Dev.
reg:popu:amd:status (Intercept) 5.841182965248
reg:popu:amd (Intercept) 0.000000000000
reg:popu (Intercept) 7.349621069388
reg (Intercept) 10.090693513643
Residual 38.688520961140
> ##########
> #Example3#
> ##########
> devfun411 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"),devFunOnly = T)
> load('myEnvironment353.Rdata')
> devfun353 <- lme4:::mkdevfun(environment(devfun353))
> minqa::bobyqa(c(1,1,1,1),devfun353,0,control = list(iprint=2))
npt = 6 , n = 4
rhobeg = 0.2 , rhoend = 2e-07
start par. = 1 1 1 1 fn = 6443.44054431489
rho: 0.020 eval: 11 fn: 6393.61 par: 0.00000 0.621363 0.744867 0.823498
rho: 0.0020 eval: 38 fn: 6361.97 par:0.156855 0.00000 0.190090 0.234676
rho: 0.00020 eval: 49 fn: 6361.94 par:0.150719 0.00000 0.190593 0.249106
rho: 2.0e-05 eval: 67 fn: 6361.94 par:0.150988 0.00000 0.189943 0.260821
rho: 2.0e-06 eval: 74 fn: 6361.94 par:0.150980 0.00000 0.189965 0.260811
rho: 2.0e-07 eval: 82 fn: 6361.94 par:0.150980 0.00000 0.189969 0.260819
At return
eval: 90 fn: 6361.9381 par: 0.150980 0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898
objective: 6361.93810274656
number of function evaluations: 90
> minqa::bobyqa(c(1,1,1,1),devfun411,0,control = list(iprint=2))
npt = 6 , n = 4
rhobeg = 0.2 , rhoend = 2e-07
start par. = 1 1 1 1 fn = 6443.44054431489
rho: 0.020 eval: 11 fn: 6393.61 par: 0.00000 0.621363 0.744867 0.823498
rho: 0.0020 eval: 38 fn: 6361.97 par:0.156855 0.00000 0.190090 0.234676
rho: 0.00020 eval: 49 fn: 6361.94 par:0.150719 0.00000 0.190593 0.249106
rho: 2.0e-05 eval: 67 fn: 6361.94 par:0.150988 0.00000 0.189943 0.260821
rho: 2.0e-06 eval: 74 fn: 6361.94 par:0.150980 0.00000 0.189965 0.260811
rho: 2.0e-07 eval: 82 fn: 6361.94 par:0.150980 0.00000 0.189969 0.260819
At return
eval: 90 fn: 6361.9381 par: 0.150980 0.00000 0.189969 0.260819
parameter estimates: 0.150979722854965, 0, 0.189968942342717, 0.260818725554898
objective: 6361.93810274656
number of function evaluations: 90
When the model is simpler, there is no singularity warning and the results match (see example 1 in both scripts). When the model is relatively complex, there is a singularity warning and the results are slightly off (see example 2 in both scripts). The difference is <1e-5 in this case, but I have observed differences <1e-4 before. Can anyone shed some light on why the results are slightly different? And is it even possible to match the results to at least 1e-8?
Not sure if this is useful, but I also extracted devfun from 3.5.3 and ran it in 4.1.1. The results match (see example 3). In addition, when I read the iteration history from BOBYQA, the $\theta$ of the term that leads to the singularity warning oscillates between 0 and small numbers (around 1e-7 to 1e-9).
This post discusses similar topics. It also shows that a singularity warning can come with slightly different estimates. There is no obvious change in the lme4 NEWS that would cause the difference. This FAQ and ?isSingular explain the singularity warning well but do not address the mismatch directly.
TL;DR: Sometimes when there is a singularity warning (which I am OK with), the random effects are slightly different under different R/lme4 versions. Why is this happening, and how can it be addressed?
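One way to check whether two deviance functions themselves disagree, as opposed to the optimizer taking a different path, is to evaluate both at the same parameter vectors and look at the largest discrepancy. The sketch below uses base R only. `compare_devfuns` is a hypothetical helper (not part of lme4), and the two toy functions merely stand in for objects such as `devfun353` and `devfun411` from example 3.

```r
# Hypothetical helper (not part of lme4): evaluate two objective functions
# at the same parameter vectors and report the largest absolute difference.
compare_devfuns <- function(f1, f2, thetas) {
  d1 <- vapply(thetas, f1, numeric(1))
  d2 <- vapply(thetas, f2, numeric(1))
  max(abs(d1 - d2))
}

# Toy stand-ins for two deviance functions (assumption for illustration);
# with lme4 one would instead pass the functions returned by
# lmer(..., devFunOnly = TRUE) in each environment.
target <- c(0.15, 0, 0.19, 0.26)
toy_a <- function(theta) sum((theta - target)^2)
toy_b <- function(theta) sum(rev((theta - target)^2))  # same terms, reversed order

thetas <- list(c(1, 1, 1, 1), target, c(0.2, 1e-8, 0.2, 0.3))
max_diff <- compare_devfuns(toy_a, toy_b, thetas)
print(max_diff)
```

If the discrepancy is zero or at rounding level at every point tried, the remaining differences between fits are more likely to come from starting values or the optimization path than from the objective function, although a handful of test points cannot prove that.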
This is a hard problem to solve in general, and even a fairly hard problem to solve in specific cases.
I think the difference arose between version 1.1.27.1 and 1.1.28, probably from this NEWS item:
construction of interacting factors (e.g. when f1:f2 or f1/f2 occur in random effects terms) is now more efficient for partially crossed designs (doesn't try to create all combinations of f1 and f2) (GH #635 and #636)
My guess is that this changes the ordering of the components in the Z matrix, which in turn means that the results of various linear algebra operations are not identical (e.g. floating-point arithmetic is not associative, so while binary addition is commutative (a + b == b + a), left-to-right evaluation of a sum may not be the same as right-to-left evaluation ((a+b) + c != a + (b+c))...)
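The floating-point point can be checked directly in base R. The following is a small sketch, not lme4's internals: the first part shows commutativity holding while associativity fails, and the second part permutes the columns of a random design matrix (an assumption standing in for a reordered Z matrix) to show that a least-squares solve can change at rounding level.

```r
x <- 0.1; y <- 0.2; z <- 0.3

# Addition of two doubles is commutative ...
commutative <- (x + y) == (y + x)
# ... but grouping matters: these two sums differ in the last bit.
associative <- ((x + y) + z) == (x + (y + z))
print(c(commutative = commutative, associative = associative))

# Reordering columns changes the order of operations inside the solver.
set.seed(101)
X <- matrix(rnorm(200 * 4), nrow = 200)
resp <- rnorm(200)
perm <- c(3, 1, 4, 2)
b_orig <- qr.coef(qr(X), resp)
# Fit with permuted columns, then map coefficients back to the original order.
b_perm <- qr.coef(qr(X[, perm]), resp)[order(perm)]
# Usually at rounding level; it may also happen to be exactly zero.
max_coef_diff <- max(abs(b_orig - b_perm))
print(max_coef_diff)
```

A single solve only moves the answer by rounding error, but an iterative optimizer evaluates the objective many times, so tiny differences like these can steer it along a slightly different path.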
My attempt at reproducing the problem uses the same version of R ("under development 2022-02-25 r81818") and compares only lme4 package versions 1.1.18.1 and 1.1.28.9000 (development); any upstream packages such as Rcpp, RcppEigen, and Matrix use the same versions. (I had to backport a few changes from the development version of lme4 to 1.1.18.1 to get it to install under the most recent version of R, but I don't think any of those modifications would affect numerical results.)
I did the comparison by installing different versions of the lme4 package before running the code in a fresh R session. My results differed between versions 1.1.18.1 and 1.1.28 less than yours did (both fits were singular, and the relative differences in the theta estimates were of the order of 2e-7: still greater than your desired 1e-8 tolerance, but much smaller than 1e-4...)
The results from 1.1.18.1 and 1.1.27.1 were identical.
There are more differences between your two test platforms than between mine: R version, upstream packages (Matrix / Rcpp / RcppEigen / minqa), possibly the compiler versions and settings used to build everything [all of which could make a difference].
The differences in the results are much smaller than the magnitude of the statistical uncertainty, and differences this large could also occur across different platforms (OS, compiler version, etc.), even in otherwise identical environments (versions of R, lme4, and other packages).
## R CMD INSTALL ~/R/misc/lme4
library(lme4)
packageVersion("lme4")
## 1.1.18.1
fit2 <- lmer(total.fruits~(1|reg)+(1|reg:popu)+(1|reg:popu:amd)+(1|reg:popu:amd:status),data=Arabidopsis,control=lmerControl(optimizer="bobyqa"))
dput(getME(fit2, "theta"))
t1 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072
)
Run under 1.1.28.9000 (fresh R session, re-run package-loading/lmer code above)
## R CMD INSTALL ~/R/pkgs/lme4git/lme4
packageVersion("lme4")
## [1] ‘1.1.28.9000’
dput(getME(fit2, "theta"))
t2 <- c(`reg:popu:amd:status.(Intercept)` = 0.15097974334854, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189969036985684, `reg.(Intercept)` = 0.260818797487214
)
(t1-t2)/((t1+t2)/2)
## reg:popu:amd:status.(Intercept) reg:popu:amd.(Intercept)
## -2.100276e-07 NaN
## reg:popu.(Intercept) reg.(Intercept)
## -2.161920e-07 2.747841e-07
The second element is NaN because both versions give singular fits (0/0 gives NaN).
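To check agreement against a chosen tolerance without tripping over the 0/0 case, one option is base R's `all.equal()`, which compares the mean relative difference to `tolerance`. The sketch below reuses the theta values printed above (names dropped for brevity); treating "both exactly zero" as zero difference is my own choice here, not something lme4 does.

```r
# Theta estimates copied from the two runs above
t1 <- c(0.150979711638631, 0, 0.189968995915902, 0.260818869156072)
t2 <- c(0.15097974334854, 0, 0.189969036985684, 0.260818797487214)

# Element-wise relative difference; NaN where both values are exactly zero
rel_diff <- (t1 - t2) / ((t1 + t2) / 2)
# Assumption: count a pair of exact zeros as "no difference"
rel_diff[t1 == 0 & t2 == 0] <- 0
max_rel <- max(abs(rel_diff))
print(max_rel)

# all.equal() returns TRUE or a character description, hence isTRUE()
ok_1e6 <- isTRUE(all.equal(t1, t2, tolerance = 1e-6))
ok_1e8 <- isTRUE(all.equal(t1, t2, tolerance = 1e-8))
print(c(tol_1e6 = ok_1e6, tol_1e8 = ok_1e8))
```

With these values the two fits agree at a tolerance of 1e-6 but not at 1e-8, which matches the relative differences of roughly 2e-7 shown above.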
Run under 1.1.27.1 (fresh R session, re-run package-loading/lmer code above)
## remotes::install_version("lme4", "1.1-27.1")
t3 <- c(`reg:popu:amd:status.(Intercept)` = 0.150979711638631, `reg:popu:amd.(Intercept)` = 0,
`reg:popu.(Intercept)` = 0.189968995915902, `reg.(Intercept)` = 0.260818869156072)
identical(t1, t3) ## TRUE