
Running random error model with mgcv gam takes too much memory

I am working on a model that includes several random effects (REs) and a spline for one of the variables, so I am trying to use gam(). However, I hit a memory-exhausted error (even when I run it on a cluster with 128 GB). This happens even when I run the simplest of models with just one RE. The same models (minus the spline) run smoothly and in just a few seconds (or minutes for the full model) when I use lmer() instead.

I was wondering if anyone had any idea why there is such a discrepancy between gam() and lmer(), and whether there are any potential solutions.

Here's some code with simulated data and the simplest of models:

library(mgcv)
library(lme4)

set.seed(1234) 
person_n <- 38000 # number of people (grouping variable)
n_j <- 15 # number of data points per person 
B1 <- 3 # beta for the main predictor
n <- person_n * n_j 

person_id <- gl(person_n, k = n_j) #creating the grouping variable
person_RE <- rep(rnorm(person_n), each = n_j) # creating the random errors

x <- rnorm(n) # creating x as a normal dist centered at 0 and sd = 1
error <- rnorm(n) 

#putting it all together
y <- B1 * x + person_RE + error
dat <- data.frame(y, person_id, x)

m1 <- lmer(y ~ x + (1 | person_id), data = dat)

g1 <- gam(y ~ x + s(person_id, bs = "re"), method = "REML", data = dat)

m1 runs in just a couple of seconds on my computer, whereas g1 hits the error:

Error: vector memory exhausted (limit reached?)

From ?mgcv::random.effects:

gam can be slow for fitting models with large numbers of random effects, because it does not exploit the sparsity that is often a feature of parametric random effects ... However 'gam' is often faster and more reliable than 'gamm' or 'gamm4', when the number of random effects is modest. [emphasis added]

What this means is that in the course of setting up the model, s(., bs = "re") tries to generate a dense model matrix equivalent to model.matrix(~ person_id - 1). With 5.7e5 rows (38,000 people x 15 observations) and 3.8e4 levels, this takes (nrows x nlevels x 8 bytes/double) = (5.7e5 * 3.8e4 * 8)/2^30 = 161.4 Gb, which is exactly the object size that my machine reports it can't allocate.
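
The arithmetic above can be reproduced in a few lines. The sketch below recomputes the dense matrix size from the question's dimensions and then, at a much smaller scale (200 people, a size picked here only so the example fits in memory), compares a dense indicator matrix with a sparse one built by Matrix::sparse.model.matrix. It only illustrates the dense-versus-sparse storage cost; it is not a trace of what gam() does internally.

```r
library(Matrix)  # for sparse.model.matrix()

# Dimensions taken from the question's simulation
person_n <- 38000            # levels of the grouping factor
n_j      <- 15               # observations per person
n        <- person_n * n_j   # rows in the data: 570,000

# Dense indicator matrix: one 8-byte double per (row, level) cell
dense_gib <- n * person_n * 8 / 2^30
round(dense_gib, 1)          # 161.4

# Small-scale check (assumed size: 200 people, so the dense matrix fits)
id_small <- gl(200, k = n_j)
Z_dense  <- model.matrix(~ id_small - 1)         # dense 3000 x 200 matrix
Z_sparse <- sparse.model.matrix(~ id_small - 1)  # stores only the nonzeros

dim(Z_dense)
c(dense  = as.numeric(object.size(Z_dense)),
  sparse = as.numeric(object.size(Z_sparse)))    # sizes in bytes
```

Each row of the indicator matrix holds a single 1, so the dense size grows with rows x levels while the sparse size grows only with the number of rows.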

Check out mgcv::gamm and gamm4::gamm4 for more memory-efficient (and faster, in this case) methods...
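
As a rough sketch of what those calls might look like, the example below refits the question's simplest model with mgcv::gamm, passing the random intercept through the random argument instead of s(person_id, bs = "re"). The data are regenerated with only 200 people (an assumption made here so the example runs quickly), and the s(x) term stands in for the spline mentioned in the question. The gamm4 call runs only if that package happens to be installed.

```r
library(mgcv)  # gamm() fits the model through nlme::lme

set.seed(1234)
person_n <- 200   # assumed for illustration; the question uses 38000
n_j      <- 15
n        <- person_n * n_j

person_id <- gl(person_n, k = n_j)
person_RE <- rep(rnorm(person_n), each = n_j)
x   <- rnorm(n)
y   <- 3 * x + person_RE + rnorm(n)
dat <- data.frame(y, person_id, x)

# Random intercept per person supplied via `random`, not as a smooth term
fit <- gamm(y ~ s(x), random = list(person_id = ~1),
            data = dat, method = "REML")

summary(fit$gam)  # smooth and fixed-effect part
summary(fit$lme)  # underlying lme fit, including variance components

# Optional: the lme4-based equivalent, if gamm4 is available
if (requireNamespace("gamm4", quietly = TRUE)) {
  fit4 <- gamm4::gamm4(y ~ s(x), random = ~ (1 | person_id), data = dat)
  print(summary(fit4$mer))
}
```

Both functions return a list rather than a single model object: gamm gives $lme and $gam, and gamm4 gives $mer and $gam, so smooth-term summaries and plots are taken from the $gam component.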
