Sparse mixed model with lme4 or another package

I use mixed models on a large data set (500,000 rows). My model formula looks like this:
Y ~ 0 + num1:factor1 + num1:factor2 + num2:factor3 + factor4 + (0 + num3|subject) + (0 + num4|subject) + (1|subject),
where num denotes numeric variables and factor denotes categorical variables (factors).

Because the categorical variables have many unique levels, the fixed-effects model matrix is very sparse (sparsity ~0.9).
Fitting such a matrix while handling it as dense requires a lot of time and RAM.
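To make the sparsity claim concrete, here is a small sketch that builds the same fixed-effects design both ways and measures the fraction of zero entries. The data frame, the level counts (50 and 20) and the reduced formula are hypothetical stand-ins for the real data; only `model.matrix` (base R) and `sparse.model.matrix`/`nnzero` (Matrix) are real functions.

```r
library(Matrix)

set.seed(1)
n <- 1000
# Hypothetical data: two many-level factors, all levels declared up front
d <- data.frame(
  num1    = rnorm(n),
  factor1 = factor(sample(1:50, n, replace = TRUE), levels = 1:50),
  factor4 = factor(sample(1:20, n, replace = TRUE), levels = 1:20)
)

f <- ~ 0 + num1:factor1 + factor4

X_dense  <- model.matrix(f, data = d)         # ordinary dense base-R matrix
X_sparse <- sparse.model.matrix(f, data = d)  # dgCMatrix: stores only non-zeros

# Fraction of entries that are exactly zero
sparsity <- 1 - nnzero(X_sparse) / prod(dim(X_sparse))

print(dim(X_sparse))
print(sparsity)
print(object.size(X_dense))
print(object.size(X_sparse))
```

Each row has only one non-zero in the num1:factor1 block and one in the factor4 block, so the zero fraction grows with the number of levels, which is why the memory gap reported below becomes so large at 500,000 rows.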

I had the same problem with linear regression.
My dense matrix was 20 GB, but when I converted it to a sparse matrix it shrank to only 35 MB.
So I stopped using the lm function and used two other functions instead:

  1. sparse.model.matrix (to create a sparse model/design matrix) and
  2. MatrixModels:::lm.fit.sparse (to fit the sparse matrix and calculate the coefficients).
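A minimal sketch of this two-step workflow, using only the Matrix package: `sparse.model.matrix` builds the design matrix, and, as a stand-in for `MatrixModels:::lm.fit.sparse` (whose exact return format is not relied on here), the coefficients come from the sparse normal equations X'X b = X'y. The data, formula and true coefficients are hypothetical, and the response is simulated without noise only so the result can be checked.

```r
library(Matrix)

set.seed(2)
n <- 1000
# Hypothetical data with many-level factors
d <- data.frame(
  num1    = rnorm(n),
  factor1 = factor(sample(1:50, n, replace = TRUE), levels = 1:50),
  factor4 = factor(sample(1:20, n, replace = TRUE), levels = 1:20)
)

# Step 1: sparse design matrix (dgCMatrix)
X <- sparse.model.matrix(~ 0 + num1:factor1 + factor4, data = d)

# Noise-free response from known coefficients, so the fit can be verified
beta_true <- seq_len(ncol(X)) / 10
y <- as.vector(X %*% beta_true)

# Step 2: least squares via the normal equations.
# crossprod() keeps X'X sparse, and solve() on the sparse symmetric
# matrix uses a sparse factorization instead of a dense one.
XtX <- crossprod(X)
Xty <- crossprod(X, y)
beta_hat <- as.vector(solve(XtX, Xty))
names(beta_hat) <- colnames(X)

print(head(beta_hat))
```

Forming X'X squares the condition number of X, so for badly conditioned designs a sparse QR (which is what the default method of `lm.fit.sparse` is designed to use) is the numerically safer choice; the normal equations are shown here only because they are short and easy to verify.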

Can I apply a similar approach to mixed models?
What functions or packages can I use to implement this?

That is, my main question is whether it is possible to fit mixed models with sparse matrices.
What functions should I use to create sparse X and Z model matrices?
Then, which function should I use to fit the model with these sparse matrices and get the coefficients?
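For the Z part of the question, lme4 already works with a sparse random-effects matrix internally. The sketch below uses lme4's `lFormula` to parse a formula of the same shape as the one above without fitting it; its `reTrms$Zt` component is the transposed Z stored as a sparse matrix, while the `X` it returns is dense, so a sparse X is built separately with `sparse.model.matrix`. The data and level counts are hypothetical, and this only constructs the matrices: it does not by itself make `lmer` fit with a sparse X.

```r
library(lme4)
library(Matrix)

set.seed(3)
n <- 600
nsubj <- 30
# Hypothetical data shaped like the question's model
d <- data.frame(
  subject = factor(sample(seq_len(nsubj), n, replace = TRUE), levels = seq_len(nsubj)),
  num1 = rnorm(n), num3 = rnorm(n), num4 = rnorm(n),
  factor1 = factor(sample(1:40, n, replace = TRUE), levels = 1:40)
)
d$Y <- rnorm(n)

form <- Y ~ 0 + num1:factor1 + (0 + num3 | subject) + (0 + num4 | subject) + (1 | subject)

# Parse formula and data without fitting the model
lf <- lFormula(form, data = d)

Zt <- lf$reTrms$Zt   # q x n random-effects matrix (transposed), already sparse
Z  <- t(Zt)          # n x q sparse Z
X  <- sparse.model.matrix(~ 0 + num1:factor1, data = d)  # sparse fixed-effects X

print(class(Zt))
print(dim(Z))   # three terms with 30 subjects each give 90 columns
print(dim(X))
```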

I would be very, very grateful for any help with this!

  • As of version 1.0.2.1 on CRAN, glmmTMB has a sparseX argument:

sparseX: a named logical vector containing (possibly) elements named "cond", "zi", "disp" to indicate whether fixed-effect model matrices for particular model components should be generated as sparse matrices, e.g. 'c(cond=TRUE)'. Default is all 'FALSE'.

You would probably want glmmTMB([formula], [data], sparseX=c(cond=TRUE)) (glmmTMB uses family="gaussian" by default).
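A small sketch of that call on simulated data, with a formula modelled on the question's. The data frame, level counts and effect sizes are hypothetical; `sparseX = c(cond = TRUE)` asks glmmTMB to build the conditional-model fixed-effects matrix as a sparse matrix, and the family is left at its Gaussian default.

```r
library(glmmTMB)

set.seed(4)
n <- 2000
# Hypothetical data: many-level factors plus a grouping factor
d <- data.frame(
  subject = factor(sample(1:50, n, replace = TRUE)),
  num1    = rnorm(n),
  factor1 = factor(sample(1:100, n, replace = TRUE)),
  factor4 = factor(sample(1:10, n, replace = TRUE))
)
# Simulated response with a random intercept per subject
d$Y <- 0.5 * d$num1 + rnorm(50)[d$subject] + rnorm(n)

fit <- glmmTMB(Y ~ 0 + num1:factor1 + factor4 + (1 | subject),
               data = d,
               sparseX = c(cond = TRUE))  # sparse X for the conditional model

summary(fit)
```

Timing this against the same call without `sparseX` (and against `lmer`) on a subsample of the real data is a cheap way to find out whether the sparse route pays off before running it on all 500,000 rows.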

glmmTMB is not quite as fast as lme4 for linear mixed models: I don't know what your mileage will be (but I would be interested to hear). There is also some discussion here about how to hack the equivalent of sparse model matrices in lme4 (by making the many-level factor a random effect with a large, fixed variance).
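Why that lme4 workaround approximates a fixed effect can be seen from the penalized least-squares form of the linear mixed model. In a simplified setting, assumed here for illustration, with y = Xβ + Zb + ε, b ~ N(0, σ_b² I) and ε ~ N(0, σ² I), the estimates conditional on the variances solve:

```latex
(\hat\beta, \hat b) = \arg\min_{\beta,\, b} \;
  \lVert y - X\beta - Zb \rVert^2 + \frac{\sigma^2}{\sigma_b^2}\, \lVert b \rVert^2
```

As σ_b² grows, the penalty weight σ²/σ_b² goes to 0, so the coefficients b of the many-level factor are estimated with almost no shrinkage, as if they were fixed effects, while their indicator columns sit in Z, which lme4 already stores as a sparse matrix.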
