
Sparse Mixed Model with lme4 or other package

2020-11-03 15:56:59 · Tags: r, matrix, sparse-matrix, lme4, mixed-models

I use mixed models on a large file (500000 rows). My model formula looks like this:
Y ~ 0 + num1:factor1 + num1:factor2 + num2:factor3 + factor4 + (0 + num3|subject) + (0 + num4|subject) + (1|subject)
where the num variables are numeric and the factor variables are categorical (factors).

Since the categorical variables have many unique levels, the fixed-effects matrix is very sparse (sparsity ~0.9).
Fitting such a matrix when it is handled as dense requires a lot of time and RAM.

I had the same problem with linear regression.
My dense matrix was 20 GB, but when I converted it to sparse it became only 35 MB.
So I stopped using the lm function and instead used two other functions:

sparse.model.matrix (to create a sparse model/design matrix) and
MatrixModels:::lm.fit.sparse (to fit a sparse matrix and calculate coefficients).
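For illustration, here is a minimal sketch of that sparse regression workflow on simulated data. The data frame, its column names, and the sizes are made up for the example. To stay within the Matrix package alone, the fit is done by solving the normal equations with sparse matrices rather than by calling MatrixModels:::lm.fit.sparse, so treat it as a stand-in for that step, not a copy of it.

```r
library(Matrix)

set.seed(1)
n <- 2000
d <- data.frame(
  num1    = rnorm(n),
  # 200 levels, 10 rows each (hypothetical sizes)
  factor1 = factor(rep(sprintf("a%03d", 1:200), length.out = n))
)
d$Y <- rnorm(n)

# Sparse design matrix: one num1 slope per level of factor1, no intercept
X <- sparse.model.matrix(~ 0 + num1:factor1, data = d)

# Fraction of entries that are exactly zero
sparsity <- 1 - nnzero(X) / prod(dim(X))

# Least squares via the normal equations, kept sparse throughout
XtX  <- crossprod(X)        # sparse symmetric matrix X'X
Xty  <- crossprod(X, d$Y)   # X'y
beta <- as.vector(solve(XtX, Xty))
```

On a well-conditioned problem the coefficients should agree with a dense fit such as lm.fit(as.matrix(X), d$Y), which is a convenient check on a small subsample before running the full data.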

Can I apply a similar approach to mixed models?
What functions / packages can I use to implement this?

That is, my main question is whether it is possible to fit mixed models with sparse matrices.
What functions should I use to create the sparse X and Z model matrices?
Then, which function should I use to fit the model with sparse matrices and get the coefficients?

I would be very grateful for any help with this!

1 Answer

As of version 1.0.2.1 on CRAN, glmmTMB has a sparseX argument:

sparseX: a named logical vector containing (possibly) elements named "cond", "zi", "disp" to indicate whether fixed-effect model matrices for particular model components should be generated as sparse matrices, e.g. 'c(cond=TRUE)'. Default is all 'FALSE'.

You would probably want glmmTMB([formula], [data], sparseX=c(cond=TRUE)) (glmmTMB uses family="gaussian" by default).
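A minimal sketch of such a call on simulated data follows. The data frame, the variable names (borrowed from the question's formula), and the effect sizes are all made up for the example; only the sparseX argument comes from the glmmTMB documentation quoted above.

```r
library(glmmTMB)

set.seed(1)
n_subj <- 30
n      <- 600
d <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = 20)),
  factor4 = factor(rep(sprintf("f%02d", 1:20), length.out = n)),
  num3    = rnorm(n)
)

# Simulated response with subject-level intercepts and num3 slopes
b0 <- rnorm(n_subj)
b1 <- rnorm(n_subj, sd = 0.5)
d$Y <- as.integer(d$factor4) / 10 +
  b0[d$subject] + b1[d$subject] * d$num3 +
  rnorm(n, sd = 0.5)

# sparseX = c(cond = TRUE) builds the conditional-model fixed-effects
# matrix as a sparse matrix; family defaults to gaussian
fit <- glmmTMB(
  Y ~ 0 + factor4 + (1 | subject) + (0 + num3 | subject),
  data    = d,
  sparseX = c(cond = TRUE)
)

fixef(fit)$cond   # fixed-effect estimates of the conditional model
```

Whether this is fast enough at 500000 rows is something to measure on the real data; timing the same formula with and without sparseX on a subsample is a cheap first test.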

glmmTMB is not quite as fast for linear mixed models as lme4 is: I don't know what your mileage will be (but will be interested to hear). There is also some discussion elsewhere about how to hack the equivalent of sparse model matrices in lme4 (by letting the many-level factor be a random effect with a large fixed variance).
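The reason that hack can help is that lme4 builds the random-effects matrix Z as a sparse matrix but the fixed-effects matrix X as an ordinary dense one. The sketch below, on made-up data with hypothetical variable names, only shows where the indicator columns of a many-level factor end up under each formulation, using lme4's lFormula. It does not fix the variance at a large value, so it is not the hack itself.

```r
library(lme4)

set.seed(1)
n <- 600
d <- data.frame(
  subject = factor(rep(1:30, each = 20)),
  factor4 = factor(rep(sprintf("f%02d", 1:20), length.out = n)),
  Y       = rnorm(n)
)

# factor4 as a fixed effect: its 20 indicator columns sit in the dense X
fixed_parts <- lFormula(Y ~ 0 + factor4 + (1 | subject), data = d)

# factor4 as a random effect: the same columns move into the sparse Zt
random_parts <- lFormula(Y ~ 1 + (1 | factor4) + (1 | subject), data = d)

dim(fixed_parts$X)              # dense fixed-effects matrix
class(fixed_parts$reTrms$Zt)    # transposed Z, stored sparse
dim(random_parts$X)
dim(random_parts$reTrms$Zt)
```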