Sparse Mixed Model with lme4 or other package
I am fitting mixed models on a large data set (500,000 rows). My model formula looks like this:
Y ~ 0 + num1:factor1 + num1:factor2 + num2:factor3 + factor4 + (0 + num3|subject) + (0 + num4|subject) + (1|subject)
where the num variables are numeric and the factor variables are categorical (factors).
Since the categorical variables have many unique levels, the fixed-effects model matrix is very sparse (sparsity ~0.9).
Fitting such a matrix when it is handled as dense requires a lot of time and RAM.
I had the same problem with linear regression.
My dense matrix was 20 GB, but when I converted it to a sparse matrix it took only 35 MB.
So I stopped using the lm function and instead used two other functions: sparse.model.matrix (to create a sparse model/design matrix) and MatrixModels:::lm.fit.sparse (to fit the sparse matrix and compute the coefficients). Can I apply a similar approach to mixed models?
What functions or packages can I use to implement this?
That is, my main question is whether it is possible to fit mixed models with sparse matrices.
What functions should I use to create the sparse X and Z model matrices?
Then, which function should I use to fit the model with sparse matrices and get the coefficients?
I would be very grateful for any help with this!
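For reference, the sparse linear-regression workflow described in the question can be sketched roughly as follows. This is a minimal illustration on simulated data, not the asker's actual code: the data frame d and its columns are made up, and the coefficients are obtained by solving the sparse normal equations with the Matrix package rather than by calling the unexported MatrixModels:::lm.fit.sparse.

```r
library(Matrix)

set.seed(1)
n <- 1000

# Hypothetical data: one numeric predictor and two many-level factors
d <- data.frame(
  y       = rnorm(n),
  num1    = rnorm(n),
  factor1 = factor(sample(letters, n, replace = TRUE)),
  factor4 = factor(sample(1:50, n, replace = TRUE))
)

# Sparse design matrix (dgCMatrix); only non-zero entries are stored
X <- sparse.model.matrix(~ 0 + num1:factor1 + factor4, data = d)

# Least-squares coefficients from the sparse normal equations X'X b = X'y
beta_sparse <- as.numeric(solve(crossprod(X), crossprod(X, d$y)))

# Dense fit for comparison (only feasible because this example is small)
beta_dense <- unname(coef(lm.fit(as.matrix(X), d$y)))

# Memory footprint of the sparse vs. dense representation
print(object.size(X), units = "Kb")
print(object.size(as.matrix(X)), units = "Kb")
```

On a well-conditioned problem the two coefficient vectors should agree to numerical tolerance, while the sparse matrix takes a fraction of the memory.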
glmmTMB has a sparseX argument:
sparseX: a named logical vector containing (possibly) elements named "cond", "zi", "disp" to indicate whether fixed-effect model matrices for particular model components should be generated as sparse matrices, e.g. 'c(cond=TRUE)'. Default is all 'FALSE'
You would probably want glmmTMB([formula], [data], sparseX=c(cond=TRUE)) (glmmTMB uses family="gaussian" by default).
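A minimal sketch of such a call on simulated data might look like the following. The data frame d, its column names, and the simulated effect sizes are assumptions made for illustration; only the sparseX=c(cond=TRUE) argument comes from the answer itself, and the formula is a cut-down version of the one in the question.

```r
library(glmmTMB)

set.seed(1)
n <- 2000

# Hypothetical data: a many-level fixed factor and a grouping factor
d <- data.frame(
  subject = factor(sample(1:100, n, replace = TRUE)),
  factor4 = factor(sample(1:40, n, replace = TRUE)),
  num3    = rnorm(n)
)

# Simulate a response with per-subject random intercepts and slopes
beta <- rnorm(nlevels(d$factor4))
b0   <- rnorm(nlevels(d$subject))
b1   <- rnorm(nlevels(d$subject), sd = 0.5)
d$y  <- beta[as.integer(d$factor4)] +
  b0[as.integer(d$subject)] +
  b1[as.integer(d$subject)] * d$num3 +
  rnorm(n, sd = 0.5)

# Gaussian mixed model; the conditional fixed-effects matrix is built as sparse
fit <- glmmTMB(
  y ~ 0 + factor4 + (0 + num3 | subject) + (1 | subject),
  data    = d,
  sparseX = c(cond = TRUE)
)

# Fixed-effect estimates for the conditional model
fixef(fit)$cond
```

Whether this is faster or lighter on memory than a dense fit depends on the data, so it is worth timing both on a subset before running the full 500,000 rows.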
glmmTMB is not quite as fast as lme4 for linear mixed models: I don't know what your mileage will be (but I will be interested to hear). There is also some discussion here about how to hack the equivalent of sparse model matrices in lme4 (by letting the many-level factor be a random effect with a large fixed variance).