How to make a multi-factorial regression (general linear model) in R without knowing in advance the number of predictors?
I want to run a multi-factorial linear regression in R without explicitly knowing the number of predictors beforehand.
I have about 400 arrays, and I run a multiple-factor regression for each of them in a loop. For each regression I have at most 7 predictors.
My problem is the "at most": some predictors don't exist for some of the arrays. In this situation, something like this obviously won't work:
LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = foo)
where foo is a data frame with 8 columns [Y, V1, V2, ... V7].
I've actually found a workaround, which consists of replacing any missing predictor with a zero vector. It works, but it forces me to keep and process a lot of useless data that takes up memory (each array has about 40,000 values).
Here is what the code looks like:
for (current_array in arrays)
{
    Y = get.data(current_array) # Actually a lot of long processing
    regressors_mat = matrix(0, nrow = 40000, ncol = 7) # All non-existing predictors stay at 0
    colmatreg = 0
    for (predictor in predictors)
    {
        colmatreg = colmatreg + 1
        if (!(predictor.exists.for(current_array)))
        {
            next
        }
        regressors_mat[, colmatreg] = get.data(predictor) # Actually a lot of long processing
    }
    dtf = data.frame(cbind(regressors_mat, Y))
    colnames(dtf)[ncol(dtf)] = "Y"
    LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf) # won't work if the 7 predictors are not all available
    # Long process
}
# Long process
Is there any way to perform a multi-factor linear regression without having to write
LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf)
which doesn't work when not all 7 predictors are available, and without having to store and process 40,000 x 400 x nb_of_unavailable_predictors values?
Something like this would be great:
for (current_array in arrays)
{
    Y = get.data(current_array)
    nbcol = nb.predictos.available(current_array) # I can have this function
    regressors_mat = matrix(0, nrow = 40000, ncol = nbcol)
    colmatreg = 0
    for (predictor in predictors)
    {
        if (!(predictor.exists.for(current_array)))
        {
            next
        }
        # Advance the column index only for predictors that exist,
        # so it never exceeds nbcol
        colmatreg = colmatreg + 1
        regressors_mat[, colmatreg] = get.data(predictor) # Actually a lot of long processing
    }
    dtf = data.frame(cbind(regressors_mat, Y))
    colnames(dtf)[ncol(dtf)] = "Y"
    # Wished-for pseudocode, not a valid model formula
    LinearModel = lm(Y ~ colSums(dtf[, 1:(ncol(dtf) - 1)]), data = dtf)
    # Would allow fitting the multifactorial model without knowing in advance the number of factors
}
Or, if it is more efficient, I wouldn't even have to preallocate; I could concatenate the columns.
Any help or advice would be great. Thanks!
I finally found that . was the solution! If dtf is a data frame with 8 columns [Y, V1, V2, ... V7], then
LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf)
is exactly the same as:
LinearModel = lm(Y ~ ., data = dtf)
The . stands for all the remaining columns of dtf, so the formula adapts to however many predictors are present.
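As a rough illustration of the . formula with a varying set of predictors, here is a minimal sketch on simulated data. The vector `available`, the random columns and the coefficients used to build Y are all made up for the example; they stand in for whatever get.data returns in the real loop. The sketch also shows `reformulate()` from the stats package as an alternative way to build the same formula from the column names.

```r
# Hypothetical stand-in data: only 3 of the 7 possible predictors exist here.
set.seed(1)
n <- 100
available <- c("V1", "V4", "V6")   # assumed names of the predictors that exist

# Build only the columns that exist; no zero-filled placeholders are needed.
cols <- lapply(available, function(name) rnorm(n))
names(cols) <- available
dtf <- data.frame(cols)
dtf$Y <- 2 * dtf$V1 - dtf$V4 + rnorm(n)   # made-up response for the example

# "." expands to every column of dtf other than the response Y.
fit_dot <- lm(Y ~ ., data = dtf)

# Alternative: build the formula explicitly from the available names.
fit_named <- lm(reformulate(available, response = "Y"), data = dtf)

print(coef(fit_dot))
print(coef(fit_named))
```

Both fits should report the same coefficients. Because the data frame holds only the predictors that exist for the current array, nothing has to be stored for the missing ones.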