How can I run a multiple regression (general linear model) in R without knowing the number of predictors in advance?

Question

I want to run a multiple linear regression in R without having to know the number of predictors beforehand.

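I have about 400 arrays, and I loop over them to run a multiple regression on each one. Each regression has at most 7 predictors. My problem is the "at most": for some arrays, some predictors do not exist. In that case, something like LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = foo) obviously does not work, where foo is a data frame with 8 columns [Y, V1, V2, ... V7].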

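I did find a workaround, which consists of replacing every missing predictor with a vector of zeros. It works, but I have to keep and process a lot of useless data that takes up memory (each array has about 40,000 values).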

Here is what the code looks like:

for (current_array in arrays)
{
    Y = get.data(current_array) # Actually a long process

    regressors_mat = matrix(0, nrow = 40000, ncol = 7) # Columns of missing predictors stay at 0
    colmatreg = 0
    for (predictor in predictors)
    {
        colmatreg = colmatreg + 1
        if (!predictor.exists.for(current_array))
        {
            next
        }
        regressors_mat[, colmatreg] = get.data(predictor) # Actually a long process
    }
    dtf = data.frame(cbind(regressors_mat, Y))
    colnames(dtf)[ncol(dtf)] = "Y"

    LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf) # Won't work if the 7 predictors are not available
    # Long process
}
# Long process
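
To illustrate the zero-padding workaround on a small scale, here is a minimal sketch with simulated data. The variable names and the sample size are assumptions for illustration only; they stand in for the real arrays. It compares a fit that carries an all-zero placeholder column with a fit that simply omits it.

```r
set.seed(1)
n <- 100                        # small stand-in for the ~40,000 values per array
V1 <- rnorm(n)
V2 <- rnorm(n)
Y  <- 1 + 2 * V1 - 3 * V2 + rnorm(n, sd = 0.1)

# Zero-padded design: V3 is "missing" and kept as a column of zeros
padded <- data.frame(V1 = V1, V2 = V2, V3 = 0, Y = Y)
fit_padded <- lm(Y ~ V1 + V2 + V3, data = padded)

# Same data without the placeholder column
compact <- data.frame(V1 = V1, V2 = V2, Y = Y)
fit_compact <- lm(Y ~ V1 + V2, data = compact)

coef(fit_padded)    # the all-zero column carries no information, so V3 is NA
coef(fit_compact)   # the remaining coefficients match the padded fit
```

The placeholder column does not change the estimates for the real predictors; it only costs memory, which is what motivates the question.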

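Is there any way to run the multiple linear regression without writing LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf), which does not work when not all 7 predictors are available, and without having to store and process 40,000 x 400 x nb_of_unavailable_predictors useless values?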

Something like this would be great:

for (current_array in arrays)
{
    Y = get.data(current_array)
    nbcol = nb.predictos.available(current_array) # I can have this function
    regressors_mat = matrix(0, nrow = 40000, ncol = nbcol)
    colmatreg = 0
    for (predictor in predictors)
    {
        if (!predictor.exists.for(current_array))
        {
            next
        }
        colmatreg = colmatreg + 1 # Advance only for predictors that exist, so the index never exceeds nbcol
        regressors_mat[, colmatreg] = get.data(predictor) # Actually a long process
    }
    dtf = data.frame(cbind(regressors_mat, Y))
    colnames(dtf)[ncol(dtf)] = "Y"

    # Wished-for syntax (pseudocode): fit on every column except Y,
    # without knowing in advance how many predictors there are
    LinearModel = lm(Y ~ colSums(dtf[, 1:(ncol(dtf) - 1)]), data = dtf)
}

Or, if it is more efficient, I do not even need to preallocate; I could just concatenate the columns.
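
The column-concatenation idea could be sketched as follows. This is only an illustration under stated assumptions: the vector available and the rnorm() calls are hypothetical stand-ins for predictor.exists.for() and get.data(), which are not shown in the post.

```r
set.seed(2)
n <- 50                                  # stand-in for the real array length
predictors <- paste0("V", 1:7)
available  <- c("V1", "V4", "V6")        # hypothetical: predictors that exist for this array
Y <- rnorm(n)                            # stand-in for get.data(current_array)

cols <- list()
for (predictor in predictors) {
  if (!(predictor %in% available)) next  # skip missing predictors entirely
  cols[[predictor]] <- rnorm(n)          # stand-in for get.data(predictor)
}

# One column per available predictor, plus the response
dtf <- data.frame(cols, Y = Y)
names(dtf)
```

Collecting the columns in a named list and converting once avoids both the zero placeholders and repeated cbind() calls inside the loop.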

Any help or suggestion would be great. Thanks!

Answer 1

I finally found the solution: the dot (.). If dtf is a data frame with 8 columns [Y, V1, V2, ... V7], then

LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf)

is exactly equivalent to:

LinearModel = lm(Y ~ ., data = dtf)

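The . on the right-hand side of the formula stands for all the remaining columns of dtf, that is, every column other than Y.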
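
A minimal sketch with simulated data (all values and names below are assumptions for illustration) shows the equivalence, and that the same call keeps working when a predictor column is absent. The reformulate() call at the end is an alternative, not part of the original answer, for cases where the data frame contains columns that should not enter the model.

```r
set.seed(3)
n <- 200
full <- data.frame(V1 = rnorm(n), V2 = rnorm(n), V3 = rnorm(n))
full$Y <- 0.5 + 1.5 * full$V1 - 2 * full$V3 + rnorm(n)

# Explicit formula versus the dot shorthand
fit_explicit <- lm(Y ~ V1 + V2 + V3, data = full)
fit_dot      <- lm(Y ~ ., data = full)
all.equal(coef(fit_explicit), coef(fit_dot))

# With a predictor missing, the same call adapts to the columns present
fewer     <- full[, c("V1", "V3", "Y")]
fit_fewer <- lm(Y ~ ., data = fewer)
names(coef(fit_fewer))

# Alternative: build the formula from the column names
f <- reformulate(setdiff(names(fewer), "Y"), response = "Y")
fit_formula <- lm(f, data = fewer)
```

Because the dot picks up every non-response column, dtf should contain only Y and the predictors intended for the model.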

Answer 1: 0, accepted, 2019-07-16 08:39:36
