How to make a multi-factorial regression (general linear model) in R without knowing in advance the number of predictors?

I want to fit a multi-factor linear regression in R without knowing the number of predictors in advance.

I have about 400 arrays, and I loop over them, running a multi-factor regression for each one. Each regression has at most 7 predictors. The problem lies in the 'at most': some predictors don't exist for some of the arrays. In that case, something like LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = foo) obviously won't work, where foo is a data frame with 8 columns [Y, V1, V2, ... V7].

I've actually found a workaround, which consists of replacing any missing predictor with a zero vector. It works, but it forces me to keep and process a lot of useless data that takes up memory (each array has about 40,000 values).

Here is what the code looks like:

for (current_array in arrays)
{
    Y = get.data(current_array) # Actually a long process

    regressors_mat = matrix(0, nrow = 40000, ncol = 7) # All non-existing predictors stay at 0
    colmatreg = 0
    for (predictor in predictors)
    {
        colmatreg = colmatreg + 1
        if (!predictor.exists.for(current_array))
        {
            next
        }
        regressors_mat[, colmatreg] = get.data(predictor) # Actually a long process
    }
    dtf = data.frame(cbind(regressors_mat, Y))
    colnames(dtf)[ncol(dtf)] = "Y"

    LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf) # Won't work if the 7 predictors are not all available
    # Long process
}
# Long process

Is there any way to perform a multi-factor linear regression without writing LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf), which doesn't work when not all 7 predictors are available, and without having to store and process 40,000 x 400 x nb_of_unavailable_predictors values?

Something like this would be great:

for (current_array in arrays)
{
    Y = get.data(current_array)
    nbcol = nb.predictos.available(current_array) # I can have this function
    regressors_mat = matrix(0, nrow = 40000, ncol = nbcol)
    colmatreg = 0
    for (predictor in predictors)
    {
        if (!predictor.exists.for(current_array))
        {
            next
        }
        colmatreg = colmatreg + 1 # Advance only for predictors that exist, so we never index past nbcol
        regressors_mat[, colmatreg] = get.data(predictor) # Actually a long process
    }
    dtf = data.frame(cbind(regressors_mat, Y))
    colnames(dtf)[ncol(dtf)] = "Y"

    LinearModel = lm(Y ~ colSums(dtf[, 1:(ncol(dtf) - 1)]), data = dtf)
    # Pseudocode: fit the multi-factor model without knowing the number of predictors in advance
}

Or, if it is more efficient, I wouldn't even have to preallocate; I could concatenate the columns instead.

Any help or advice would be great. Thanks!

I finally found that the dot (.) was the solution! If dtf is a data frame with 8 columns [Y, V1, V2, ... V7], then

LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf)

is exactly the same as:

LinearModel = lm(Y ~ ., data = dtf)

In a formula, . stands for all the remaining columns of dtf, that is, every column other than Y.
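
To illustrate, here is a minimal sketch with simulated data, assuming only 3 of the 7 possible predictors exist for a given array. The column names V1, V4, V6, the sample size, and the simulated values are made up for the example; the real data would come from get.data(). The data frame is built from the available columns only, so there is no zero padding and no preallocated 7-column matrix. reformulate() from the stats package is shown as an alternative way to build the same formula from the column names at run time.

```r
# Simulated stand-in for one array: only 3 of the 7 possible predictors exist.
# (Names and values are hypothetical; real data would come from get.data().)
set.seed(1)
n <- 100
available <- list(
  V1 = rnorm(n),
  V4 = rnorm(n),
  V6 = rnorm(n)
)
Y <- 2 * available$V1 - available$V4 + rnorm(n)

# Build the data frame from the available columns only:
# each list element becomes one column next to Y.
dtf <- data.frame(Y = Y, available)

# `.` expands to every column of `data` other than the response.
fit_dot <- lm(Y ~ ., data = dtf)

# The same model, with the formula assembled from the column names.
predictor_names <- setdiff(names(dtf), "Y")
fit_named <- lm(reformulate(predictor_names, response = "Y"), data = dtf)

print(coef(fit_dot))
```

Both fits estimate one intercept plus one coefficient per available column, whatever the number of columns happens to be for that array.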
