How to make a multi-factorial regression (general linear model) in R without knowing in advance the number of predictors?

I want to fit a multi-factor linear regression in R without knowing the number of predictors in advance.

I have about 400 arrays, and I loop over them, fitting a multiple-factor regression for each one. Each regression has at most 7 predictors. My problem lies in the 'at most': some predictors don't exist for some of the arrays. In that case, a call like LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = foo) obviously won't work, where foo is a data frame with 8 columns [Y, V1, V2, ... V7].

I have found a workaround, which is to replace every missing predictor with a zero vector. It works, but it forces me to keep and process a lot of useless data that takes up memory (each array has about 40,000 values).

Here is what the code looks like:

for (current_array in arrays)
{
    Y = get.data(current_array) #Actually lot of long process

    regressors_mat = matrix (0, nrow = 40000, ncol = 7) # All non existing indicators will stay at 0
    colmatreg = 0
    for (predictor in predictors)
    {
        colmatreg = colmatreg + 1       
        if (!(predictor.exists.for(current_array)))
        { 
                next
        }
        regressors_mat[, colmatreg] = get.data(predictor) #Actually lot of long process
    }
    dtf = data.frame(cbind(regressors_mat, Y))  
    colnames(dtf)[ncol(dtf)] = "Y"

    LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf)#won't work if the 7 predictors are not available
# Long process 
}   
# Long process 

Is there any way to perform a multi-factor linear regression without having to write LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf), which fails when not all 7 predictors are available, and without having to store and process 40,000 x 400 x nb_of_unavailable_predictors placeholder values?

Something like this would be great:

for (current_array in arrays)
{
    Y = get.data(current_array)
    nbcol = nb.predictos.available(current_array) # I can have this function
    regressors_mat = matrix (0, nrow = 40000, ncol = nbcol )
    colmatreg = 0
    for (predictor in predictors)
    {
        if (!(predictor.exists.for(current_array)))
        { 
                next
        }
        colmatreg = colmatreg + 1 # advance only for predictors that exist, so the index stays within nbcol
        regressors_mat[, colmatreg] = get.data(predictor) #Actually lot of long process
    }
    dtf = data.frame(cbind(regressors_mat, Y))  
    colnames(dtf)[ncol(dtf)] = "Y"

    LinearModel = lm(Y ~ colSums(dtf[, 1:(ncol(dtf) -1)]), data = dtf)
# Pseudocode for the goal: build the multi-factor model without knowing in advance the number of factors
}   

Or, if it is more efficient, I would not even have to preallocate; I could concatenate the columns instead.
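
The concatenation idea could be sketched roughly as follows. This is only an illustration under stated assumptions: `available` and `load_predictor()` are hypothetical stand-ins for `predictor.exists.for()` and `get.data()`, and the data are random toy values rather than the real arrays.

```r
# Hypothetical stand-ins for the real loaders in the question.
set.seed(42)
n <- 50                                            # toy size instead of 40,000
available <- c(V1 = TRUE, V2 = FALSE, V3 = TRUE)   # which predictors exist (assumed)
load_predictor <- function(name) rnorm(n)          # placeholder for get.data()

# Collect only the columns that exist in a named list: no preallocation,
# and no zero-filled placeholder columns for missing predictors.
cols <- list(Y = rnorm(n))
for (name in names(available)) {
  if (!available[[name]]) next                     # skip missing predictors entirely
  cols[[name]] <- load_predictor(name)
}

# Convert the list to a data frame once, after the loop.
dtf <- as.data.frame(cols)
```

Growing a list and converting it once at the end avoids repeated cbind() calls on a large matrix, and the resulting data frame holds only the predictors that are actually present.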

Any help or advice would be great. Thanks!

I finally found that the dot (.) in the model formula was the solution. If dtf is a data frame with 8 columns [Y, V1, V2, ... V7], then

LinearModel = lm(Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7, data = dtf)

is exactly the same as:

LinearModel = lm(Y ~ ., data = dtf)

The . stands for every column of dtf other than Y, so the same call works however many predictors the data frame contains.
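
As a small check of this behaviour, the sketch below fits the model on a toy data frame in which only three of the seven possible predictors exist. The data are random and purely illustrative; reformulate() from the stats package is used here only as one way to write out the explicit formula for comparison, not as part of the original solution.

```r
# Toy data (assumed): only V1, V4 and V6 exist for this array.
set.seed(1)
n <- 100
dtf <- data.frame(
  Y  = rnorm(n),
  V1 = rnorm(n),
  V4 = rnorm(n),
  V6 = rnorm(n)
)

# `.` expands to every column of `data` other than the response.
fit_dot <- lm(Y ~ ., data = dtf)

# The same model with the formula built explicitly from the column names.
predictor_names <- setdiff(names(dtf), "Y")
fit_explicit <- lm(reformulate(predictor_names, response = "Y"), data = dtf)

# Compare the fitted coefficients of the two models.
same_fit <- isTRUE(all.equal(coef(fit_dot), coef(fit_explicit)))
print(same_fit)
```

Inside the loop from the question, this means the data frame only needs to contain Y and whichever predictors exist for the current array; the lm() call itself never changes.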
