
Programmatically building an R-style linear regression formula in which 100 features each interact with one other feature

I need to train a regression model with 100 features, and I want to look for interaction effects between each of those 100 features and one other feature. I would like to do this programmatically, because the analysis will be recurring and I don't want to write a new formula by hand each time it runs. So how can I get a model like this:

Y~a*b + a*c + .... a*z 

But for 100 terms? How do I get the R formula to do this? Note that I will be using statsmodels in Python, but I think the formula syntax is the same.

lm(Y ~ a * ., df)

For example:

lm(Sepal.Width ~ Sepal.Length * ., iris)

Call:
lm(formula = Sepal.Width ~ Sepal.Length * ., data = iris)

Coefficients:
                   (Intercept)                    Sepal.Length                    Petal.Length                     Petal.Width  
                      -0.91350                         0.82954                         0.29569                         0.85334  
             Speciesversicolor                Speciesvirginica       Sepal.Length:Petal.Length        Sepal.Length:Petal.Width  
                       0.05894                        -0.89244                        -0.05394                        -0.04654  
Sepal.Length:Speciesversicolor   Sepal.Length:Speciesvirginica  
                      -0.32823                        -0.21910  
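
Since the question mentions statsmodels: as far as I know, patsy (the formula engine behind statsmodels) does not expand `.` the way R does, so the right-hand side has to be spelled out. Below is a minimal sketch in plain Python, assuming hypothetical column names `Y`, `a`, and `b` to `e`. It relies on `a * (b + c)` expanding to the same terms as `a*b + a*c`, which keeps the string short even with 100 features.

```python
# Hypothetical column names; in practice these would come from list(df.columns)
columns = ["Y", "a", "b", "c", "d", "e"]
response = "Y"   # left-hand side of the formula
pivot = "a"      # feature that interacts with every other feature

# Everything except the response and the pivot goes inside the parentheses
others = [c for c in columns if c not in (response, pivot)]
formula = f"{response} ~ {pivot} * ({' + '.join(others)})"

print(formula)  # Y ~ a * (b + c + d + e)
```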

Here is an example of how to construct the desired string and then convert it to a formula:

paste("a", letters[2:26], sep = "*")  |>
    paste(collapse = " + ") |>
    sprintf(fmt = "Y ~ %s") |>
    as.formula()
    
##> Y ~ a * b + a * c + a * d + a * e + a * f + a * g + a * h + a * 
##>     i + a * j + a * k + a * l + a * m + a * n + a * o + a * p + 
##>     a * q + a * r + a * s + a * t + a * u + a * v + a * w + a * 
##>     x + a * y + a * z

Solution: build the formula string with a loop in Python:

# These would be the columns of a dataframe
effects_list = ['regressor_col', 'A', 'B', 'C', 'D', 'E', 'F']
interaction = effects_list[3]  # feature to interact with all the others ('C')
response = effects_list[0]     # left-hand side of the formula

# Build one 'effect*interaction' term per remaining column,
# skipping the response and the interaction feature itself
terms = [effect + '*' + interaction
         for effect in effects_list
         if effect not in (interaction, response)]
formula = response + ' ~ ' + ' + '.join(terms)

print(formula)
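
Because the analysis is meant to be rerun with different columns, the same idea could be wrapped in a small function. This is only a sketch: `interaction_formula` and its parameter names are made up for illustration, and it assumes every column name is a valid Python identifier.

```python
def interaction_formula(columns, response, pivot):
    """Return 'response ~ f1*pivot + f2*pivot + ...' over all other columns."""
    if response not in columns or pivot not in columns:
        raise ValueError("response and pivot must both be in columns")
    # One interaction term per column that is neither the response nor the pivot
    terms = [f"{col}*{pivot}" for col in columns if col not in (response, pivot)]
    return f"{response} ~ " + " + ".join(terms)


# Hypothetical column list, matching the example above
cols = ["regressor_col", "A", "B", "C", "D", "E", "F"]
formula = interaction_formula(cols, response="regressor_col", pivot="C")
print(formula)  # regressor_col ~ A*C + B*C + D*C + E*C + F*C
```

The resulting string should be usable as the formula argument of `statsmodels.formula.api.ols(formula, data=df)`, with `cols` taken from `list(df.columns)`. Column names containing spaces or other special characters would need quoting first, for instance with patsy's `Q("...")`.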
