How to run multiple linear regressions in R (or Excel/VBA) on time series data for all combinations of explanatory variables?

I am new to coding and R and would like your help. For my analysis, I am trying to run regressions on time series data with 1 dependent variable (Y) and 4 independent variables (X1, X2, X3, X4). Each of these variables (Y and the Xs) has 4 different transformations (for example, for X1: X1, SQRT(X1), Square(X1) and Ln(X1)). I want to run the regressions for all possible combinations of the Y transformations (Y, SQRT(Y), Square(Y), Ln(Y)) and the X transformations, so that in the end I can decide, by looking at the R-squared values, which transformation of each variable to use.

I am currently using the R code below for linear regression and changing the variables manually, which takes a lot of time. Is there a loop or something similar I can use to run the regressions? Waiting for your kind help. Thanks

lm(Y ~ X1 + X2 + X3 + X4)
lm(sqrt(Y) ~ X1 + X2 + X3 + X4)
lm(I(Y^2) ~ X1 + X2 + X3 + X4)
lm(log(Y) ~ X1 + X2 + X3 + X4)

lm(Y ~ sqrt(X1) + X2 + X3 + X4)
lm(Y ~ I(X1^2) + X2 + X3 + X4)
.... 
lm(log(Y) ~ log(X1) + log(X2) + log(X3) + log(X4))

This is my original code.

Regression10 <- lm(`10 KW Installations (MW)` ~ `10 KW Prio Installations (MW)` +
                       `FiT 10 KW (Cent/kWh)` +
                       `Electricity Prices 10 kW Cent/kW` +
                       `PV System Price  (Eur/W)`,
                   data = Final_Data_v2)
summary(Regression10)
Regressionsqrt10 <- lm(`SQRT(10 KW Installations (MW))` ~ `10 KW Prio Installations (MW)` +
                           `FiT 10 KW (Cent/kWh)` +
                           `Electricity Prices 10 kW Cent/kW` +
                           `PV System Price  (Eur/W)`,
                       data = Final_Data_v2)
summary(Regressionsqrt10)

And so on..

Here is the link to my DATA: LINK

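This picks the transformations of the Y and X variables that maximize adjusted R-squared. Be aware that this approach will almost certainly lead to spurious results, though: searching over many specifications rewards chance fit, and R-squared values are not directly comparable between models whose dependent variable is transformed differently.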

# simulate some data
set.seed(0)
df <- data.frame(Y = runif(100),
                 X1 = runif(100),
                 X2 = runif(100),
                 X3 = runif(100),
                 X4 = runif(100))

# create new variables for log/sqrt transformations of every X and Y
for (x in names(df)) {
    df[[paste0(x, "_log")]] <- log(df[[x]])
    df[[paste0(x, "_sqrt")]] <- sqrt(df[[x]])
}

# all combinations of Y and X's
# (combn() keeps the input order, so the X names are sorted to put every
#  variant of X1 before every variant of X2, and so on; without the sort
#  most valid combinations never pass the check below)
yVars <- names(df)[substr(names(df), 1, 1) == "Y"]
xVars <- sort(names(df)[substr(names(df), 1, 1) == "X"])
df2 <- data.frame(combn(c(yVars, xVars), 5), stringsAsFactors = FALSE)

# Ensure that formula is in form of some Y, some X1, some X2...
valid <- function(x){
    grepl("Y", x[1]) &&
    grepl("X1", x[2]) &&
    grepl("X2", x[3]) &&
    grepl("X3", x[4]) &&
    grepl("X4", x[5])
}

df2 <- df2[, sapply(df2, valid)]

# Create the formulas
formulas <- sapply(df2, function(x){
    paste(x[1], "~", paste(x[-1], collapse = " + "))})

# Run linear model for each formula
models <- lapply(formulas, function(x) summary(lm(as.formula(x), data = df)))

# Return the formula that maximizes adjusted R-squared
formulas[which.max(sapply(models, function(x) x[["adj.r.squared"]]))]

# the result is one formula string, for example "Y ~ X1 + X2 + X3 + X4_log"

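The warning about spurious results can be checked directly. The following is only a rough sketch under stated assumptions: the data are pure noise, the names `noise`, `variants`, `specs` and `oos_r2` are made up for the illustration, and the 50/50 train/test split is an arbitrary choice. It selects the specification with the highest adjusted R-squared on the first half of the data and then scores that same specification on the second half.

```r
# Pure noise: Y is independent of every X, so any "good" fit is spurious
set.seed(0)
n <- 100
noise <- data.frame(Y = runif(n), X1 = runif(n), X2 = runif(n),
                    X3 = runif(n), X4 = runif(n))
train <- noise[1:50, ]
test  <- noise[51:100, ]

# Hypothetical helper: the three candidate terms for one variable
variants <- function(v) c(v, paste0("log(", v, ")"), paste0("sqrt(", v, ")"))

# One row per specification: 3^5 = 243 rows, row 1 is the untransformed model
specs <- expand.grid(lapply(names(noise), variants), stringsAsFactors = FALSE)

# Adjusted R-squared of every specification on the training half
adj_r2 <- apply(specs, 1, function(s) {
  f <- as.formula(paste(s[1], "~", paste(s[-1], collapse = " + ")))
  summary(lm(f, data = train))$adj.r.squared
})

# Refit the winner and score it on the held-out half, on the scale of the
# (possibly transformed) response
best   <- unlist(specs[which.max(adj_r2), ])
f_best <- as.formula(paste(best[1], "~", paste(best[-1], collapse = " + ")))
fit    <- lm(f_best, data = train)
y_test <- model.response(model.frame(f_best, data = test))
pred   <- predict(fit, newdata = test)
oos_r2 <- 1 - sum((y_test - pred)^2) / sum((y_test - mean(y_test))^2)

print(c(best_in_sample_adj_r2 = max(adj_r2), out_of_sample_r2 = oos_r2))
```

If the in-sample winner scores much worse on the held-out half, the selection was fitting noise. On real time series data a split that respects time order (earlier observations for fitting, later ones for checking) would be the natural analogue.
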
Consider expand.grid to build all combinations of terms, filtering the candidate terms for each column with grep. Then call a model function that takes a dynamic formula, using Map (a wrapper around mapply) to build a list of lm summaries, one per combination, for 4^5 = 1,024 items in total.

The code below uses the equivalent power operations for square root (^(1/2)) and square (^2). Note: the grep patterns are the only part that needs adjusting to the actual variable names.

# BUILD ALL TERMS; BACKTICKS QUOTE THE NON-SYNTACTIC COLUMN NAMES INSIDE EACH TERM
vars <- names(Final_Data_v2)
coeffs <- c(paste0("`", vars, "`"),
            paste0("I(`", vars, "`^(1/2))"),
            paste0("I(`", vars, "`^2)"),
            paste0("log(`", vars, "`)"))

# BUILD DATA FRAME OF ALL COMBNS OF VARIABLE AND TRANSFORMATION TYPES
# (fixed = TRUE MATCHES THE NAMES LITERALLY; THEY CONTAIN REGEX CHARACTERS SUCH AS PARENTHESES)
all_combns <- expand.grid(y_var = coeffs[grep("10 KW Installations (MW)", coeffs, fixed = TRUE)],
                          x_var1 = coeffs[grep("10 KW Prio Installations (MW)", coeffs, fixed = TRUE)],
                          x_var2 = coeffs[grep("FiT 10 KW (Cent/kWh)", coeffs, fixed = TRUE)],
                          x_var3 = coeffs[grep("Electricity Prices 10 kW Cent/kW", coeffs, fixed = TRUE)],
                          x_var4 = coeffs[grep("PV System Price  (Eur/W)", coeffs, fixed = TRUE)],
                          stringsAsFactors = FALSE)

# FUNCTION WITH DYNAMIC FORMULA TO RECEIVE ALL POLYNOMIAL TYPES
# (TERMS ARE ALREADY QUOTED ABOVE, SO THEY ARE PASTED TOGETHER AS-IS)
proc_model <- function(y, x1, x2, x3, x4) {
     myformula <- paste0(y, " ~ ", x1, " + ", x2, " + ", x3, " + ", x4)
     summary(lm(as.formula(myformula), data=Final_Data_v2))
}

# MAP CALL PASSING COLUMN VALUES ELEMENTWISE AS FUNCTION PARAMS
lm_list <- with(all_combns, Map(proc_model, y_var, x_var1, x_var2, x_var3, x_var4))
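
The list of summaries still has to be ranked before a specification can be chosen. Below is a minimal, self-contained sketch of the same expand.grid and Map pattern that also does the ranking. It runs on simulated data, since the real Final_Data_v2 is not available here; the data frame `dat`, its column names and the helpers `variants` and `fit_one` are assumptions made for the illustration.

```r
# Simulated stand-in for the real data (hypothetical names; positive values
# so that log() and sqrt() are defined)
set.seed(1)
dat <- data.frame(Y  = runif(50, 1, 10), X1 = runif(50, 1, 10),
                  X2 = runif(50, 1, 10), X3 = runif(50, 1, 10),
                  X4 = runif(50, 1, 10))

# Hypothetical helper: the four candidate terms for one variable
variants <- function(v) c(v, paste0("sqrt(", v, ")"),
                          paste0("I(", v, "^2)"), paste0("log(", v, ")"))

# One row per combination of terms: 4^5 = 1024 rows
grid <- expand.grid(lapply(names(dat), variants), stringsAsFactors = FALSE)
names(grid) <- names(dat)

# Fit one model from its terms and return the adjusted R-squared
fit_one <- function(y, x1, x2, x3, x4) {
  f <- as.formula(paste(y, "~", paste(c(x1, x2, x3, x4), collapse = " + ")))
  summary(lm(f, data = dat))$adj.r.squared
}

# Map passes the grid columns elementwise, one model per row
grid$adj_r2 <- unlist(Map(fit_one, grid$Y, grid$X1, grid$X2, grid$X3, grid$X4),
                      use.names = FALSE)

# Five highest-ranked specifications
best <- head(grid[order(-grid$adj_r2), ], 5)
print(best)
```

Because R-squared is measured on the scale of the dependent variable, it may be safer to rank within each transformation of Y (for example with split(grid, grid$Y)) than across all 1,024 rows at once.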
