
How to run different multiple linear regressions in R, Excel/VBA on time series data for all different combinations of explanatory variables?

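I am new to coding and to R and would appreciate your help. For my analysis, I am trying to run regressions on time series data with 1 dependent variable (Y) and 4 independent variables (X1, X2, X3, X4). Each of these variables (Y and the X's) comes in 4 different transformations (for example, for X1: X1, SQRT(X1), Square(X1) and Ln(X1)). I want to run the regressions for all possible combinations of Y (Y, SQRT(Y), Square(Y), Ln(Y)) and all combinations of the X values, so that in the end I can decide, by looking at the R-squared value, which transformation of each variable to choose.

I am currently using R code for linear regression and changing the variables manually, which takes a lot of time. Maybe there is a loop or something similar I can use for the regressions? Waiting for your help. Thanks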

lm(Y ~ X1 + X2 + X3 + X4)
lm(sqrt(Y) ~ X1 + X2 + X3 + X4)
lm(I(Y^2) ~ X1 + X2 + X3 + X4)
lm(log(Y) ~ X1 + X2 + X3 + X4)

lm(Y ~ sqrt(X1) + X2 + X3 + X4)
lm(Y ~ I(X1^2) + X2 + X3 + X4)
....
lm(log(Y) ~ log(X1) + log(X2) + log(X3) + log(X4))

This is my original code.

Regression10 <- lm(`10 KW Installations (MW)` ~ `10 KW Prio Installations (MW)` +
                       `FiT 10 KW (Cent/kWh)` + `Electricity Prices 10 kW Cent/kW` +
                       `PV System Price  (Eur/W)`,
                   data = Final_Data_v2)
summary(Regression10)
Regressionsqrt10 <- lm(`SQRT(10 KW Installations (MW))` ~ `10 KW Prio Installations (MW)` +
                           `FiT 10 KW (Cent/kWh)` + `Electricity Prices 10 kW Cent/kW` +
                           `PV System Price  (Eur/W)`,
                       data = Final_Data_v2)
summary(Regressionsqrt10)

and so on..

Here is the link to my data: LINK

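This will choose the transformation of each variable that maximizes adjusted R-squared. This statistical approach will almost certainly lead to spurious results, though: picking the best fit out of hundreds of specifications overfits the sample, and R-squared is not comparable across models whose Y is transformed differently.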

# simulate some data
set.seed(0)
df <- data.frame(Y = runif(100),
                 X1 = runif(100),
                 X2 = runif(100),
                 X3 = runif(100),
                 X4 = runif(100))

# create new variables for log/sqrt transformations of every X and Y
for (x in names(df)) {
    df[[paste0(x, "_log")]] <- log(df[[x]])
    df[[paste0(x, "_sqrt")]] <- sqrt(df[[x]])
}

# all combinations of Y and X's
# combn() keeps the input order, so sort the names to group each variable
# with its transformations (X1, X1_log, X1_sqrt, X2, ...); valid() relies on this
yVars <- sort(names(df)[substr(names(df), 1, 1) == 'Y'])
xVars <- sort(names(df)[substr(names(df), 1, 1) == 'X'])
df2 <- data.frame(combn(c(yVars, xVars), 5), stringsAsFactors = FALSE)

# Ensure that formula is in form of some Y, some X1, some X2...
valid <- function(x) {
    grepl("Y", x[1]) &&
        grepl("X1", x[2]) &&
        grepl("X2", x[3]) &&
        grepl("X3", x[4]) &&
        grepl("X4", x[5])
}

df2 <- df2[, sapply(df2, valid)]

# Create the formulas
formulas <- sapply(df2, function(v) {
    paste(v[1], "~", paste(v[2:5], collapse = " + "))
})

# Run linear model for each formula
models <- lapply(formulas, function(x) summary(lm(as.formula(x), data = df)))

# Return the formula that maximizes adjusted R-squared
formulas[which.max(sapply(models, function(x) x[['adj.r.squared']]))]

# prints the winning formula as a string, e.g. one of the form
"Y ~ X1 + X2 + X3 + X4_log"

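As a variation on the code above, the following sketch also includes the squared transformation from the question and ranks the models separately for each form of Y, since R-squared values are only comparable between models that share the same response. It runs on the same kind of simulated noise. The names transforms, grid and best_per_y are made up for illustration and are not part of the original answer.

```r
# Simulated data: every column is independent uniform noise
set.seed(0)
df <- data.frame(Y = runif(100), X1 = runif(100), X2 = runif(100),
                 X3 = runif(100), X4 = runif(100))

# Hypothetical helper: the four forms of one variable, written as formula terms
transforms <- function(v) {
    c(v, paste0("sqrt(", v, ")"), paste0("I(", v, "^2)"), paste0("log(", v, ")"))
}

# One row per model: 4 forms for each of 5 variables, 4^5 = 1024 rows
vars <- c("Y", "X1", "X2", "X3", "X4")
grid <- expand.grid(lapply(setNames(vars, vars), transforms),
                    stringsAsFactors = FALSE)

# Build "lhs ~ x1 + x2 + x3 + x4" for every row
grid$formula <- paste(grid$Y, "~",
                      paste(grid$X1, grid$X2, grid$X3, grid$X4, sep = " + "))

# Fit each model and keep only its adjusted R-squared
grid$adj_r2 <- vapply(grid$formula, function(f) {
    summary(lm(as.formula(f), data = df))$adj.r.squared
}, numeric(1), USE.NAMES = FALSE)

# Sort by form of Y, then by fit, and keep the top model for each form of Y
ranked <- grid[order(grid$Y, -grid$adj_r2), ]
best_per_y <- ranked[!duplicated(ranked$Y), c("formula", "adj_r2")]
print(best_per_y)
```

best_per_y holds one row per form of Y. Because the data are pure noise, any "winner" here is an artifact of the search, which is the caveat raised above.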
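Consider expand.grid for all combinations of coefficients, filtering on each column name with grep. Then call a model function that takes a dynamic formula with Map (a wrapper for mapply) to build a list of N = 1,024 lm summaries, one for each combination of coefficients (4 forms for each of 5 variables, 4^5 = 1,024).

The code below expresses the square root and the square as equivalent power operations (^(1/2) and ^2). Note: the grep patterns are the only part that needs adjusting to the actual variable names.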
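To make the 1,024 figure concrete, here is a minimal sketch with placeholder names (Y, X1 to X4) standing in for the real columns. The helper forms and the object names are assumptions for illustration only.

```r
# Hypothetical helper: the four forms of one variable as formula terms
forms <- function(v) {
    c(v, paste0("I(", v, "^(1/2))"), paste0("I(", v, "^2)"), paste0("log(", v, ")"))
}

# expand.grid() crosses every form of every variable
combos <- expand.grid(y_var = forms("Y"), x_var1 = forms("X1"),
                      x_var2 = forms("X2"), x_var3 = forms("X3"),
                      x_var4 = forms("X4"), stringsAsFactors = FALSE)
n_models <- nrow(combos)   # 4^5 rows, one per model

# Map() walks the columns in parallel, so each call receives one row of the grid
build <- function(y, x1, x2, x3, x4) paste(y, "~", paste(x1, x2, x3, x4, sep = " + "))
first_three <- with(head(combos, 3), Map(build, y_var, x_var1, x_var2, x_var3, x_var4))
print(unname(unlist(first_three)))
```

Replacing build with a function that fits lm on the generated formula gives one fitted model per row.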

# WRAP COLUMN NAMES IN BACKTICKS: THEY CONTAIN SPACES AND PARENTHESES
vars <- names(Final_Data_v2)
coeffs <- c(paste0("`", vars, "`"),
            paste0("I(`", vars, "`^(1/2))"),
            paste0("I(`", vars, "`^2)"),
            paste0("log(`", vars, "`)"))

# BUILD DATA FRAME OF ALL COMBNS OF VARIABLE AND TRANSFORMATION TYPES
# (fixed = TRUE MATCHES THE PARENTHESES IN THE NAMES LITERALLY, NOT AS REGEX GROUPS)
all_combns <- expand.grid(y_var = coeffs[grep("10 KW Installations (MW)", coeffs, fixed = TRUE)],
                          x_var1 = coeffs[grep("10 KW Prio Installations (MW)", coeffs, fixed = TRUE)],
                          x_var2 = coeffs[grep("FiT 10 KW (Cent/kWh)", coeffs, fixed = TRUE)],
                          x_var3 = coeffs[grep("Electricity Prices 10 kW Cent/kW", coeffs, fixed = TRUE)],
                          x_var4 = coeffs[grep("PV System Price  (Eur/W)", coeffs, fixed = TRUE)],
                          stringsAsFactors = FALSE)

# FUNCTION WITH DYNAMIC FORMULA TO RECEIVE ALL POLYNOMIAL TYPES
proc_model <- function(y, x1, x2, x3, x4) {
     myformula <- paste(y, "~", paste(x1, x2, x3, x4, sep = " + "))
     summary(lm(as.formula(myformula), data = Final_Data_v2))
}

# MAP CALL PASSING COLUMN VALUES ELEMENTWISE AS FUNCTION PARAMS
lm_list <- with(all_combns, Map(proc_model, y_var, x_var1, x_var2, x_var3, x_var4))
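Once the list of summaries exists, the adjusted R-squared values still have to be pulled out and compared. The sketch below shows one way to do that on a small made-up data frame. The data frame toy, its column names and the helpers forms, pick and fit are all hypothetical stand-ins, not the poster's Final_Data_v2, and sqrt() is used in place of ^(1/2) for readability.

```r
# Hypothetical stand-in data with non-syntactic column names
set.seed(1)
toy <- data.frame(`Installs (MW)` = runif(50, 1, 10),
                  `FiT (Cent/kWh)` = runif(50, 1, 10),
                  `Price (Eur/W)` = runif(50, 1, 10),
                  check.names = FALSE)

# Backticks let names with spaces and parentheses appear inside a formula
forms <- function(v) {
    c(paste0("`", v, "`"), paste0("sqrt(`", v, "`)"),
      paste0("I(`", v, "`^2)"), paste0("log(`", v, "`)"))
}
all_terms <- unlist(lapply(names(toy), forms))

# fixed = TRUE matches "(" and ")" literally instead of as regex groups
pick <- function(pattern) all_terms[grep(pattern, all_terms, fixed = TRUE)]

combos <- expand.grid(y = pick("Installs (MW)"), x1 = pick("FiT (Cent/kWh)"),
                      x2 = pick("Price (Eur/W)"), stringsAsFactors = FALSE)

# Fit one model per row of the grid and keep its summary
fit <- function(y, x1, x2) {
    summary(lm(as.formula(paste(y, "~", x1, "+", x2)), data = toy))
}
lm_list <- with(combos, Map(fit, y, x1, x2))

# Collect adjusted R-squared from every summary and sort, best first
results <- data.frame(
    formula = paste(combos$y, "~", combos$x1, "+", combos$x2),
    adj_r2 = vapply(lm_list, function(s) s$adj.r.squared, numeric(1),
                    USE.NAMES = FALSE))
results <- results[order(-results$adj_r2), ]
print(head(results))
```

With three variables the grid has 4^3 = 64 rows; the same pattern extends to five variables by adding two more columns to expand.grid and two more arguments to fit.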
