
How to run different multiple linear regressions in R, Excel/VBA on time series data for all different combinations of explanatory variables?

I am new to coding and R and would like your help. For my analysis, I am trying to run regressions on time series data with 1 dependent variable (Y) and 4 independent variables (X1, X2, X3, X4). Each of these variables (Y and the Xs) has 4 different transformations (for example, for X1: X1, SQRT(X1), Square(X1) and Ln(X1)). I want to run the regressions for all possible combinations of the Y transformations (Y, SQRT(Y), Square(Y), Ln(Y)) and the X transformations, so that in the end I can look at the R-squared values and decide which transformation to use for each variable.

I am currently using the R code below for linear regression and changing the variables manually, which takes a lot of time. Maybe there is a loop or something similar I can use for the regressions? Thanks in advance for your help.

# SQRT, Square and Ln above correspond to sqrt(), I(.^2) and log() in R
lm(Y ~ X1 + X2 + X3 + X4)
lm(sqrt(Y) ~ X1 + X2 + X3 + X4)
lm(I(Y^2) ~ X1 + X2 + X3 + X4)
lm(log(Y) ~ X1 + X2 + X3 + X4)

lm(Y ~ sqrt(X1) + X2 + X3 + X4)
lm(Y ~ I(X1^2) + X2 + X3 + X4)
# ....
lm(log(Y) ~ log(X1) + log(X2) + log(X3) + log(X4))

This is my original code.

Regression10 <- lm(Final_Data_v2$`10 KW Installations (MW)`~Final_Data_v2$`10 KW Prio Installations (MW)`+Final_Data_v2$`FiT 10 KW (Cent/kWh)`+Final_Data_v2$`Electricity Prices 10 kW Cent/kW`+Final_Data_v2$`PV System Price  (Eur/W)`)
summary(Regression10)
Regressionsqrt10 <- lm(Final_Data_v2$`SQRT(10 KW Installations (MW))`~Final_Data_v2$`10 KW Prio Installations (MW)`+Final_Data_v2$`FiT 10 KW (Cent/kWh)`+Final_Data_v2$`Electricity Prices 10 kW Cent/kW`+Final_Data_v2$`PV System Price  (Eur/W)`)
summary(Regressionsqrt10) 

And so on.

Here is the link to my data: LINK

This picks the transformations of the variables such that adjusted R-squared is maximized. Be aware that this statistical approach will almost certainly lead to spurious results, because it searches over many specifications and keeps only the best-fitting one.

library(magrittr)  # provides the %>% pipe used below

# simulate some data
set.seed(0)
df <- data.frame(Y = runif(100),
                 X1 = runif(100),
                 X2 = runif(100),
                 X3 = runif(100),
                 X4 = runif(100))

# create new variables for log/sqrt transformations of every X and Y
for(x in names(df)){
    df[[paste0(x, "_log")]] <- log(df[[x]])
    df[[paste0(x, "_sqrt")]] <- sqrt(df[[x]])}

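# all combinations of Y and X's
# Note: combn() keeps elements in the order of the input vector, so the
# valid() filter below only retains mixes whose terms already appear in
# Y, X1, X2, X3, X4 order within c(yVars, xVars). Other mixes are dropped
# (for example, X1_log together with untransformed X2 is never tested).
yVars <- names(df)[substr(names(df),1,1)=='Y']
xVars <- names(df)[substr(names(df),1,1)=='X']
df2 <- combn(c(yVars, xVars), 5) %>% data.frame()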

# Ensure that formula is in form of some Y, some X1, some X2...
valid <- function(x){
    grepl("Y", x[1]) &&
    grepl("X1", x[2]) &&
    grepl("X2", x[3]) &&
    grepl("X3", x[4]) &&
    grepl("X4", x[5])}

df2 <- df2[, sapply(df2, valid)]

# Create the formulas
formulas <- sapply(names(df2), function(x){
    paste0(df2[[x]][1], " ~ ",
           df2[[x]][2], " + ",
           df2[[x]][3], " + ",
           df2[[x]][4], " + ",
           df2[[x]][5])}) 

# Run linear model for each formula
models <- lapply(formulas, function(x) summary(lm(as.formula(x), data=df)))

# Return the formula that maximizes adjusted R-squared
formulas[which.max(sapply(models, function(x) x[['adj.r.squared']]))]

"Y ~ X1 + X2 + X3 + X4_log" 

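For comparison, here is a minimal base-R sketch that enumerates every mix of transformations with expand.grid instead of combn. This matters because combn returns elements in input order, so an order-based filter only sees part of the 3^5 = 243 possible mixes. The names dat, terms_for, grid and results are my own illustrative choices, and the transformations are written inline in the formula rather than stored as extra columns.

```r
# Simulated pure-noise data, same shape as above (names are assumptions)
set.seed(0)
dat <- data.frame(Y = runif(100), X1 = runif(100), X2 = runif(100),
                  X3 = runif(100), X4 = runif(100))

# Candidate terms for one variable: untransformed, log, and square root
terms_for <- function(v) c(v, paste0("log(", v, ")"), paste0("sqrt(", v, ")"))

# One row per mix: 3 choices for each of 5 variables = 243 rows
vars <- c("Y", "X1", "X2", "X3", "X4")
grid <- expand.grid(setNames(lapply(vars, terms_for), vars),
                    stringsAsFactors = FALSE)

# Build one formula string per row of the grid
formulas <- paste(grid$Y, "~", grid$X1, "+", grid$X2, "+", grid$X3, "+", grid$X4)

# Fit each model and keep its adjusted R-squared
adj_r2 <- vapply(formulas,
                 function(f) summary(lm(as.formula(f), data = dat))$adj.r.squared,
                 numeric(1))

# Rank the specifications from best to worst fit
results <- data.frame(formula = formulas, adj_r2 = unname(adj_r2))
results <- results[order(results$adj_r2, decreasing = TRUE), ]
head(results, 5)

# Fit of the untransformed specification, for comparison with the "winner"
baseline <- results$adj_r2[results$formula == "Y ~ X1 + X2 + X3 + X4"]
```

Since every column here is independent noise, any gap between the top-ranked row and baseline comes purely from searching over specifications, which is the spurious-results problem in miniature. Also keep in mind that R-squared values computed for different transformations of Y are measured on different scales, so they are not directly comparable.
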
Consider expand.grid for all combinations of terms, using grep to pick out the candidate terms for each variable. Then call a model function that builds a dynamic formula, using Map (a wrapper for mapply), to build a list of lm summaries, one per combination of terms: 4^5 = 1,024 items.

The code below expresses the square root and the square as the equivalent power terms, ^(1/2) and ^2. Note: the grep patterns are the only part that needs adjusting to your actual variable names.

# Backtick-quote the column names, since they contain spaces and parentheses
vars <- paste0("`", names(Final_Data_v2), "`")
coeffs <- c(vars,
            paste0("I(", vars, "^(1/2))"),
            paste0("I(", vars, "^2)"),
            paste0("log(", vars, ")"))

# BUILD DATA FRAME OF ALL COMBNS OF VARIABLE AND TRANSFORMATION TYPES
# fixed = TRUE matches names literally (parentheses are regex metacharacters)
all_combns <- expand.grid(y_var = coeffs[grep("`10 KW Installations (MW)`", coeffs, fixed = TRUE)],
                          x_var1 = coeffs[grep("`10 KW Prio Installations (MW)`", coeffs, fixed = TRUE)],
                          x_var2 = coeffs[grep("`FiT 10 KW (Cent/kWh)`", coeffs, fixed = TRUE)],
                          x_var3 = coeffs[grep("`Electricity Prices 10 kW Cent/kW`", coeffs, fixed = TRUE)],
                          x_var4 = coeffs[grep("`PV System Price  (Eur/W)`", coeffs, fixed = TRUE)],
                          stringsAsFactors = FALSE)

# FUNCTION WITH DYNAMIC FORMULA TO RECEIVE ALL POLYNOMIAL TYPES
# (terms are already backtick-quoted, so they are pasted together as-is)
proc_model <- function(y, x1, x2, x3, x4) {
     myformula <- paste(y, "~", x1, "+", x2, "+", x3, "+", x4)
     summary(lm(as.formula(myformula), data=Final_Data_v2))
}

# MAP CALL PASSING COLUMN VALUES ELEMENTWISE AS FUNCTION PARAMS
lm_list <- with(all_combns, Map(proc_model, y_var, x_var1, x_var2, x_var3, x_var4))
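
Once the list of summaries exists, the remaining step is to pull out a fit statistic and rank the specifications. Below is a rough sketch of that step on a small stand-in data set, since Final_Data_v2 is not available here. The data frame demo, its columns y, a and b, and the helper fit_summary are hypothetical names. Only the expand.grid plus Map pattern is taken from the code above.

```r
# Stand-in data (hypothetical columns, strictly positive so log/sqrt are defined)
set.seed(1)
demo <- data.frame(y = runif(50, 1, 2), a = runif(50, 1, 2), b = runif(50, 1, 2))

# Two candidate terms per variable keeps the example small: 2^3 = 8 models
combos <- expand.grid(y_var  = c("y", "log(y)"),
                      x_var1 = c("a", "I(a^2)"),
                      x_var2 = c("b", "sqrt(b)"),
                      stringsAsFactors = FALSE)

# Fit one model from a dynamically built formula and return its summary
fit_summary <- function(y, x1, x2) {
  summary(lm(as.formula(paste(y, "~", x1, "+", x2)), data = demo))
}

# Same elementwise Map call as above, one summary per row of combos
fits <- with(combos, Map(fit_summary, y_var, x_var1, x_var2))

# Extract adjusted R-squared from each summary and sort best-first
combos$adj_r2 <- unname(vapply(fits, function(s) s$adj.r.squared, numeric(1)))
ranked <- combos[order(combos$adj_r2, decreasing = TRUE), ]
print(ranked)
```

Keeping the statistic in the same data frame as the terms means each row of ranked reads as one specification together with its fit, which is easier to scan than a list of 1,024 printed summaries.
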

