Linear Regression Loops in R
I need beta coefficients and residual variance for multiple stocks. My question is: how can I create a loop that runs a linear regression for each stock and extracts these values into the output?
Here is what my data looks like. MR is my independent variable and the rest of the columns are dependent variables; I have to run a separate linear regression for each of them.
Thank you very much!
Edit:
> dput(head(Beta_market_model_test))
structure(list(...1 = structure(c(1422748800, 1425168000, 1427846400,
1430438400, 1433116800, 1435708800), tzone = "UTC", class = c("POSIXct",
"POSIXt")), R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
0.189767902403849, -0.129765571642446, -0.02268699227135), R2 = c(-0.000634819869861802,
0.0566396021070485, 0.0504313735522286, -0.0275926732076482,
0.0473125483284236, -0.0501700832780339), R3 = c(-0.0607564272876455,
0.0915928283206455, -0.116429377153136, 0.0338313435925748, -0.0731748018356279,
-0.082292041771696), R4 = c(0.036716647443291, 0.0409790469126645,
-0.0594941218382615, 0.0477272727272728, 0.0115690527838033,
-0.0187634024303074), R5 = c(0.00286365940192601, 0.0128875748616479,
0.000174637626924046, 0.0238214018458469, 0.0120599342185406,
-0.0627587867116033), R6 = c(-0.0944601447872712, 0.090838356632893,
-0.0577132600192821, 0.136928528648433, -0.0137770071043408,
0.0214549609033041), MR = c(-0.0388483879770769, 0.0858362570727453,
-0.0178553084990147, 0.0567646974926548, -0.0391124787432181,
-0.014626289866472)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
We could use cbind to specify the dependent variables in lm
model <- lm(cbind(R1, R2, R3, R4, R5, R6) ~ MR, data = df1)
s1 <- summary(model)
NOTE: We assume that 'R1' to 'R6' are numeric columns, i.e. any "," used as the decimal separator should be replaced with "." while reading the data into R
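Since the question asks specifically for the betas and the residual variances, here is a minimal base-R sketch of how they might be pulled out of a multi-response fit. The name `dat` and the restriction to R1 and R2 are only for illustration; the values are copied from the dput() output in the question.

```r
# Small reproducible data set: R1, R2 and MR copied from the dput() output
dat <- data.frame(
  R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
         0.189767902403849, -0.129765571642446, -0.02268699227135),
  R2 = c(-0.000634819869861802, 0.0566396021070485, 0.0504313735522286,
         -0.0275926732076482, 0.0473125483284236, -0.0501700832780339),
  MR = c(-0.0388483879770769, 0.0858362570727453, -0.0178553084990147,
         0.0567646974926548, -0.0391124787432181, -0.014626289866472)
)

# One call fits a separate regression on MR for every response column
model <- lm(cbind(R1, R2) ~ MR, data = dat)

# coef() on a multi-response fit is a matrix: rows = terms, columns = responses
betas <- coef(model)["MR", ]

# Residual variance per response: residual sum of squares / residual df
resid_var <- colSums(residuals(model)^2) / df.residual(model)

result <- data.frame(stock = names(betas), beta = betas,
                     resid_var = resid_var, row.names = NULL)
print(result)
```

The square root of each `resid_var` entry should correspond to the "Residual standard error" that summary() reports for that response.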
If there are many columns and they follow a sequence (here, names starting with "R"), extract those columns and convert them to a matrix
dep_data <- as.matrix(Beta_market_model_test[startsWith(
names(Beta_market_model_test), "R")])
model <- lm(dep_data ~ MR, data = Beta_market_model_test)
Checking the summary
summary(model)
Response R1 :
Call:
lm(formula = R1 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
0.03757 -0.06851 0.01791 0.08624 -0.06919 -0.00402
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.006368 0.028060 0.227 0.8316
MR 1.711625 0.577571 2.963 0.0414 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.06831 on 4 degrees of freedom
Multiple R-squared: 0.6871, Adjusted R-squared: 0.6088
F-statistic: 8.782 on 1 and 4 DF, p-value: 0.04141
Response R2 :
Call:
lm(formula = R2 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
-0.01047 0.03882 0.03925 -0.04355 0.03750 -0.06155
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.01232 0.02079 0.593 0.585
MR 0.06402 0.42797 0.150 0.888
Residual standard error: 0.05062 on 4 degrees of freedom
Multiple R-squared: 0.005564, Adjusted R-squared: -0.243
F-statistic: 0.02238 on 1 and 4 DF, p-value: 0.8883
Response R3 :
Call:
lm(formula = R3 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
0.035081 0.014541 -0.049701 -0.002909 0.023029 -0.020041
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.04197 0.01431 -2.934 0.04266 *
MR 1.38661 0.29449 4.709 0.00925 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03483 on 4 degrees of freedom
Multiple R-squared: 0.8472, Adjusted R-squared: 0.8089
F-statistic: 22.17 on 1 and 4 DF, p-value: 0.009249
Response R4 :
Call:
lm(formula = R4 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
0.0438966 0.0002996 -0.0603723 0.0182067 0.0188503 -0.0208810
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.007732 0.016804 0.46 0.669
MR 0.383843 0.345886 1.11 0.329
Residual standard error: 0.04091 on 4 degrees of freedom
Multiple R-squared: 0.2354, Adjusted R-squared: 0.04425
F-statistic: 1.232 on 1 and 4 DF, p-value: 0.3293
Response R5 :
Call:
lm(formula = R5 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
0.013692 -0.001676 0.006728 0.015178 0.022942 -0.056863
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.002917 0.013351 -0.218 0.838
MR 0.203653 0.274801 0.741 0.500
Residual standard error: 0.0325 on 4 degrees of freedom
Multiple R-squared: 0.1207, Adjusted R-squared: -0.09909
F-statistic: 0.5492 on 1 and 4 DF, p-value: 0.4998
Response R6 :
Call:
lm(formula = R6 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
-0.04498 -0.03837 -0.03832 0.04938 0.03608 0.03622
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.006197 0.020555 0.302 0.7781
MR 1.433135 0.423083 3.387 0.0276 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.05004 on 4 degrees of freedom
Multiple R-squared: 0.7415, Adjusted R-squared: 0.6769
F-statistic: 11.47 on 1 and 4 DF, p-value: 0.0276
We could easily get the summary output as a data.frame in tabular format with tidy from broom
library(purrr)
library(broom)
map_dfr(summary(model), tidy, .id = 'dep_var')
# A tibble: 12 x 6
# dep_var term estimate std.error statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Response R1 (Intercept) 0.00637 0.0281 0.227 0.832
# 2 Response R1 MR 1.71 0.578 2.96 0.0414
# 3 Response R2 (Intercept) 0.0123 0.0208 0.593 0.585
# 4 Response R2 MR 0.0640 0.428 0.150 0.888
# 5 Response R3 (Intercept) -0.0420 0.0143 -2.93 0.0427
# 6 Response R3 MR 1.39 0.294 4.71 0.00925
# 7 Response R4 (Intercept) 0.00773 0.0168 0.460 0.669
# 8 Response R4 MR 0.384 0.346 1.11 0.329
# 9 Response R5 (Intercept) -0.00292 0.0134 -0.218 0.838
#10 Response R5 MR 0.204 0.275 0.741 0.500
#11 Response R6 (Intercept) 0.00620 0.0206 0.302 0.778
#12 Response R6 MR 1.43 0.423 3.39 0.0276
Or use glance to get other output
map_dfr(summary(model), glance, .id = 'dep_var')
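If an explicit loop is preferred, as the question literally asks, the same quantities can be collected one regression at a time. This is only a sketch under assumptions: `dat`, `stocks` and `out` are illustrative names, and only R3 and R6 (values copied from the dput() output in the question) are included to keep it short.

```r
# Small reproducible data set: R3, R6 and MR copied from the dput() output
dat <- data.frame(
  R3 = c(-0.0607564272876455, 0.0915928283206455, -0.116429377153136,
         0.0338313435925748, -0.0731748018356279, -0.082292041771696),
  R6 = c(-0.0944601447872712, 0.090838356632893, -0.0577132600192821,
         0.136928528648433, -0.0137770071043408, 0.0214549609033041),
  MR = c(-0.0388483879770769, 0.0858362570727453, -0.0178553084990147,
         0.0567646974926548, -0.0391124787432181, -0.014626289866472)
)

# Every column except MR is treated as a dependent variable
stocks <- setdiff(names(dat), "MR")

# Pre-allocate the output table, one row per stock
out <- data.frame(stock = stocks, beta = NA_real_, resid_var = NA_real_)

for (i in seq_along(stocks)) {
  # reformulate() builds the formula <stock> ~ MR from character strings
  fit <- lm(reformulate("MR", response = stocks[i]), data = dat)
  out$beta[i] <- coef(fit)[["MR"]]   # slope on MR = beta
  out$resid_var[i] <- sigma(fit)^2   # squared residual standard error
}

print(out)
```

The loop fits the same per-stock regressions as the cbind approach; it is mainly useful when each fit needs extra handling, such as dropping missing values stock by stock.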
I'm just posting this to ask a question about my code:
library(dplyr)
library(tidyr)
library(purrr) # needed for map()
library(broom)
df %>%
select(-...1) %>%
pivot_longer(R1:R6) %>%
group_by(name) %>%
nest(data = c(MR, value)) %>%
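# NOTE: as written, this regresses MR on each stock (MR ~ value), and the output below
# reflects that; for a market-model beta with MR as the regressor, use lm(value ~ MR, data = .)
mutate(model = map(data, ~ lm(MR ~ value, data = .)),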
glance = map(model, ~ glance(.x))) %>%
unnest(glance) %>%
select(- c(data, model))
# A tibble: 6 x 13
# Groups: name [6]
name r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 R1 0.687 0.609 0.0331 8.78 0.0414 1 13.2 -20.3 -20.9 0.00438
2 R2 0.00556 -0.243 0.0590 0.0224 0.888 1 9.69 -13.4 -14.0 0.0139
3 R3 0.847 0.809 0.0231 22.2 0.00925 1 15.3 -24.6 -25.2 0.00214
4 R4 0.235 0.0443 0.0517 1.23 0.329 1 10.5 -15.0 -15.6 0.0107
5 R5 0.121 -0.0991 0.0555 0.549 0.500 1 10.1 -14.1 -14.7 0.0123
6 R6 0.742 0.677 0.0301 11.5 0.0276 1 13.7 -21.5 -22.1 0.00362
# ... with 2 more variables: df.residual <int>, nobs <int>
Update
Thanks to my dear friend @akrun who always provides me with valuable suggestions.
If you would like to avoid pivoting, since with really big data pivoting can increase the number of rows beyond what is practical, you can use the following code instead:
library(dplyr)
library(tidyr)
library(purrr) # needed for map_dfr()
library(broom)
df %>%
select(-1) %>%
summarise(across(-MR, ~ list(lm(reformulate('MR', response = cur_column()),
data = df) %>%
summary))) %>%
unclass %>%
map_dfr(~ tidy(.x[[1]]))
# A tibble: 12 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.00637 0.0281 0.227 0.832
2 MR 1.71 0.578 2.96 0.0414
3 (Intercept) 0.0123 0.0208 0.593 0.585
4 MR 0.0640 0.428 0.150 0.888
5 (Intercept) -0.0420 0.0143 -2.93 0.0427
6 MR 1.39 0.294 4.71 0.00925
7 (Intercept) 0.00773 0.0168 0.460 0.669
8 MR 0.384 0.346 1.11 0.329
9 (Intercept) -0.00292 0.0134 -0.218 0.838
10 MR 0.204 0.275 0.741 0.500
11 (Intercept) 0.00620 0.0206 0.302 0.778
12 MR 1.43 0.423 3.39 0.0276