
Linear Regression Loops in R

I need beta coefficients and the residual variance for multiple stocks. My question is: how can I create a loop that runs a linear regression for each stock and extracts these values into the output?

Here is what my data looks like. MR is my independent variable and the rest of the columns are dependent variables; I have to run a separate linear regression for each of them.

[Image: dataset]

Thank you very much!

Edit:

> dput(head(Beta_market_model_test))
structure(list(...1 = structure(c(1422748800, 1425168000, 1427846400, 
1430438400, 1433116800, 1435708800), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254, 
0.189767902403849, -0.129765571642446, -0.02268699227135), R2 = c(-0.000634819869861802, 
0.0566396021070485, 0.0504313735522286, -0.0275926732076482, 
0.0473125483284236, -0.0501700832780339), R3 = c(-0.0607564272876455, 
0.0915928283206455, -0.116429377153136, 0.0338313435925748, -0.0731748018356279, 
-0.082292041771696), R4 = c(0.036716647443291, 0.0409790469126645, 
-0.0594941218382615, 0.0477272727272728, 0.0115690527838033, 
-0.0187634024303074), R5 = c(0.00286365940192601, 0.0128875748616479, 
0.000174637626924046, 0.0238214018458469, 0.0120599342185406, 
-0.0627587867116033), R6 = c(-0.0944601447872712, 0.090838356632893, 
-0.0577132600192821, 0.136928528648433, -0.0137770071043408, 
0.0214549609033041), MR = c(-0.0388483879770769, 0.0858362570727453, 
-0.0178553084990147, 0.0567646974926548, -0.0391124787432181, 
-0.014626289866472)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

We could use cbind to specify the dependent variables in lm:

model <- lm(cbind(R1, R2, R3, R4, R5, R6) ~ MR, data = df1)
s1 <- summary(model)

NOTE: We assume that 'R1' to 'R6' are numeric columns, i.e. the decimal comma (,) should be replaced with a decimal point (.) while reading the data into R.
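As a rough illustration of the note above, here is a minimal base R sketch of converting comma-decimal strings to numeric columns. The `raw` data frame and the `to_num` helper are hypothetical names made up for this example; they are not part of the original answer.

```r
# Hypothetical input: returns read in as text with decimal commas
raw <- data.frame(
  R1 = c("-0,0226", "0,0848"),
  MR = c("-0,0388", "0,0858"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: swap the decimal comma for a point, then convert
to_num <- function(x) as.numeric(sub(",", ".", x, fixed = TRUE))

# Apply the helper to every column and rebuild the data frame
clean <- as.data.frame(lapply(raw, to_num))
str(clean)
```

If the data comes from a text file, passing `dec = ","` to `read.table()` (or using `read.csv2()` for semicolon-separated files) may avoid the manual conversion altogether.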

Update

If there are many columns and their names follow a sequence (R1, R2, ...), extract those columns and convert them to a matrix:

dep_data <- as.matrix(Beta_market_model_test[startsWith(
                 names(Beta_market_model_test), "R")])
model <- lm(dep_data ~ MR, data = Beta_market_model_test)
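Since the question asks specifically for the betas and residual variances, one possible way to pull them out of a multi-response fit like this is sketched below in base R. To keep the example self-contained it rebuilds only the MR, R1 and R2 columns from the question's dput output; the names `dat`, `betas`, `res_var` and `out` are assumptions made for illustration.

```r
# Subset of the question's sample data (MR, R1 and R2 only)
dat <- data.frame(
  MR = c(-0.0388483879770769, 0.0858362570727453, -0.0178553084990147,
         0.0567646974926548, -0.0391124787432181, -0.014626289866472),
  R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
         0.189767902403849, -0.129765571642446, -0.02268699227135),
  R2 = c(-0.000634819869861802, 0.0566396021070485, 0.0504313735522286,
         -0.0275926732076482, 0.0473125483284236, -0.0501700832780339)
)

# One lm() call fits a separate regression for each response column
model <- lm(cbind(R1, R2) ~ MR, data = dat)

# coef() on a multi-response fit is a matrix: rows = terms, columns = responses
betas <- coef(model)["MR", ]

# summary() returns one summary per response; sigma^2 is the residual variance
res_var <- sapply(summary(model), function(s) s$sigma^2)

out <- data.frame(
  stock     = colnames(coef(model)),
  beta      = unname(betas),
  resid_var = unname(res_var)
)
print(out)
```

Here the residual variance is taken as the squared residual standard error, i.e. the residual sum of squares divided by the residual degrees of freedom.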

Checking the summary:

summary(model)
Response R1 :

Call:
lm(formula = R1 ~ MR, data = Beta_market_model_test)

Residuals:
       1        2        3        4        5        6 
 0.03757 -0.06851  0.01791  0.08624 -0.06919 -0.00402 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 0.006368   0.028060   0.227   0.8316  
MR          1.711625   0.577571   2.963   0.0414 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06831 on 4 degrees of freedom
Multiple R-squared:  0.6871,    Adjusted R-squared:  0.6088 
F-statistic: 8.782 on 1 and 4 DF,  p-value: 0.04141


Response R2 :

Call:
lm(formula = R2 ~ MR, data = Beta_market_model_test)

Residuals:
       1        2        3        4        5        6 
-0.01047  0.03882  0.03925 -0.04355  0.03750 -0.06155 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.01232    0.02079   0.593    0.585
MR           0.06402    0.42797   0.150    0.888

Residual standard error: 0.05062 on 4 degrees of freedom
Multiple R-squared:  0.005564,  Adjusted R-squared:  -0.243 
F-statistic: 0.02238 on 1 and 4 DF,  p-value: 0.8883


Response R3 :

Call:
lm(formula = R3 ~ MR, data = Beta_market_model_test)

Residuals:
        1         2         3         4         5         6 
 0.035081  0.014541 -0.049701 -0.002909  0.023029 -0.020041 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -0.04197    0.01431  -2.934  0.04266 * 
MR           1.38661    0.29449   4.709  0.00925 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03483 on 4 degrees of freedom
Multiple R-squared:  0.8472,    Adjusted R-squared:  0.8089 
F-statistic: 22.17 on 1 and 4 DF,  p-value: 0.009249


Response R4 :

Call:
lm(formula = R4 ~ MR, data = Beta_market_model_test)

Residuals:
         1          2          3          4          5          6 
 0.0438966  0.0002996 -0.0603723  0.0182067  0.0188503 -0.0208810 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.007732   0.016804    0.46    0.669
MR          0.383843   0.345886    1.11    0.329

Residual standard error: 0.04091 on 4 degrees of freedom
Multiple R-squared:  0.2354,    Adjusted R-squared:  0.04425 
F-statistic: 1.232 on 1 and 4 DF,  p-value: 0.3293


Response R5 :

Call:
lm(formula = R5 ~ MR, data = Beta_market_model_test)

Residuals:
        1         2         3         4         5         6 
 0.013692 -0.001676  0.006728  0.015178  0.022942 -0.056863 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.002917   0.013351  -0.218    0.838
MR           0.203653   0.274801   0.741    0.500

Residual standard error: 0.0325 on 4 degrees of freedom
Multiple R-squared:  0.1207,    Adjusted R-squared:  -0.09909 
F-statistic: 0.5492 on 1 and 4 DF,  p-value: 0.4998


Response R6 :

Call:
lm(formula = R6 ~ MR, data = Beta_market_model_test)

Residuals:
       1        2        3        4        5        6 
-0.04498 -0.03837 -0.03832  0.04938  0.03608  0.03622 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 0.006197   0.020555   0.302   0.7781  
MR          1.433135   0.423083   3.387   0.0276 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05004 on 4 degrees of freedom
Multiple R-squared:  0.7415,    Adjusted R-squared:  0.6769 
F-statistic: 11.47 on 1 and 4 DF,  p-value: 0.0276

We could easily get the summary output as a data.frame in tabular format with tidy from broom:

library(purrr)
library(broom)
map_dfr(summary(model), tidy, .id = 'dep_var')
# A tibble: 12 x 6
#   dep_var   term        estimate std.error statistic p.value
#   <chr>       <chr>          <dbl>     <dbl>     <dbl>   <dbl>
# 1 Response R1 (Intercept)  0.00637    0.0281     0.227 0.832  
# 2 Response R1 MR           1.71       0.578      2.96  0.0414 
# 3 Response R2 (Intercept)  0.0123     0.0208     0.593 0.585  
# 4 Response R2 MR           0.0640     0.428      0.150 0.888  
# 5 Response R3 (Intercept) -0.0420     0.0143    -2.93  0.0427 
# 6 Response R3 MR           1.39       0.294      4.71  0.00925
# 7 Response R4 (Intercept)  0.00773    0.0168     0.460 0.669  
# 8 Response R4 MR           0.384      0.346      1.11  0.329  
# 9 Response R5 (Intercept) -0.00292    0.0134    -0.218 0.838  
#10 Response R5 MR           0.204      0.275      0.741 0.500  
#11 Response R6 (Intercept)  0.00620    0.0206     0.302 0.778  
#12 Response R6 MR           1.43       0.423      3.39  0.0276 

Or, to get other output, use glance:

map_dfr(summary(model), glance, .id = 'dep_var')
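The two calls above could be combined so that each response yields exactly the two numbers the question asks for. The following is only a sketch: it assumes purrr, dplyr and a broom version that provides tidy() and glance() methods for lm summaries, and it uses a subset of the question's sample data. The name `beta_var` is made up for this example.

```r
library(purrr)
library(broom)

# Subset of the question's sample data (MR, R1 and R2 only)
dat <- data.frame(
  MR = c(-0.0388483879770769, 0.0858362570727453, -0.0178553084990147,
         0.0567646974926548, -0.0391124787432181, -0.014626289866472),
  R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
         0.189767902403849, -0.129765571642446, -0.02268699227135),
  R2 = c(-0.000634819869861802, 0.0566396021070485, 0.0504313735522286,
         -0.0275926732076482, 0.0473125483284236, -0.0501700832780339)
)

model <- lm(cbind(R1, R2) ~ MR, data = dat)

# For each per-response summary, keep the MR slope and the squared sigma
beta_var <- map_dfr(summary(model), function(s) {
  coefs <- tidy(s)
  data.frame(
    beta      = coefs$estimate[coefs$term == "MR"],
    resid_var = glance(s)$sigma^2
  )
}, .id = "dep_var")

print(beta_var)
```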

I'm just posting this to ask a question about my code:

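library(dplyr)
library(tidyr)
library(purrr)  # provides map()
library(broom)

df %>%
  select(-...1) %>%                 # drop the date column
  pivot_longer(R1:R6) %>%           # long format: one row per stock and date
  group_by(name) %>%
  nest(data = c(MR, value)) %>%     # one nested data frame per stock
  # NOTE: this regresses MR on each stock (MR ~ value), the reverse of the
  # question's setup; use value ~ MR to estimate each stock's beta.
  mutate(model = map(data, ~ lm(MR ~ value, data = .)), 
         glance = map(model, ~ glance(.x))) %>%
  unnest(glance) %>% 
  select(- c(data, model))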

# A tibble: 6 x 13
# Groups:   name [6]
  name  r.squared adj.r.squared  sigma statistic p.value    df logLik   AIC   BIC deviance
  <chr>     <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>
1 R1      0.687          0.609  0.0331    8.78   0.0414      1  13.2  -20.3 -20.9  0.00438
2 R2      0.00556       -0.243  0.0590    0.0224 0.888       1   9.69 -13.4 -14.0  0.0139 
3 R3      0.847          0.809  0.0231   22.2    0.00925     1  15.3  -24.6 -25.2  0.00214
4 R4      0.235          0.0443 0.0517    1.23   0.329       1  10.5  -15.0 -15.6  0.0107 
5 R5      0.121         -0.0991 0.0555    0.549  0.500       1  10.1  -14.1 -14.7  0.0123 
6 R6      0.742          0.677  0.0301   11.5    0.0276      1  13.7  -21.5 -22.1  0.00362
# ... with 2 more variables: df.residual <int>, nobs <int>
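The r.squared values in this table match the per-response summaries shown earlier on the page, but sigma does not (0.0331 here versus a residual standard error of 0.06831 for R1). That pattern is what one would expect when the regression direction is swapped: R-squared is symmetric in a simple regression, while the slope and residual standard error are not. A small base R sketch using the question's R1 and MR values illustrates this; the object names are chosen for this example only.

```r
# R1 and MR from the question's sample data
dat <- data.frame(
  MR = c(-0.0388483879770769, 0.0858362570727453, -0.0178553084990147,
         0.0567646974926548, -0.0391124787432181, -0.014626289866472),
  R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
         0.189767902403849, -0.129765571642446, -0.02268699227135)
)

fit_stock  <- lm(R1 ~ MR, data = dat)  # stock return on market return (beta)
fit_market <- lm(MR ~ R1, data = dat)  # market return on stock return (reversed)

# Same R-squared in both directions
c(summary(fit_stock)$r.squared, summary(fit_market)$r.squared)

# Different residual standard errors
c(sigma(fit_stock), sigma(fit_market))
```

For the residual variance of each stock in a market model, the stock return would need to be on the left-hand side of the formula.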

Update

Thanks to my dear friend @akrun, who always provides me with valuable suggestions.

If you would like to avoid pivoting the data (with really big data, pivoting can increase the number of rows beyond what is practical), you can use the following code as well:

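library(dplyr)
library(tidyr)
library(purrr)  # provides map_dfr()
library(broom)

df %>% 
  select(-1) %>%   # drop the date column
  # fit <stock> ~ MR for every column except MR and keep each model summary
  summarise(across(-MR, ~ list(lm(reformulate('MR', response = cur_column()), 
                                   data = df) %>% 
                                  summary))) %>% 
  unclass %>% 
  map_dfr(~ tidy(.x[[1]]))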

# A tibble: 12 x 5
   term        estimate std.error statistic p.value
   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
 1 (Intercept)  0.00637    0.0281     0.227 0.832  
 2 MR           1.71       0.578      2.96  0.0414 
 3 (Intercept)  0.0123     0.0208     0.593 0.585  
 4 MR           0.0640     0.428      0.150 0.888  
 5 (Intercept) -0.0420     0.0143    -2.93  0.0427 
 6 MR           1.39       0.294      4.71  0.00925
 7 (Intercept)  0.00773    0.0168     0.460 0.669  
 8 MR           0.384      0.346      1.11  0.329  
 9 (Intercept) -0.00292    0.0134    -0.218 0.838  
10 MR           0.204      0.275      0.741 0.500  
11 (Intercept)  0.00620    0.0206     0.302 0.778  
12 MR           1.43       0.423      3.39  0.0276 
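The table above no longer shows which stock each pair of rows belongs to. If an explicit loop is preferred, as the question title suggests, a base R version that keeps the stock name next to its beta and residual variance might look like the sketch below. It uses a subset of the question's sample data, and the names `dat`, `stocks` and `out` are assumptions for illustration.

```r
# Subset of the question's sample data (MR, R1 and R2 only)
dat <- data.frame(
  MR = c(-0.0388483879770769, 0.0858362570727453, -0.0178553084990147,
         0.0567646974926548, -0.0391124787432181, -0.014626289866472),
  R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
         0.189767902403849, -0.129765571642446, -0.02268699227135),
  R2 = c(-0.000634819869861802, 0.0566396021070485, 0.0504313735522286,
         -0.0275926732076482, 0.0473125483284236, -0.0501700832780339)
)

# Every column except MR is treated as a dependent variable
stocks <- setdiff(names(dat), "MR")

# Pre-allocate the result, one row per stock
out <- data.frame(stock = stocks, beta = NA_real_, resid_var = NA_real_)

for (i in seq_along(stocks)) {
  # Build the formula <stock> ~ MR and fit it
  fit <- lm(reformulate("MR", response = stocks[i]), data = dat)
  out$beta[i]      <- coef(fit)[["MR"]]  # slope on the market return
  out$resid_var[i] <- sigma(fit)^2       # squared residual standard error
}

print(out)
```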

