
Combine many regression coefficients into one dataframe

I have several regression models. I want to combine their coefficients into one dataframe so they are easier to visualize.

However, the regressions do not all have the same coefficients, so I can't use a for loop over a fixed set of coefficient names.

Here is an example with sample data and the desired output.

df=structure(list(x1 = c(-0.689814979498939, -0.509885025360363, 
-0.20248689168896, -1.79535329549682, 1.60447678701814, -0.408696703105769, 
0.97243696942363, -0.688339413750959, -0.359380427396309, 1.11638856659614
), x2 = c(0.775426469430265, 0.367906637531888, 0.965721516497862, 
-0.601113535090469, -0.655567870650469, 1.45494263752806, 0.187276141272287, 
-0.659949502938592, -0.481763339717836, -0.581132345668067), 
    x3 = c(-0.17202393327554, 0.022376822081548, -1.05069599269781, 
    -0.631926480864125, 1.76178640615702, -1.60488439781703, 
    0.172936842119056, 0.750091896988, -1.60900096983098, 0.443223570706679
    ), x4 = c(-0.117822668731567, -0.645150368596604, -1.58642572549226, 
    0.3630617077837, -1.00866095836508, 0.696818785571135, 0.978471598076335, 
    -0.315392158997475, 1.37594860146428, 0.0574562910914235), 
    y = c(-1.07067139899979, -0.360297366336307, 0.0328023505398295, 
    1.07908579247402, 0.185603676169661, 0.384858869675533, 0.62179479088495, 
    1.44265090318836, 0.340526158232088, -1.20387054108186)), class = "data.frame", row.names = c(NA, 
-10L))

model1=lm(y~x1, data=df)
model2=lm(y~x2, data=df)
model3=lm(y~x2+x4, data=df)
model4=lm(y~x2+x3+x4, data=df)

coefs_x1=c(-0.2749230,NA,NA,NA)
coefs_x2=c(NA,-0.2795309,-0.2599686,-0.40977455)
coefs_x3=c(NA,NA,NA,-0.18740855)
coefs_x4=c(NA,NA,0.1568399,0.04981574)

output_df=data.frame(coefs_x1,coefs_x2,coefs_x3,coefs_x4)
> output_df
   coefs_x1   coefs_x2   coefs_x3   coefs_x4
1 -0.274923         NA         NA         NA
2        NA -0.2795309         NA         NA
3        NA -0.2599686         NA 0.15683990
4        NA -0.4097746 -0.1874086 0.04981574
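For comparison, a plain for loop can still do this if it writes each model's coefficients into the matching columns by name rather than by position. This is only an illustrative sketch: it uses the built-in mtcars data as a stand-in for df, and the formulas and the names models, all_terms and out are made up for the example.

```r
# Stand-in data: the built-in mtcars data set replaces the question's df.
models <- list(
  lm(mpg ~ wt, data = mtcars),
  lm(mpg ~ hp, data = mtcars),
  lm(mpg ~ hp + qsec, data = mtcars),
  lm(mpg ~ hp + drat + qsec, data = mtcars)
)

# Every slope name that appears in any model ([-1] drops the intercept).
all_terms <- unique(unlist(lapply(models, function(m) names(coef(m))[-1])))

# Pre-fill with NA: one row per model, one column per term.
out <- matrix(NA_real_, nrow = length(models), ncol = length(all_terms),
              dimnames = list(NULL, all_terms))

for (i in seq_along(models)) {
  cf <- coef(models[[i]])[-1]
  # Index by name, so each model writes only into the columns it actually has.
  out[i, names(cf)] <- cf
}

output_df <- as.data.frame(out)
output_df
```

Because the matrix starts out as NA, any term a model does not contain simply stays NA.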

You can do it like this:

library(tidyverse)

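forms <- list(model1 = y ~ x1, model2 = y ~ x2, model3 = y ~ x2 + x4, model4 = y ~ x2 + x3 + x4)

# one row per model; a term missing from a model becomes NA
map_df(forms, ~coef(lm(.x, data = df))) %>%
  select(-1)   # drop the (Intercept) column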

# A tibble: 4 x 4
      x1     x2      x4     x3
   <dbl>  <dbl>   <dbl>  <dbl>
1 -0.275 NA     NA      NA    
2 NA     -0.280 NA      NA    
3 NA     -0.260  0.157  NA    
4 NA     -0.410  0.0498 -0.187

Another option:

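map(forms, ~t(coef(lm(.x, data = df)))) %>%   # t() turns each coefficient vector into a one-row matrix
  plyr::rbind.fill.matrix() %>%                 # stack the matrices, filling missing columns with NA
  as.data.frame() %>%
  select(-1)                                    # drop the (Intercept) column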

         x1         x2         x4         x3
1 -0.274923         NA         NA         NA
2        NA -0.2795309         NA         NA
3        NA -0.2599686 0.15683990         NA
4        NA -0.4097745 0.04981574 -0.1874086
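Both outputs above list the columns in order of first appearance (x4 before x3), since a column is added when a model first uses that term. A small base R sketch for putting them back in the data's predictor order; the data frame res and the vector predictor_order are hypothetical placeholders, and the values in res are made up:

```r
# Made-up result with columns in first-appearance order (x4 before x3).
res <- data.frame(x1 = c(1, NA), x2 = c(NA, 2), x4 = c(NA, 3), x3 = c(NA, 4))

# Desired order, e.g. setdiff(names(df), "y") for the question's data.
predictor_order <- c("x1", "x2", "x3", "x4")

# intersect() keeps the order of its first argument and skips absent names.
reordered <- res[, intersect(predictor_order, names(res))]
reordered
```

Unlike sort(names(res)), this does not depend on alphabetical order, which would put x10 before x2.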

There are many ways to do this; this is what I would usually do with dplyr.

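You can access each model's coefficients directly: they are stored "inside" the model object. model1$coefficients returns the coefficients, including the intercept. Since you don't want the intercept (at least you didn't mention it in the question), I drop it with base R indexing, [-1], which removes the first element of the vector.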
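One caveat with [-1]: it removes whatever comes first, which is the intercept only when the model has one. A minimal sketch of removing the intercept by name instead, assuming the built-in mtcars data as a stand-in; drop_intercept() is a hypothetical helper, not part of any package. For an lm fit, coef(model) returns the same named vector as model$coefficients.

```r
# Stand-in data: mtcars instead of the question's df.
fit_with    <- lm(mpg ~ wt + hp, data = mtcars)
fit_without <- lm(mpg ~ wt + hp - 1, data = mtcars)  # no intercept term

# Positional removal silently drops the real coefficient "wt" here.
coef(fit_without)[-1]

# Name-based removal: drops "(Intercept)" only if the model has one.
drop_intercept <- function(model) {
  cf <- coef(model)
  cf[names(cf) != "(Intercept)"]
}

drop_intercept(fit_with)     # wt and hp
drop_intercept(fit_without)  # wt and hp
```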

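Then I stack all the rows with bind_rows() and use select() to arrange the columns. bind_rows() binds the rows together, adds a new column whenever a coefficient name appears for the first time, and fills the gaps with NA. That solves your problem.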
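bind_rows() does the NA filling for you. The underlying idea can be sketched in base R: indexing a named vector by a name it does not contain yields NA, so padding every coefficient vector to the union of all names lines the vectors up. The vectors a and b below are made-up stand-ins for two models' coefficients, and pad() is a hypothetical helper:

```r
# Made-up stand-ins for the slope coefficients of two models.
a <- c(x1 = 0.5)
b <- c(x2 = -0.2, x4 = 0.1)

all_names <- union(names(a), names(b))  # "x1" "x2" "x4"

# a[all_names] returns NA for absent names (with NA as their name),
# so setNames() restores the full set of names.
pad <- function(cf) setNames(cf[all_names], all_names)

padded <- rbind(pad(a), pad(b))
padded
```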

Solution

library(dplyr)

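# [-1] drops the intercept; bind_rows() fills coefficients a model lacks with NA
bind_rows(model1$coefficients[-1],
          model2$coefficients[-1],
          model3$coefficients[-1],
          model4$coefficients[-1]) %>% 
  select(x1, x2, x3, x4)   # put the columns in a fixed order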

Output

# A tibble: 4 x 4
      x1     x2     x3      x4
   <dbl>  <dbl>  <dbl>   <dbl>
1 -0.275 NA     NA     NA     
2 NA     -0.280 NA     NA     
3 NA     -0.260 NA      0.157 
4 NA     -0.410 -0.187  0.0498

FYI, the output is the same as yours: tibbles round numbers when printing them, but the underlying values keep all their decimal places.

Using base R (the list is created only to make naming easier; the original idea was to use do.call(rbind, ...)):

# assumes the vectors are named coefs_x1, coefs_x2, ...
coefs <- mget(ls(pattern = "^coefs_x"))  # mget() fetches the objects; ls() only returns their names
as.data.frame(coefs)                      # the list names become the column names

   coefs_x1   coefs_x2   coefs_x3   coefs_x4
1 -0.274923         NA         NA         NA
2        NA -0.2795309         NA         NA
3        NA -0.2599686         NA 0.15683990
4        NA -0.4097746 -0.1874086 0.04981574
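The do.call(rbind, ...) idea mentioned above can also be applied to the fitted models directly instead of to hand-typed vectors. A sketch under stated assumptions: mtcars stands in for df, the list names model1 to model4 are arbitrary, and the intercept is dropped by name. Each coefficient vector is padded to the union of all term names before stacking, because rbind() matches vectors by position, not by name.

```r
# Stand-in data: mtcars instead of the question's df; list names are arbitrary.
models <- list(
  model1 = lm(mpg ~ wt, data = mtcars),
  model2 = lm(mpg ~ hp, data = mtcars),
  model3 = lm(mpg ~ hp + qsec, data = mtcars),
  model4 = lm(mpg ~ hp + drat + qsec, data = mtcars)
)

# Slopes only: drop the intercept by name.
slopes <- lapply(models, function(m) {
  cf <- coef(m)
  cf[names(cf) != "(Intercept)"]
})

# Union of all slope names, then pad every vector to that set with NA.
all_terms <- unique(unlist(lapply(slopes, names)))
aligned <- lapply(slopes, function(cf) setNames(cf[all_terms], all_terms))

# Stack the aligned vectors; the list names become row names.
output_df <- as.data.frame(do.call(rbind, aligned))
output_df
```

Keeping the model names as row names makes it easy to tell the rows apart when plotting.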


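Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.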
