
purrr::map_dfr binds by columns, not by rows as expected

I'm new to the tidyverse and still struggling a bit to make it do things I know how to do in base R.

The issue: I want to loop through the columns of a data frame, pass each of them separately to an lm call, and get the output as a tidy data frame. I don't care about the intercept, so all I want to keep in the tidy output are the coefficient statistics for the independent variable. The final output should be a data frame where the columns are the coefficient statistics and each row corresponds to one variable from the original data frame. I can do this in base R with do.call("rbind", ...), but as I'm migrating to the tidyverse, I wanted to see whether there is a tidyverse way to do it. purrr::map_dfr doesn't work in this case; this is a known issue.

Some reproducible code:

> library(tidyverse)
> 
> set.seed(62442)
> 
> iv <- rnorm(100)
> dvs <- as_tibble(replicate(5, iv + rnorm(100)), .name_repair = "universal")
New names:
* `` -> ...1
* `` -> ...2
* `` -> ...3
* `` -> ...4
* `` -> ...5
> 
> # This doesn't work
> dvs %>% map_dfr(~ summary(lm(.x ~ iv))$coefficients[2, ]) 
# A tibble: 4 x 5
      ...1     ...2     ...3     ...4     ...5
     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 8.78e- 1 1.09e+ 0 9.11e- 1 1.19e+ 0 8.80e- 1
2 1.05e- 1 1.17e- 1 9.86e- 2 9.33e- 2 1.16e- 1
3 8.34e+ 0 9.29e+ 0 9.24e+ 0 1.27e+ 1 7.60e+ 0
4 4.78e-13 4.16e-15 5.40e-15 1.97e-22 1.80e-11
> 
> # It behaves exactly like:
> dvs %>% map_dfc(~ summary(lm(.x ~ iv))$coefficients[2, ])
# A tibble: 4 x 5
      ...1     ...2     ...3     ...4     ...5
     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 8.78e- 1 1.09e+ 0 9.11e- 1 1.19e+ 0 8.80e- 1
2 1.05e- 1 1.17e- 1 9.86e- 2 9.33e- 2 1.16e- 1
3 8.34e+ 0 9.29e+ 0 9.24e+ 0 1.27e+ 1 7.60e+ 0
4 4.78e-13 4.16e-15 5.40e-15 1.97e-22 1.80e-11
> 
> # All that is left for me to do is:
> res <- dvs %>% map(~ summary(lm(.x ~ iv))$coefficients[2, ])
> do.call("rbind", res)
      Estimate Std. Error   t value                       Pr(>|t|)
...1 0.8776895 0.10525549  8.338658 0.0000000000004779501411861117
...2 1.0911362 0.11742588  9.292127 0.0000000000000041631074216992
...3 0.9113473 0.09863111  9.239958 0.0000000000000054021858298938
...4 1.1852848 0.09330950 12.702724 0.0000000000000000000001970469
...5 0.8799633 0.11579113  7.599575 0.0000000000179548788283525966

The row binding in map_dfr works when each element returned is a data.frame/tibble or a list. Here, each element is a named vector. One option is to convert it to a list with as.list:

library(dplyr)
library(purrr)
dvs %>% 
    map_dfr(~ summary(lm(.x ~ iv))$coefficients[2, ] %>% as.list)
# A tibble: 5 x 4
#  Estimate `Std. Error` `t value` `Pr(>|t|)`
#*    <dbl>        <dbl>     <dbl>      <dbl>
#1    0.878       0.105       8.34   4.78e-13
#2    1.09        0.117       9.29   4.16e-15
#3    0.911       0.0986      9.24   5.40e-15
#4    1.19        0.0933     12.7    1.97e-22
#5    0.880       0.116       7.60   1.80e-11
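The result above no longer shows which column each row came from. As a minimal sketch, assuming the same simulated data as in the question, the `.id` argument of `map_dfr` (also used in the broom answer) can carry the column names along. The names `coefs` and `"variable"` are illustrative choices, not part of the original answer.

```r
library(purrr)
library(dplyr)   # map_dfr() relies on dplyr::bind_rows()
library(tibble)

# Same simulated data as in the question
set.seed(62442)
iv  <- rnorm(100)
dvs <- as_tibble(replicate(5, iv + rnorm(100)), .name_repair = "universal")

# as.list() turns each named numeric vector into a named list,
# which is then bound as one row; .id stores the source column name
coefs <- dvs %>%
  map_dfr(~ as.list(summary(lm(.x ~ iv))$coefficients[2, ]),
          .id = "variable")

coefs
```

This should give one row per column of `dvs`, with the original column name in `variable` followed by the four coefficient statistics.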

With the addition of broom, you can try:

library(broom)
map_dfr(.x = dvs, ~ tidy(lm(.x ~ iv)), .id = "ID")

   ID    term          estimate std.error statistic   p.value
   <chr> <chr>            <dbl>     <dbl>     <dbl>     <dbl>
 1 ...1  (Intercept) -0.260        0.0999 -2.61      1.05e- 2
 2 ...1  iv           0.878        0.105   8.34      4.78e-13
 3 ...2  (Intercept) -0.0000159    0.111  -0.000142 10.00e- 1
 4 ...2  iv           1.09         0.117   9.29      4.16e-15
 5 ...3  (Intercept) -0.0383       0.0936 -0.410     6.83e- 1
 6 ...3  iv           0.911        0.0986  9.24      5.40e-15
 7 ...4  (Intercept) -0.131        0.0885 -1.48      1.41e- 1
 8 ...4  iv           1.19         0.0933 12.7       1.97e-22
 9 ...5  (Intercept) -0.0132       0.110  -0.120     9.05e- 1
10 ...5  iv           0.880        0.116   7.60      1.80e-11

And if you don't need the intercept, with the addition of dplyr:

map_dfr(.x = dvs, ~ tidy(lm(.x ~ iv)), .id = "ID") %>%
 filter(term != "(Intercept)")

  ID    term  estimate std.error statistic  p.value
  <chr> <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1 ...1  iv       0.878    0.105       8.34 4.78e-13
2 ...2  iv       1.09     0.117       9.29 4.16e-15
3 ...3  iv       0.911    0.0986      9.24 5.40e-15
4 ...4  iv       1.19     0.0933     12.7  1.97e-22
5 ...5  iv       0.880    0.116       7.60 1.80e-11
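In purrr 1.0.0 and later, `map_dfr` is marked as superseded in favour of `map()` followed by `list_rbind()`. The following is a sketch of the same pipeline in that style, assuming purrr >= 1.0.0 and the simulated data from the question. The name `slopes` is illustrative.

```r
library(purrr)   # list_rbind() assumes purrr >= 1.0.0
library(broom)
library(dplyr)
library(tibble)

# Same simulated data as in the question
set.seed(62442)
iv  <- rnorm(100)
dvs <- as_tibble(replicate(5, iv + rnorm(100)), .name_repair = "universal")

slopes <- dvs %>%
  map(~ tidy(lm(.x ~ iv))) %>%        # one tidy tibble per column
  list_rbind(names_to = "ID") %>%     # row-bind, keeping column names in ID
  filter(term != "(Intercept)")       # drop the intercept rows

slopes
```

Because `tidy()` already returns a tibble for each model, the row binding here does not need the `as.list` workaround.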
