purrr rbind list of data frames per group
After using purrr and friends to read in a load of CSVs, I have ended up with a tibble that looks something like this:
library(tidyverse)
df <-
tibble(
df_name = c("A", "B", "A", "A", "B"),
data = list(iris)
)
df
# A tibble: 5 x 2
df_name data
<chr> <list>
1 A <data.frame [150 × 5]>
2 B <data.frame [150 × 5]>
3 A <data.frame [150 × 5]>
4 A <data.frame [150 × 5]>
5 B <data.frame [150 × 5]>
I want to rbind (or equivalent) all data with a common df_name. I'd like the output to be a named list. I can do this with tapply:
desired = tapply(df$data, df$df_name, function(y) do.call(rbind,y))
str(desired)
List of 2
$ A:'data.frame': 450 obs. of 5 variables:
..$ Sepal.Length: num [1:450] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:450] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:450] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:450] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
$ B:'data.frame': 300 obs. of 5 variables:
..$ Sepal.Length: num [1:300] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:300] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:300] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:300] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dim")= int 2
- attr(*, "dimnames")=List of 1
..$ : chr [1:2] "A" "B"
I can't figure out how to do the same with purrr verbs. I think perhaps I need to start by setting the list names:
df_p <-
df %>%
mutate(data = setNames(data, df_name))
I found this question, but I can't figure out how to apply it in this situation.
We can use tidyr::unnest:
library(tidyverse)
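# name the list column explicitly; recent tidyr versions warn when unnest() is called without it
df %>% split(.$df_name) %>% map(. %>% unnest(data) %>% select(-df_name))
#OR
df %>% split(.$df_name) %>% map(~ unnest(.x, data) %>% select(-df_name))
#OR (this variant keeps the df_name column in each element)
df %>% unnest(data) %>% split(.$df_name)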
As @kath pointed out, we can use unnest directly:
df %>% split(.$df_name) %>% map(unnest)
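Since the question asks for a named list built with purrr verbs, here is a minimal sketch of another route: split the list column itself by df_name and row-bind each group. It assumes the same df as in the question; the name result is only illustrative.

```r
library(purrr)
library(dplyr)
library(tibble)

# Same example data as in the question
df <- tibble(
  df_name = c("A", "B", "A", "A", "B"),
  data = list(iris)
)

# split() groups the list of data frames by df_name and names the groups;
# map() then row-binds the data frames inside each group
result <- df$data %>%
  split(df$df_name) %>%
  map(bind_rows)

str(result, max.level = 1)
```

Because split() names its output after the grouping values, no separate setNames step should be needed.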
You can use reduce from purrr and bind_rows (similar to rbind) from dplyr.
df_list <- df %>%
group_by(df_name) %>%
summarize(data = list(reduce(data, bind_rows)))
df_list
# A tibble: 2 x 2
# df_name data
# <chr> <list>
# 1 A <data.frame [450 x 5]>
# 2 B <data.frame [300 x 5]>
For exactly the same structure as in your tapply version, we would need to add the following:
df_list2 <- df_list %>%
split(.$df_name) %>%
map(~ .x$data[[1]])
str(df_list2)
List of 2
$ A:'data.frame': 450 obs. of 5 variables:
..$ Sepal.Length: num [1:450] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:450] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:450] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:450] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
$ B:'data.frame': 300 obs. of 5 variables:
..$ Sepal.Length: num [1:300] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
..$ Sepal.Width : num [1:300] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
..$ Petal.Length: num [1:300] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
..$ Petal.Width : num [1:300] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
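A slightly shorter variant of the same idea, offered as a sketch rather than a replacement: bind_rows accepts a list of data frames directly, so reduce may not be needed, and tibble::deframe turns a two-column tibble into a named list. The name named is hypothetical.

```r
library(dplyr)
library(tibble)

# Same example data as in the question
df <- tibble(
  df_name = c("A", "B", "A", "A", "B"),
  data = list(iris)
)

named <- df %>%
  group_by(df_name) %>%
  # bind_rows() takes the whole list of data frames for the group at once
  summarise(data = list(bind_rows(data))) %>%
  # deframe() uses the first column as names and the second as values
  deframe()

str(named, max.level = 1)
```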
I would use unnest and group_split:
df %>% unnest(data) %>% group_split(df_name)
# [[1]]
# # A tibble: 450 x 6
# df_name Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <chr> <dbl> <dbl> <dbl> <dbl> <fct>
# 1 A 5.1 3.5 1.4 0.2 setosa
# 2 A 4.9 3 1.4 0.2 setosa
# 3 A 4.7 3.2 1.3 0.2 setosa
# 4 A 4.6 3.1 1.5 0.2 setosa
# 5 A 5 3.6 1.4 0.2 setosa
# 6 A 5.4 3.9 1.7 0.4 setosa
# 7 A 4.6 3.4 1.4 0.3 setosa
# 8 A 5 3.4 1.5 0.2 setosa
# 9 A 4.4 2.9 1.4 0.2 setosa
# 10 A 4.9 3.1 1.5 0.1 setosa
# # ... with 440 more rows
#
# [[2]]
# # A tibble: 300 x 6
# df_name Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# <chr> <dbl> <dbl> <dbl> <dbl> <fct>
# 1 B 5.1 3.5 1.4 0.2 setosa
# 2 B 4.9 3 1.4 0.2 setosa
# 3 B 4.7 3.2 1.3 0.2 setosa
# 4 B 4.6 3.1 1.5 0.2 setosa
# 5 B 5 3.6 1.4 0.2 setosa
# 6 B 5.4 3.9 1.7 0.4 setosa
# 7 B 4.6 3.4 1.4 0.3 setosa
# 8 B 5 3.4 1.5 0.2 setosa
# 9 B 4.4 2.9 1.4 0.2 setosa
# 10 B 4.9 3.1 1.5 0.1 setosa
# # ... with 290 more rows
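Note that group_split returns an unnamed list, whereas the question asks for a named one. One possible way to add the names, sketched here under the assumption that the group keys come back in the same order as the split pieces (both follow the grouping order), is to take them from group_keys. The names grouped and pieces are illustrative only.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same example data as in the question
df <- tibble(
  df_name = c("A", "B", "A", "A", "B"),
  data = list(iris)
)

grouped <- df %>%
  unnest(data) %>%
  group_by(df_name)

# .keep = FALSE drops the grouping column from each piece
pieces <- group_split(grouped, .keep = FALSE)

# group_keys() returns one row per group, in the same order as the pieces
names(pieces) <- group_keys(grouped)$df_name

str(pieces, max.level = 1)
```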