purrr: rbind a list of data frames per group

After using purrr and friends to read in a load of CSVs, I have ended up with a tibble that looks something like this:

library(tidyverse)

df <- 
  tibble(
    df_name = c("A", "B", "A", "A", "B"),
    data = list(iris)
  )

df

# A tibble: 5 x 2
  df_name data                  
  <chr>   <list>                
1 A       <data.frame [150 × 5]>
2 B       <data.frame [150 × 5]>
3 A       <data.frame [150 × 5]>
4 A       <data.frame [150 × 5]>
5 B       <data.frame [150 × 5]>

I want to rbind (or equivalent) all data with a common df_name. I'd like the output to be a named list. I can do this with tapply:

desired <- tapply(df$data, df$df_name, function(y) do.call(rbind, y))
str(desired)

List of 2
 $ A:'data.frame':  450 obs. of  5 variables:
  ..$ Sepal.Length: num [1:450] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  ..$ Sepal.Width : num [1:450] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..$ Petal.Length: num [1:450] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  ..$ Petal.Width : num [1:450] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ B:'data.frame':  300 obs. of  5 variables:
  ..$ Sepal.Length: num [1:300] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  ..$ Sepal.Width : num [1:300] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..$ Petal.Length: num [1:300] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  ..$ Petal.Width : num [1:300] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "dim")= int 2
 - attr(*, "dimnames")=List of 1
  ..$ : chr [1:2] "A" "B"

I can't figure out how to do the same with purrr verbs. I think perhaps I need to start by setting the list names:

df_p <- 
  df %>%
  mutate(data = setNames(data, df_name))

I found this question, but I can't figure out how to apply it in this situation.

We can use tidyr::unnest:

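library(tidyverse)

# Split by group, then unnest each piece and drop the redundant key column
# (tidyr >= 1.0 expects the column to unnest to be named via cols)
df %>% split(.$df_name) %>% map(. %>% unnest(cols = data) %>% select(-df_name))

# OR, with a formula-style lambda
df %>% split(.$df_name) %>% map(~ unnest(.x, cols = data) %>% select(-df_name))

# OR unnest first and split afterwards (each piece keeps its df_name column)
df %>% unnest(cols = data) %>% split(.$df_name)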

As @kath pointed out, we can use unnest directly:

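# Each piece keeps its df_name column; cols is named explicitly for tidyr >= 1.0
df %>% split(.$df_name) %>% map(~ unnest(.x, cols = data))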

You can use reduce from purrr and bind_rows (similar to rbind) from dplyr.

df_list <- df %>% 
  group_by(df_name) %>% 
  summarize(data = list(reduce(data, bind_rows)))

df_list 
# A tibble: 2 x 2
#   df_name data                  
#   <chr>   <list>                
# 1 A       <data.frame [450 x 5]>
# 2 B       <data.frame [300 x 5]>

To get the same named-list structure as in your tapply version, we would need to add the following:

df_list2 <- df_list %>% 
  split(.$df_name) %>% 
  map(~ .x$data[[1]])

str(df_list2)
List of 2
 $ A:'data.frame':  450 obs. of  5 variables:
  ..$ Sepal.Length: num [1:450] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  ..$ Sepal.Width : num [1:450] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..$ Petal.Length: num [1:450] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  ..$ Petal.Width : num [1:450] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ B:'data.frame':  300 obs. of  5 variables:
  ..$ Sepal.Length: num [1:300] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
  ..$ Sepal.Width : num [1:300] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
  ..$ Petal.Length: num [1:300] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
  ..$ Petal.Width : num [1:300] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
  ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

I would use unnest and group_split:

df %>% unnest(data) %>% group_split(df_name)

# [[1]]
# # A tibble: 450 x 6
#   df_name Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#   <chr>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
# 1 A                5.1         3.5          1.4         0.2 setosa 
# 2 A                4.9         3            1.4         0.2 setosa 
# 3 A                4.7         3.2          1.3         0.2 setosa 
# 4 A                4.6         3.1          1.5         0.2 setosa 
# 5 A                5           3.6          1.4         0.2 setosa 
# 6 A                5.4         3.9          1.7         0.4 setosa 
# 7 A                4.6         3.4          1.4         0.3 setosa 
# 8 A                5           3.4          1.5         0.2 setosa 
# 9 A                4.4         2.9          1.4         0.2 setosa 
# 10 A                4.9         3.1          1.5         0.1 setosa 
# # ... with 440 more rows
# 
# [[2]]
# # A tibble: 300 x 6
#   df_name Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#   <chr>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
# 1 B                5.1         3.5          1.4         0.2 setosa 
# 2 B                4.9         3            1.4         0.2 setosa 
# 3 B                4.7         3.2          1.3         0.2 setosa 
# 4 B                4.6         3.1          1.5         0.2 setosa 
# 5 B                5           3.6          1.4         0.2 setosa 
# 6 B                5.4         3.9          1.7         0.4 setosa 
# 7 B                4.6         3.4          1.4         0.3 setosa 
# 8 B                5           3.4          1.5         0.2 setosa 
# 9 B                4.4         2.9          1.4         0.2 setosa 
# 10 B                4.9         3.1          1.5         0.1 setosa 
# # ... with 290 more rows
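
Note that group_split() returns an unnamed list. If the names matter, as in the tapply version, one possible sketch is to split the list column itself with base split(), which keeps the group names, and then stack each group's data frames with map(bind_rows). The object name `named` is only illustrative, and the toy data is rebuilt here so the snippet runs on its own.

```r
library(dplyr)
library(purrr)

# Same toy data as in the question: five copies of iris labelled A or B
df <- tibble(
  df_name = c("A", "B", "A", "A", "B"),
  data = rep(list(iris), 5)
)

# split() on the list column returns a named list, one element per df_name,
# each holding that group's data frames; bind_rows() then stacks them
named <- split(df$data, df$df_name) %>% map(bind_rows)

names(named)
sapply(named, nrow)
```

Because the key column is never unnested, the resulting data frames contain only the original iris columns, so no select(-df_name) step is needed.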
