How to unnest multiple list-columns of a data frame at once with a dplyr pipeline

Question

I have the following tibble, which has two nested columns:

library(tidyverse)
df <- structure(list(a = list(c("a", "b"), "c"), b = list(c("1", "2", 
"3"), "3"), c = c(11, 22)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L))

Which produces:

# A tibble: 2 x 3
  a         b             c
  <list>    <list>    <dbl>
1 <chr [2]> <chr [3]>    11
2 <chr [1]> <chr [1]>    22

How can I unnest them at once, producing a single tibble?

I tried this, but it fails:

> df %>% unnest(a, b)
Error: All nested columns must have the same number of elements.

Answer 1

There's probably a cleaner way to do it, but if you want the cartesian product of the columns, you can unnest them in sequence:

> df %>% 
    unnest(a, .drop = FALSE) %>% 
    unnest(b, .drop = FALSE)

# # A tibble: 7 x 3
#       c a     b    
#   <dbl> <chr> <chr>
# 1    11 a     1    
# 2    11 a     2    
# 3    11 a     3    
# 4    11 b     1    
# 5    11 b     2    
# 6    11 b     3    
# 7    22 c     3
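
The .drop argument used above was deprecated in tidyr 1.0.0. A minimal sketch of the same sequential approach for newer versions, assuming tidyr >= 1.0.0 is installed (the data is rebuilt with tibble() only to keep the example self-contained):

```r
library(tidyr)
library(tibble)

df <- tibble(
  a = list(c("a", "b"), "c"),
  b = list(c("1", "2", "3"), "3"),
  c = c(11, 22)
)

# Unnest one list-column per call. Each call expands the rows again,
# so the result is the per-row cartesian product of a and b.
result <- df %>%
  unnest(cols = a) %>%
  unnest(cols = b)

nrow(result)  # 7 rows, as in the output above
```

In these versions the unnested columns stay in their original positions, so the column order is a, b, c rather than c, a, b.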

Answer 2

tl;dr

Use unnest_cross(), defined below. Be careful if any list-column has empty or NULL entries: unnest() drops those rows by default, so pass keep_empty = TRUE to keep them:

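unnest_cross <- function(data, cols, ...) {
    .df_out <- data
    # Resolve the tidyselect expression to the positions of the selected columns
    .cols <- tidyselect::eval_select(rlang::enquo(cols), data)
    # Unnest one column at a time, overwriting .df_out after each step
    purrr::walk(
        .cols,
        function(col) {
            # col is a column position; ... (e.g. keep_empty) is passed on to unnest()
            .df_out <<- tidyr::unnest(.df_out, {{ col }}, ...)
        }
    )
    .df_out
}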
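
The keep_empty caveat can be seen with plain tidyr::unnest(), which is where unnest_cross() forwards its extra arguments. A minimal sketch, assuming a hypothetical tibble df_empty (not part of the original question) whose second row holds NULL in the list-column:

```r
library(tidyr)
library(tibble)

# Hypothetical data: the second row has no elements in list-column a
df_empty <- tibble(
  a = list(c("a", "b"), NULL),
  c = c(11, 22)
)

# Default behaviour: the row with the empty element is dropped
dropped <- unnest(df_empty, cols = a)

# keep_empty = TRUE keeps that row and fills a with NA
kept <- unnest(df_empty, cols = a, keep_empty = TRUE)

nrow(dropped)  # 2
nrow(kept)     # 3
```

Without keep_empty = TRUE, the row with c == 22 silently disappears from the result.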

Background: Multiple list-columns with `unnest()`

unnest() has handled multiple columns since v0.3.0 (2015). It currently uses the cols argument, which accepts typical tidyverse selection methods.

Note that it is specifically designed to reverse nest()ed data.frames and requires list-columns to be "parallel entries ... of compatible sizes". This means:

It doesn't work with the OP's data.frame.

df <- structure(list(
    a = list(c("a", "b"), "c"),
    b = list(c("1", "2", "3"), "3"),
    c = c(11, 22)),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))

tidyr::unnest(df, cols = tidyselect::everything())
#> Error in `fn()`:
#> ! In row 1, can't recycle input of size 2 to size 3.

It will not produce the same output as unnest()ing the list-columns sequentially (which yields a cartesian product).

# "parallel"/"compatible" data.frame
df_parallel <- structure(list(
    a = list(c("a", "b", "c"), "c"),
    b = list(c("1", "2", "3"), "3"),
    c = c(11, 22)),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))

tidyr::unnest(df_parallel, cols = tidyselect::everything())
#> # A tibble: 4 × 3
#>   a     b         c
#>   <chr> <chr> <dbl>
#> 1 a     1        11
#> 2 b     2        11
#> 3 c     3        11
#> 4 c     3        22

`unnest_cross()` Details

unnest_cross() uses purrr::walk() to cycle through the specified columns and unnest() them one at a time, saving the result each time via superassignment (with <<-). Its name is derived from its similarity to purrr::cross(), because it always produces a cartesian product of the list-columns in a data.frame, even when they are "parallel entries" and/or "of compatible sizes".

It works as desired for the original data.frame (with list-columns of unequal length):

# For original data.frame
unnest_cross(df, cols = tidyselect::everything())
#> # A tibble: 7 × 3
#>   a     b         c
#>   <chr> <chr> <dbl>
#> 1 a     1        11
#> 2 a     2        11
#> 3 a     3        11
#> 4 b     1        11
#> 5 b     2        11
#> 6 b     3        11
#> 7 c     3        22

It creates the cartesian product of df_parallel, which is very different from the output of unnest().

# For df with list-columns of "compatible size"
unnest_cross(df_parallel, cols = tidyselect::everything())
#> # A tibble: 10 × 3
#>    a     b         c
#>    <chr> <chr> <dbl>
#>  1 a     1        11
#>  2 a     2        11
#>  3 a     3        11
#>  4 b     1        11
#>  5 b     2        11
#>  6 b     3        11
#>  7 c     1        11
#>  8 c     2        11
#>  9 c     3        11
#> 10 c     3        22
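
The same sequential unnesting can also be written without superassignment. A minimal sketch, assuming purrr::reduce() as a swapped-in replacement for the walk()/<<- pattern; the name unnest_cross_reduce is hypothetical and not part of the original answer:

```r
library(tidyr)

# Hypothetical variant of unnest_cross(): folds unnest() over the selected
# columns with purrr::reduce() instead of updating a variable via <<-.
unnest_cross_reduce <- function(data, cols, ...) {
  # Names of the selected columns, in selection order
  col_names <- names(tidyselect::eval_select(rlang::enquo(cols), data))
  purrr::reduce(
    col_names,
    # acc is the data frame unnested so far; col is the next column name
    function(acc, col) unnest(acc, cols = tidyselect::all_of(col), ...),
    .init = data
  )
}

df <- tibble::tibble(
  a = list(c("a", "b"), "c"),
  b = list(c("1", "2", "3"), "3"),
  c = c(11, 22)
)

out <- unnest_cross_reduce(df, cols = c(a, b))
nrow(out)  # 7, matching the unnest_cross() result for the original data
```

Selecting columns by name rather than by position is a design choice here: the fold does not depend on column positions staying stable between steps.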

Created on 2022-06-03 by the reprex package (v2.0.1)

2 answers

Answer 1: 8, accepted, 2019-05-30 05:40:47

Answer 2: 0, 2022-06-03 18:54:34
