
How to unnest multiple list columns of a dataframe in one go with dplyr pipe

I have the following tibble, which has two nested columns:

library(tidyverse)
df <- structure(list(a = list(c("a", "b"), "c"), b = list(c("1", "2", 
"3"), "3"), c = c(11, 22)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L))

Which produces:

# A tibble: 2 x 3
  a         b             c
  <list>    <list>    <dbl>
1 <chr [2]> <chr [3]>    11
2 <chr [1]> <chr [1]>    22

How can I unnest them at once, producing one single tibble?

I tried this, but it fails:

> df %>% unnest(a, b)
Error: All nested columns must have the same number of elements.

There's probably a cleaner way to do it, but if you want the cartesian product of the columns, you can at least unnest them in sequence:

# tidyr >= 1.0.0 keeps the other columns by default, so the
# now-deprecated `.drop = FALSE` argument is no longer needed
> df %>% 
    unnest(a) %>% 
    unnest(b)

# # A tibble: 7 x 3
#   a     b         c
#   <chr> <chr> <dbl>
# 1 a     1        11
# 2 a     2        11
# 3 a     3        11
# 4 b     1        11
# 5 b     2        11
# 6 b     3        11
# 7 c     3        22

tl;dr

Use unnest_cross() (and be careful if list-columns contain empty elements such as NULL or character(0): unnest() drops those rows by default, so pass keep_empty = TRUE to keep them):

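library(tidyr)

unnest_cross <- function(data, cols, ...) {
    .df_out <- data
    # Resolve the selection (e.g. everything(), c(a, b)) against the input;
    # the names of the result are the selected column names
    .cols <- tidyselect::eval_select(rlang::enquo(cols), data)
    purrr::walk(
        names(.cols),
        function(col) {
            # Unnest one column at a time; `<<-` updates .df_out in the
            # enclosing function, so each step builds on the previous one
            .df_out <<- tidyr::unnest(.df_out, tidyselect::all_of(col), ...)
        }
    )
    .df_out
}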
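A short illustration of why keep_empty matters, as a minimal sketch on a hypothetical tibble (df_empty is not from the question). By default tidyr::unnest() drops a row whose list element has size 0; with keep_empty = TRUE that row is kept with NA instead. Because unnest_cross() forwards `...` to unnest(), the same argument can be passed to it.

```r
library(tidyr)

# Hypothetical data: the second element of `a` is empty (character(0))
df_empty <- tibble::tibble(
    a = list(c("x", "y"), character(0)),
    c = c(11, 22)
)

# Default: the row holding the size-0 element is dropped (2 rows remain)
dropped <- unnest(df_empty, cols = a)

# keep_empty = TRUE: that row survives, with NA in `a` (3 rows)
kept <- unnest(df_empty, cols = a, keep_empty = TRUE)
```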

Background: Multiple list-columns with unnest()

unnest() has handled multiple columns since tidyr v0.3.0 (2015). It currently takes them through the cols argument, which accepts the usual tidyverse selection syntax (bare names, c(), and helpers such as everything()).
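As a sketch of what "typical selection methods" means here, the three calls below pick the same two list-columns of the "parallel" data.frame used later in this answer, once by name, once by type with tidyselect::where(), and once from a character vector with tidyselect::all_of(). The variable names are illustrative only.

```r
library(tidyr)

df_parallel <- tibble::tibble(
    a = list(c("a", "b", "c"), "c"),
    b = list(c("1", "2", "3"), "3"),
    c = c(11, 22)
)

# Three equivalent ways to select the list-columns a and b
by_name   <- unnest(df_parallel, cols = c(a, b))
by_type   <- unnest(df_parallel, cols = tidyselect::where(is.list))
by_string <- unnest(df_parallel, cols = tidyselect::all_of(c("a", "b")))
```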

Note that it's specifically designed to reverse nest()ed data.frames and requires list-columns to be "parallel entries ... of compatible sizes". This means:

  1. It doesn't work with the OP's data.frame:
df <- structure(list(
    a = list(c("a", "b"), "c"),
    b = list(c("1", "2", "3"), "3"),
    c = c(11, 22)),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))

tidyr::unnest(df, cols = tidyselect::everything())
#> Error in `fn()`:
#> ! In row 1, can't recycle input of size 2 to size 3.
  2. It will not produce the same output as sequential list-column unnest()ing (e.g. a cartesian product):
# "parallel"/"compatible" data.frame
df_parallel <- structure(list(
    a = list(c("a", "b", "c"), "c"),
    b = list(c("1", "2", "3"), "3"),
    c = c(11, 22)),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))

tidyr::unnest(df_parallel, cols = tidyselect::everything())
#> # A tibble: 4 × 3
#>   a     b         c
#>   <chr> <chr> <dbl>
#> 1 a     1        11
#> 2 b     2        11
#> 3 c     3        11
#> 4 c     3        22
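"Compatible sizes" follows the tidyverse recycling rule: within a row the elements must have equal length, or length 1, which is recycled. A minimal sketch with a hypothetical df_recycle shows that a length-1 element is repeated rather than rejected, which is why only the OP's 2-vs-3 row fails.

```r
library(tidyr)

# Hypothetical data: in each row, one of the list elements has length 1
df_recycle <- tibble::tibble(
    a = list(c("x", "y"), "z"),
    b = list("1", c("2", "3")),
    c = c(11, 22)
)

# Row 1: `b` (size 1) is recycled to the size of `a` (2)
# Row 2: `a` (size 1) is recycled to the size of `b` (2)
out <- unnest(df_recycle, cols = c(a, b))
```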

unnest_cross() Details

unnest_cross() uses purrr::walk() to cycle through the specified columns and unnest() them one at a time, saving the result after each step via superassignment (with <<-). Its name comes from its similarity to purrr::cross(): it always produces the cartesian product of the list-columns within each row of a data.frame, even when they are "parallel entries" and/or "of compatible sizes".
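The same sequential unnesting can also be written without superassignment by threading the intermediate result through purrr::reduce(). This is a hypothetical variant (unnest_cross_reduce() is not part of any package), shown only to make the loop explicit: each step receives the data unnested so far and unnests one more column.

```r
library(tidyr)

# Hypothetical variant of unnest_cross(): purrr::reduce() carries the
# accumulated result instead of updating a variable with `<<-`
unnest_cross_reduce <- function(data, cols, ...) {
    col_names <- names(tidyselect::eval_select(rlang::enquo(cols), data))
    purrr::reduce(
        col_names,
        # `acc` is the data unnested so far; `...` is forwarded to unnest()
        function(acc, col) unnest(acc, tidyselect::all_of(col), ...),
        .init = data
    )
}

df <- tibble::tibble(
    a = list(c("a", "b"), "c"),
    b = list(c("1", "2", "3"), "3"),
    c = c(11, 22)
)
out <- unnest_cross_reduce(df, cols = c(a, b))
```

Both versions unnest in the order the columns are selected; reduce() just makes the accumulator an explicit argument, which avoids modifying state outside the inner function.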

  1. It works as desired for the original data.frame (with list-columns of unequal length):
# For original data.frame
unnest_cross(df, cols = tidyselect::everything())
#> # A tibble: 7 × 3
#>   a     b         c
#>   <chr> <chr> <dbl>
#> 1 a     1        11
#> 2 a     2        11
#> 3 a     3        11
#> 4 b     1        11
#> 5 b     2        11
#> 6 b     3        11
#> 7 c     3        22
  2. It creates the cartesian product of df_parallel, which is very different from the result of unnest():
# For df with list-columns of "compatible size"
unnest_cross(df_parallel, cols = tidyselect::everything())
#> # A tibble: 10 × 3
#>    a     b         c
#>    <chr> <chr> <dbl>
#>  1 a     1        11
#>  2 a     2        11
#>  3 a     3        11
#>  4 b     1        11
#>  5 b     2        11
#>  6 b     3        11
#>  7 c     1        11
#>  8 c     2        11
#>  9 c     3        11
#> 10 c     3        22

Created on 2022-06-03 by the reprex package (v2.0.1)
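The row counts above can be checked with simple arithmetic on the element lengths. A cartesian product yields, for each row, the product of the element lengths, while parallel unnest() yields the common (recycled) size per row. This sketch assumes no empty elements, and the variable names are illustrative.

```r
# Element lengths of the list-columns, one entry per row
len_a     <- lengths(list(c("a", "b"), "c"))         # df$a
len_b     <- lengths(list(c("1", "2", "3"), "3"))    # df$b (same in df_parallel)
len_a_par <- lengths(list(c("a", "b", "c"), "c"))    # df_parallel$a

# unnest_cross(): per-row product of lengths, summed over rows
n_cross_df  <- sum(len_a * len_b)       # 2*3 + 1*1 = 7
n_cross_par <- sum(len_a_par * len_b)   # 3*3 + 1*1 = 10

# parallel unnest(): per-row common size (equal, or 1 recycled), summed
n_parallel  <- sum(pmax(len_a_par, len_b))   # 3 + 1 = 4
```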

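Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.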
