How to unnest multiple list columns of a dataframe in one go with dplyr pipe
I have the following tibble, which has two nested columns:
library(tidyverse)
df <- structure(list(a = list(c("a", "b"), "c"), b = list(c("1", "2",
"3"), "3"), c = c(11, 22)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L))
Which produces:
# A tibble: 2 x 3
a b c
<list> <list> <dbl>
1 <chr [2]> <chr [3]> 11
2 <chr [1]> <chr [1]> 22
How can I unnest them at once, producing one single tibble?
I tried this, but it fails:
> df %>% unnest(a, b)
Error: All nested columns must have the same number of elements.
There's probably a cleaner way to do it, but if you want the cartesian product of the columns, you can, if nothing else, unnest them in sequence:
> df %>%
unnest(a, .drop = FALSE) %>%
unnest(b, .drop = FALSE)
# # A tibble: 7 x 3
# c a b
# <dbl> <chr> <chr>
# 1 11 a 1
# 2 11 a 2
# 3 11 a 3
# 4 11 b 1
# 5 11 b 2
# 6 11 b 3
# 7 22 c 3
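The sequential approach above can also be sketched with the interface of tidyr 1.0.0 and later, where the columns go in the cols argument and .drop is deprecated. This is a minimal sketch assuming a recent tidyr; the variable name out is only illustrative.

```r
library(tidyr)
library(tibble)

df <- tibble(
  a = list(c("a", "b"), "c"),
  b = list(c("1", "2", "3"), "3"),
  c = c(11, 22)
)

# Unnest one list column at a time. Each step expands the rows for that
# column only, so the result is the per-row cartesian product of a and b.
out <- df %>%
  unnest(cols = a) %>%
  unnest(cols = b)

nrow(out)
```

Row 1 contributes 2 x 3 combinations and row 2 contributes one, which matches the seven rows shown above.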
Use unnest_cross() (and be careful if list-cols are missing data: in that case pass keep_empty = TRUE):
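unnest_cross <- function(data, cols, ...) {
  .df_out <- data
  # Resolve the tidyselect expression to the positions of the selected columns
  .cols <- tidyselect::eval_select(rlang::enquo(cols), data)
  # Unnest one column at a time, overwriting .df_out in this function's
  # environment via superassignment; ... is forwarded to unnest()
  purrr::walk(
    .cols,
    function(col) {
      .df_out <<- tidyr::unnest(.df_out, {{ col }}, ...)
    }
  )
  .df_out
}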
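To illustrate the keep_empty caveat, here is a small sketch that calls tidyr::unnest() directly, since that is where arguments passed through ... end up. The tibble df_missing is made up for this example: its second row has a NULL entry in column a.

```r
library(tidyr)
library(tibble)

# Hypothetical data: row 2 has no values in list column a
df_missing <- tibble(
  a = list(c("a", "b"), NULL),
  b = list(c("1", "2"), "3"),
  c = c(11, 22)
)

# Default behaviour: the row whose list element is empty disappears
dropped <- unnest(df_missing, cols = a)

# keep_empty = TRUE keeps that row and fills a with NA
kept <- unnest(df_missing, cols = a, keep_empty = TRUE)

nrow(dropped)
nrow(kept)
```

Without keep_empty = TRUE, the row with c = 22 is silently lost, which is easy to miss when several columns are unnested in sequence.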
Multiple list columns with unnest()
unnest() has handled multiple columns since v0.3.0 (2015). It currently uses the cols argument, which accepts typical tidyverse selection methods.
Note that it's specifically designed to reverse nest()ed data.frames and requires list columns to be "parallel entries ... of compatible sizes". This means it fails on the data.frame from the question:
df <- structure(list(
a = list(c("a", "b"), "c"),
b = list(c("1", "2", "3"), "3"),
c = c(11, 22)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))
tidyr::unnest(df, cols = tidyselect::everything())
#> Error in `fn()`:
#> ! In row 1, can't recycle input of size 2 to size 3.
Even for compatible columns, unnest() does not produce the same output as sequential unnest()ing (e.g. a cartesian product):
# "parallel"/"compatible" data.frame
df_parallel <- structure(list(
a = list(c("a", "b", "c"), "c"),
b = list(c("1", "2", "3"), "3"),
c = c(11, 22)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -2L))
tidyr::unnest(df_parallel, cols = tidyselect::everything())
#> # A tibble: 4 × 3
#> a b c
#> <chr> <chr> <dbl>
#> 1 a 1 11
#> 2 b 2 11
#> 3 c 3 11
#> 4 c 3 22
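Whether a set of list columns counts as "parallel" can be checked before calling unnest(). The helper below, has_parallel_sizes(), is a hypothetical function written for this illustration and is not part of tidyr. It assumes that, within a row, elements are compatible when they all have the same size or size 1 (which recycles); it does not try to handle empty elements.

```r
library(tibble)

# Hypothetical helper: TRUE for each row whose list-column elements
# have compatible sizes (all equal, ignoring size-1 elements).
has_parallel_sizes <- function(data, cols) {
  # One column of element sizes per list column, one row per data row
  sizes <- do.call(cbind, lapply(data[cols], lengths))
  apply(sizes, 1, function(s) {
    s <- s[s != 1]  # assumption: size-1 elements recycle to any size
    length(unique(s)) <= 1
  })
}

df <- tibble(
  a = list(c("a", "b"), "c"),
  b = list(c("1", "2", "3"), "3"),
  c = c(11, 22)
)
df_parallel <- tibble(
  a = list(c("a", "b", "c"), "c"),
  b = list(c("1", "2", "3"), "3"),
  c = c(11, 22)
)

check_df <- has_parallel_sizes(df, c("a", "b"))
check_parallel <- has_parallel_sizes(df_parallel, c("a", "b"))
```

Under these assumptions, check_df flags row 1 of df (sizes 2 and 3) as incompatible, which is the row named in the error message, while every row of df_parallel passes.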
unnest_cross() details
unnest_cross() uses purrr::walk() to cycle through the specified columns and unnest() them, saving the result each time via superassignment (with <<-). Its name is derived from its similarity to purrr::cross(), because it always produces a cartesian product of list columns in a data.frame, even when they are "parallel entries" and/or "of compatible sizes".
# For original data.frame
unnest_cross(df, cols = tidyselect::everything())
#> # A tibble: 7 × 3
#> a b c
#> <chr> <chr> <dbl>
#> 1 a 1 11
#> 2 a 2 11
#> 3 a 3 11
#> 4 b 1 11
#> 5 b 2 11
#> 6 b 3 11
#> 7 c 3 22
unnest_cross() also produces a cartesian product for df_parallel, which is very different from unnest():
# For df with list-columns of "compatible size"
unnest_cross(df_parallel, cols = tidyselect::everything())
#> # A tibble: 10 × 3
#> a b c
#> <chr> <chr> <dbl>
#> 1 a 1 11
#> 2 a 2 11
#> 3 a 3 11
#> 4 b 1 11
#> 5 b 2 11
#> 6 b 3 11
#> 7 c 1 11
#> 8 c 2 11
#> 9 c 3 11
#> 10 c 3 22
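The same column-by-column unnesting can be written without superassignment. The following is an alternative sketch, not the function from this answer: unnest_cross_reduce() is a hypothetical name, and it uses purrr::reduce() to thread the data frame through one unnest() call per selected column, selecting each column by name with tidyselect::all_of().

```r
library(tidyr)
library(purrr)
library(tibble)

# Hypothetical variant: fold unnest() over the selected column names
unnest_cross_reduce <- function(data, cols, ...) {
  # eval_select() returns a named vector; the names are the column names
  col_names <- names(tidyselect::eval_select(rlang::enquo(cols), data))
  purrr::reduce(
    col_names,
    function(acc, col) tidyr::unnest(acc, cols = tidyselect::all_of(col), ...),
    .init = data
  )
}

df <- tibble(
  a = list(c("a", "b"), "c"),
  b = list(c("1", "2", "3"), "3"),
  c = c(11, 22)
)
df_parallel <- tibble(
  a = list(c("a", "b", "c"), "c"),
  b = list(c("1", "2", "3"), "3"),
  c = c(11, 22)
)

res_df <- unnest_cross_reduce(df, cols = c(a, b))
res_parallel <- unnest_cross_reduce(df_parallel, cols = c(a, b))
```

Selecting by name rather than by position is a design choice here: it does not rely on column positions staying fixed between steps.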
Created on 2022-06-03 by the reprex package (v2.0.1)