R using dplyr::select() in a list-column workflow
I have a list of large data frames and I want to subset each one, retaining only certain columns. The names of the columns I want are contained in character vectors, one unique vector per data frame.
One way of doing this is with a list-column workflow. I would create a data frame with a data list-column holding the data frames, and a cols list-column holding the character vectors.
The real application of this will include a list of 24 large datasets, paired with a list of 24 unique character vectors. Here is a minimal example of this data structure to illustrate the problem:
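library(dplyr)  # also re-exports tibble() and %>%
library(purrr)

set.seed(2346)
df <- tibble(
  col1 = sample(c(0, 1), replace = TRUE, size = 10),
  col2 = sample(c(0, 1), replace = TRUE, size = 10),
  col3 = sample(c(0, 1), replace = TRUE, size = 10),
  col4 = sample(c(0, 1), replace = TRUE, size = 10)
)
# Names of the columns to keep from df
cols <- c("col1", "col3")
# One-row tibble pairing the data frame with its column names
df_list_col <- tibble(
  data = list(df),
  cols = list(cols)
)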
df_list_col has the list-column structure, but only in a single row.
My attempted solution is to create a third list-column to hold the subsetted data frame. Thus:
df_output <- df_list_col %>%
  mutate(subset = select(.$data, !!.$cols))
But this returns an error:
# Error: Problem with `mutate()` input `subset`.
# x `select()` doesn't handle lists.
# ℹ Input `subset` is `select(.$data, list(c("col1", "col3")))`.
I also tried using purrr::map to apply the function:
df_output <- df_list_col %>%
  mutate(subset = map(.$data, ~ select(.x, !!.$cols)))
But that returns a similar error. In both cases, select() is seeing the vector of column names as a list, not as a vector, and I'm stumped on how to change this behavior.
Thanks in advance for any help!
Both data and cols are list columns, so .$data and .$cols hand select() a list rather than a data frame and a character vector. We can extract the elements by unlisting or with [[ before calling select:
dplyr::select(df_list_col$data[[1]], unlist(df_list_col$cols))
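To see why the extraction step matters, it may help to compare the class of a whole list-column with the class of one of its elements. The sketch below rebuilds a smaller version of the example (the values in df are made up for illustration and differ from the question's random data):

```r
library(tibble)

# Illustrative stand-in for the question's data; values are arbitrary
df <- tibble(col1 = 1:3, col2 = 4:6, col3 = 7:9)
df_list_col <- tibble(
  data = list(df),
  cols = list(c("col1", "col3"))
)

# The whole column is a list, which select() cannot use directly
class(df_list_col$cols)             # "list"
is.data.frame(df_list_col$data)     # FALSE

# A single element is the object select() actually expects
class(df_list_col$cols[[1]])        # "character"
is.data.frame(df_list_col$data[[1]])  # TRUE
```

This is the distinction behind the "doesn't handle lists" error: .$data and .$cols refer to the whole columns, not to their elements.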
Or another option with !!!
select(df_list_col$data[[1]], !!! df_list_col$cols)
Or using the tidyverse syntax, mapping over both list columns in parallel:
library(dplyr)
library(purrr)
df_list_col %>%
  mutate(subset = map2(data, cols, ~ .x %>% select(all_of(.y))))
Output:
# A tibble: 1 x 3
# data cols subset
# <list> <list> <list>
#1 <tibble [10 × 4]> <chr [2]> <tibble [10 × 2]>
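Since the real use case pairs many data frames with many column vectors, here is a sketch of the same map2() pattern on more than one row. The two data frames (df_a, df_b) and their column names are hypothetical, chosen only to show that each row is subset by its own vector:

```r
library(dplyr)
library(purrr)

# Two hypothetical data frames with different columns of interest
df_a <- tibble(col1 = 1:3, col2 = 4:6, col3 = 7:9)
df_b <- tibble(x = letters[1:3], y = LETTERS[1:3], z = 1:3)

nested <- tibble(
  data = list(df_a, df_b),
  cols = list(c("col1", "col3"), "y")
)

# map2() walks data and cols in parallel, one pair per row
result <- nested %>%
  mutate(subset = map2(data, cols, ~ select(.x, all_of(.y))))

names(result$subset[[1]])  # "col1" "col3"
names(result$subset[[2]])  # "y"
```

Wrapping the vector in all_of() makes select() fail loudly if a requested column is missing, which seems preferable when the column lists are maintained separately from the data.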
Or with pmap
df_list_col %>%
  mutate(subset = pmap(cur_data(), ~ select(..1, all_of(..2))))
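Note that cur_data() was deprecated in dplyr 1.1.0 in favor of pick(). Assuming dplyr 1.1.0 or later is installed, a roughly equivalent sketch would look like the following (the small df here is illustrative, not the question's random data):

```r
library(dplyr)
library(purrr)

# Illustrative stand-in for the question's data; values are arbitrary
df <- tibble(col1 = 1:3, col2 = 4:6, col3 = 7:9, col4 = 10:12)
df_list_col <- tibble(
  data = list(df),
  cols = list(c("col1", "col3"))
)

# pick(everything()) returns the current columns as a tibble;
# pmap() then iterates over its rows, so ..1 is one element of data
# and ..2 is the matching element of cols
out <- df_list_col %>%
  mutate(subset = pmap(pick(everything()), ~ select(..1, all_of(..2))))

names(out$subset[[1]])  # "col1" "col3"
```

Because the positional arguments ..1 and ..2 depend on column order, the map2(data, cols, ...) form above may be the more robust choice if other columns are later added to the tibble.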