
R using dplyr::select() in a list-column workflow

I have a list of large data frames and I want to subset each one, retaining only certain columns. The names of the columns I want are contained in character vectors unique to each data frame.

One way of doing this is with a list-column workflow. I would create a data frame with a data list-column holding the data frames, and a cols list-column holding the character vectors.

The real application of this will include a list of 24 large datasets, paired with a list of 24 unique character vectors. Here is a minimal example of this data structure to illustrate the problem:

library(dplyr)  # provides tibble(), select(), mutate() and %>%

set.seed(2346)
df <- tibble(
  col1 = sample(c(0, 1), replace = TRUE, size = 10),
  col2 = sample(c(0, 1), replace = TRUE, size = 10),
  col3 = sample(c(0, 1), replace = TRUE, size = 10),
  col4 = sample(c(0, 1), replace = TRUE, size = 10)
)

cols <- c("col1", "col3")

df_list_col <- tibble(
  data = list(df), 
  cols = list(cols)
)

df_list_col has the list-column structure, but only in a single row.

My attempted solution is to create a third list-column to hold the subsetted data frame. Thus:

df_output <- df_list_col %>% 
  mutate(subset = select(.$data, !!.$cols))

But this returns an error:

#   Error: Problem with `mutate()` input `subset`.
# x `select()` doesn't handle lists.
# ℹ Input `subset` is `select(.$data, list(c("col1", "col3")))`.

I also tried using purrr::map to apply the function:

df_output <- df_list_col %>% 
  mutate(subset = map(.$data, ~ select(.x, !!.$cols)))

But that returns a similar error. In both cases, select() is seeing the vector of column names as a list, not as a vector. And I'm stumped on how to change this behavior.

Thanks in advance for any help!

Both are list columns. We can extract the elements by unlisting or by extracting with [[ inside select:

dplyr::select(df_list_col$data[[1]], unlist(df_list_col$cols))

Or another option with !!!:

select(df_list_col$data[[1]], !!! df_list_col$cols)
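A quick check of why the original attempts failed, as a sketch that rebuilds the question's single-row structure with arbitrary values: df_list_col$cols is a list of length one, whereas [[1]] or unlist() yields the bare character vector that select() expects.

```r
library(dplyr)

# Rebuild the question's single-row structure (same column names, arbitrary values)
df <- tibble(col1 = 1:3, col2 = 4:6, col3 = 7:9, col4 = 10:12)
df_list_col <- tibble(data = list(df), cols = list(c("col1", "col3")))

# The whole list-column is a list, which select() rejects
class(df_list_col$cols)
#> [1] "list"

# A single element is the plain character vector
class(df_list_col$cols[[1]])
#> [1] "character"

# With only one row, unlist() and [[1]] give the same vector
identical(unlist(df_list_col$cols), df_list_col$cols[[1]])
#> [1] TRUE
```

Note that with more than one row, unlist() on the whole column would pool every row's names together, so the names have to be extracted row by row.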

Or using the tidyverse syntax:

library(dplyr)
library(purrr)
df_list_col %>%
  mutate(subset = map2(data, cols, ~ .x %>% select(all_of(.y))))

Output:

# A tibble: 1 x 3
#  data              cols      subset           
#  <list>            <list>    <list>           
#1 <tibble [10 × 4]> <chr [2]> <tibble [10 × 2]>
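The question mentions 24 data frames, each paired with its own vector of column names. Here is a minimal sketch of the same map2() call on a two-row tibble. The names df_a, df_b, multi and result are hypothetical and introduced only for illustration.

```r
library(dplyr)
library(purrr)

# Two hypothetical data frames with different columns of interest
df_a <- tibble(col1 = 1:3, col2 = 4:6, col3 = 7:9)
df_b <- tibble(x = letters[1:3], y = LETTERS[1:3], z = 1:3)

multi <- tibble(
  data = list(df_a, df_b),
  cols = list(c("col1", "col3"), "y")
)

# map2() walks data and cols in parallel, one row at a time,
# so each data frame is subset with its own vector of names
result <- multi %>%
  mutate(subset = map2(data, cols, ~ select(.x, all_of(.y))))

# Column names kept in each subset
map(result$subset, names)
```

Because all_of() errors on a name that is missing from the data frame, a mismatch between a dataset and its column vector surfaces immediately rather than being silently dropped.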

Or with pmap:

df_list_col %>%
     mutate(subset = pmap(cur_data(),  ~ select(..1, all_of(..2 ))))
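cur_data() was deprecated in dplyr 1.1.0 in favour of pick(). The following is a sketch of the same pmap() call written with pick(), assuming dplyr 1.1.0 or later is installed. The data frame is rebuilt with arbitrary values so the snippet runs on its own, and res is a hypothetical name.

```r
library(dplyr)
library(purrr)

# Same shape as the question's example, with arbitrary values
df <- tibble(col1 = 1:3, col2 = 4:6, col3 = 7:9, col4 = 10:12)
df_list_col <- tibble(data = list(df), cols = list(c("col1", "col3")))

# pick() returns the named columns as a tibble; pmap() then iterates over
# its rows, passing the data element as ..1 and the cols element as ..2
res <- df_list_col %>%
  mutate(subset = pmap(pick(data, cols), ~ select(..1, all_of(..2))))

names(res$subset[[1]])
```

Naming the columns in pick() also keeps the positional arguments ..1 and ..2 stable if other columns are later added to the tibble.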
