R: Using dplyr::select() in a list-column workflow

Question

I have a list of large data frames and I want to subset each one, retaining only certain columns. The names of the columns I want are stored in character vectors, one per data frame.

One way of doing this is with a list-column workflow. I would create a data frame with a list-column named data holding the data frames, and a list-column named cols holding the character vectors.

The real application will involve a list of 24 large datasets, paired with a list of 24 unique character vectors. Here is a minimal example of this data structure to illustrate the problem:

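library(dplyr)  # tibble(), mutate(), select(), %>%
library(purrr)  # map(), used in the second attempt below

set.seed(2346)
df <- tibble(
  col1 = sample(c(0, 1), replace = TRUE, size = 10),
  col2 = sample(c(0, 1), replace = TRUE, size = 10),
  col3 = sample(c(0, 1), replace = TRUE, size = 10),
  col4 = sample(c(0, 1), replace = TRUE, size = 10)
)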

cols <- c("col1", "col3")

df_list_col <- tibble(
  data = list(df), 
  cols = list(cols)
)

df_list_col has the list-column structure, but with only a single row.

My attempted solution is to create a third list-column to hold the subsetted data frame:

df_output <- df_list_col %>% 
  mutate(subset = select(.$data, !!.$cols))

But this returns an error:

#   Error: Problem with `mutate()` input `subset`.
# x `select()` doesn't handle lists.
# ℹ Input `subset` is `select(.$data, list(c("col1", "col3")))`.

I also tried using purrr::map to apply the function:

df_output <- df_list_col %>% 
  mutate(subset = map(.$data, ~ select(.x, !!.$cols)))

But that returns a similar error. In both cases, select() is seeing the vector of column names as a list, not as a vector, and I'm stumped on how to change this behavior.

Thanks in advance for any help!

Answer 1

Both data and cols are list columns, so their elements have to be pulled out, either with unlist() or with [[, before they are passed to select():

dplyr::select(df_list_col$data[[1]], unlist(df_list_col$cols))
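
To see why the extraction is needed, it can help to inspect the classes involved. The following is a minimal sketch on a smaller, made-up data frame (the values are illustrative, not the asker's data); it assumes only that dplyr is installed.

```r
library(dplyr)

# Small stand-in for the structure in the question (illustrative values)
df <- tibble(col1 = 1:2, col2 = 3:4, col3 = 5:6)
df_list_col <- tibble(data = list(df), cols = list(c("col1", "col3")))

class(df_list_col$cols)       # "list": what select() was being handed
class(df_list_col$cols[[1]])  # "character": the vector select() can use

# Extract both elements with [[ before calling select()
out <- select(df_list_col$data[[1]], all_of(df_list_col$cols[[1]]))
names(out)
```

Note that unlist() flattens the whole list-column, so it only behaves like [[1]] when the column has a single row.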

Another option is to splice the list into the call with !!!:

select(df_list_col$data[[1]], !!! df_list_col$cols)

Or, using tidyverse syntax, iterate over both list-columns in parallel with map2():

library(dplyr)
library(purrr)
df_list_col %>%
  mutate(subset = map2(data, cols, ~ .x %>% select(all_of(.y))))

Output:

# A tibble: 1 x 3
#  data              cols      subset           
#  <list>            <list>    <list>           
#1 <tibble [10 × 4]> <chr [2]> <tibble [10 × 2]>
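
The question mentions 24 data frames, each with its own vector of column names. As a rough sketch of how the map2() approach extends beyond one row, here is a two-row version; df_a, df_b, nested and result are hypothetical names and toy data introduced only for illustration.

```r
library(dplyr)
library(purrr)

# Two toy data frames with different columns (hypothetical data)
df_a <- tibble(col1 = 1:3, col2 = 4:6, col3 = 7:9)
df_b <- tibble(x = letters[1:3], y = LETTERS[1:3], z = c(TRUE, FALSE, TRUE))

# One row per data frame, each paired with its own column-name vector
nested <- tibble(
  data = list(df_a, df_b),
  cols = list(c("col1", "col3"), c("y"))
)

# map2() walks data and cols in parallel, so each data frame is
# subset with the character vector from the same row
result <- nested %>%
  mutate(subset = map2(data, cols, ~ select(.x, all_of(.y))))

# Column names retained in each subset
map(result$subset, names)
```

Because all_of() is strict, a name in cols that is missing from the matching data frame raises an error rather than being silently dropped; any_of() is the lenient alternative if that is preferred.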

Or with pmap(), which iterates over the rows of the data frame:

# Note: cur_data() is deprecated as of dplyr 1.1.0 in favor of pick()
df_list_col %>%
  mutate(subset = pmap(cur_data(), ~ select(..1, all_of(..2))))
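
A variant of the pmap() call that does not depend on cur_data() is to pass the two list-columns explicitly. This is a sketch under the same toy-data assumption as before; the argument names d and keep are made up for readability.

```r
library(dplyr)
library(purrr)

# Small stand-in for the structure in the question (illustrative values)
df <- tibble(col1 = 1:2, col2 = 3:4, col3 = 5:6)
df_list_col <- tibble(data = list(df), cols = list(c("col1", "col3")))

# Passing an unnamed list makes pmap() supply the elements positionally:
# d receives one data frame, keep receives its vector of column names
out <- df_list_col %>%
  mutate(subset = pmap(list(data, cols),
                       function(d, keep) select(d, all_of(keep))))

names(out$subset[[1]])
```

With only two inputs this is equivalent to map2(); pmap() becomes the more natural choice once a third per-row input is involved.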
