Applying a dplyr function to a single column across a list using the pipe

Question

I'm trying to filter on a specific column across a list of dataframes. On a single dataframe I would typically use dplyr like this:

library(dplyr)

# creating dataframe
df <- data.frame(a = 0:10, d = 10:20)

# filtering column a for rows greater than 7
df %>% filter(a > 7)

I've tried doing this across a list using the following:


# creating list
x <- list(data.frame(a = 0:10, b = 10:20),
          data.frame(c = 11:20, d = 21:30),
          data.frame(e = 15:25, f = 35:45))

# selecting the appropriate column and trying to filter
# this is not working
x[1][[1]][1] %>% lapply(. %>% {filter(. > 2)})

# however, if I use the min() function it works
x[1][[1]][1] %>% lapply(. %>% {min(.)})

I find the %>% syntax quite easy to understand and use. However, in this case, selecting a specific column and doing something as simple as filtering does not work. I'm guessing map could be equally useful. Any help is appreciated.

Answer 1

You can use filter_at to refer to a column by position.

library(dplyr)
purrr::map(x, ~.x %>% filter_at(1, any_vars(. > 7)))

In filter, you can extract the column by position and use it in the condition:

purrr::map(x, ~.x %>% filter(.[[1]] > 7))

In base R, that would be:

lapply(x, function(y) y[y[[1]] > 7, ])
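
As a rough check that the dplyr and base R versions above select the same rows, the sketch below runs both on the list x from the question and compares the results. The names res_dplyr and res_base are only illustrative and do not come from the original answer.

```r
library(dplyr)
library(purrr)

# List from the question: each dataframe has different column names
x <- list(data.frame(a = 0:10, b = 10:20),
          data.frame(c = 11:20, d = 21:30),
          data.frame(e = 15:25, f = 35:45))

# dplyr version: keep rows where the first column is greater than 7
res_dplyr <- map(x, ~ .x %>% filter(.[[1]] > 7))

# base R version of the same filter
res_base <- lapply(x, function(y) y[y[[1]] > 7, ])

# Number of rows kept in each dataframe
sapply(res_dplyr, nrow)
sapply(res_base, nrow)

# The first column holds the same values in both results
identical(res_dplyr[[1]]$a, res_base[[1]]$a)
```

Both versions keep 3, 10 and 11 rows for the three dataframes, since only the first dataframe has values of 7 or less in its first column.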

Answer 2

It seems you are interested in checking the condition on the first column of each dataframe in your list. One solution using dplyr would be:

lapply(x, function(df) {df %>% filter_at(1, ~. > 7)})

The 1 in filter_at indicates that the condition is checked on the first column of each dataframe in the list (1 is a positional index).
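
To make the positional index concrete, here is a minimal sketch that looks up which column name position 1 resolves to in each dataframe, and then filters on that name explicitly. It assumes the list x from the question; first_cols, first_col and res are hypothetical names used only for illustration, and the .data pronoun is used as a stand-in for filter_at.

```r
library(dplyr)

# List from the question: each dataframe has different column names
x <- list(data.frame(a = 0:10, b = 10:20),
          data.frame(c = 11:20, d = 21:30),
          data.frame(e = 15:25, f = 35:45))

# Name of the column that position 1 refers to in each dataframe
first_cols <- sapply(x, function(df) names(df)[1])
first_cols

# Same condition, written with an explicit lookup of the first column's name
res <- lapply(x, function(df) {
  first_col <- names(df)[1]
  df %>% filter(.data[[first_col]] > 7)
})

# Number of rows kept in each dataframe
sapply(res, nrow)
```

Position 1 resolves to a, c and e respectively, so the same call works even though the column names differ between dataframes.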

EDIT

After the discussion in the comments, I propose the following solution:

lapply(x, function(df) {df %>% filter(a > 7) %>% select(a) %>% slice(1)})

Input data

x <- list(data.frame(a = 0:10, b = 10:20), 
      data.frame(a = 11:20, b = 21:30), 
      data.frame(a = 15:25, b = 35:45))

Output

[[1]]
  a
1 8

[[2]]
   a
1 11

[[3]]
   a
1 15

Answer 3

Using filter with across:

library(dplyr)
library(purrr)
map(x, ~ .x %>% 
           filter(across(names(.)[1], ~ .> 7)))
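
Newer dplyr releases warn when across() is used inside filter() and point to if_all() / if_any() instead. The following is a sketch of the same filter written that way, assuming a dplyr version that provides if_all() (to my knowledge, 1.0.4 or later) and the list x from the question. Plain anonymous functions are used instead of nested ~ lambdas to keep the two levels apart; df, col and res are illustrative names.

```r
library(dplyr)
library(purrr)

# List from the question: each dataframe has different column names
x <- list(data.frame(a = 0:10, b = 10:20),
          data.frame(c = 11:20, d = 21:30),
          data.frame(e = 15:25, f = 35:45))

# Keep rows where the first column (selected by position) is greater than 7
res <- map(x, function(df) {
  filter(df, if_all(1, function(col) col > 7))
})

# Number of rows kept in each dataframe
sapply(res, nrow)
```

With a single selected column, if_all() and if_any() give the same result; the choice only matters once several columns are selected.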

3 answers

Answer 1: score 4, 2020-05-14 12:27:56
Answer 2: score 3, accepted, 2020-05-14 12:30:01
Answer 3: score 1, 2020-05-14 22:03:08
