How to implement a conditional filter in R dplyr filter

I have the following data with two columns and 15 rows:

data_1 <- structure(list(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, 
NA, 925, NA, 181, 182, 188, NA), column_2 = c(7, NA, 1, 1, 1, 
3, 7, NA, 1, NA, 1, NA, 1, 1, 1)), row.names = c(NA, -15L), class = c("tbl_df", 
"tbl", "data.frame"))
   column_1 column_2
1       120        7
2       130       NA
3        NA        1
4        NA        1
5        NA        1
6       130        3
7       182        7
8       130       NA
9        NA        1
10      925       NA
11       NA        1
12      181       NA
13      182        1
14      188        1
15       NA        1
  • By using filters, I would like to keep the observations with the following values in column_1: NA, 130, 181, 182, 188
  • Furthermore, I would like to remove all observations with the entry 7 in column_2

So far, this works with the following code:

data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7)

Now I want to add an additional filter: if the value in column_1 is 130 and the value in column_2 is NA, then remove the observation (that is, rows 2 and 8 in data_1). How could I do this? What are the best ways to achieve this conditional filter? I have tried the following commands so far, none of which lead to the desired result:

data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(case_when(column_1 == 130 ~ !is.na(column_2)))

The result here is that only the entry 130, 3 is kept.

data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(case_when(column_1 == 130 ~ !is.na(column_2), TRUE ~ is.na(column_2)))

Now two entries remain: 130, 3 and 181, NA.

I have also tried the following two commands:

data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(if (column_2 == 130) !is.na(column_2))
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% {if (column_2 == 130) filter(., !is.na(column_2))}

Are you looking for something like this?

library(tidyverse)


data_1 |>
  filter(case_when(
    is.na(column_1) ~ T,
    column_1 == 130 & is.na(column_2 ) ~ F,
    column_2 == 7 ~ F,
    column_1 %in% c(130, 181, 182, 188) ~ T,
    T ~ F
  ))
#> # A tibble: 10 x 2
#>    column_1 column_2
#>       <dbl>    <dbl>
#>  1       NA        1
#>  2       NA        1
#>  3       NA        1
#>  4      130        3
#>  5       NA        1
#>  6       NA        1
#>  7      181       NA
#>  8      182        1
#>  9      188        1
#> 10       NA        1

I just added all of your conditions to one big case_when. Make sure to map each condition to T or F so that the filter works correctly. The conditions are checked in order and the first one that matches decides the outcome: when it is mapped to T the row is kept, and when it is mapped to F the row is removed.
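As an alternative to the case_when above, the same three rules can probably be written as plain logical conditions inside a single filter. This is only a sketch, not the answerer's method: the name result is made up for illustration, and data_1 is rebuilt here so that the snippet runs on its own.

library(dplyr)

# Rebuild the example data so the snippet is self-contained
data_1 <- tibble::tibble(
  column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA),
  column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1)
)

# "result" is a hypothetical name used only in this sketch
result <- data_1 |>
  filter(
    # keep NA or one of the wanted values in column_1
    is.na(column_1) | column_1 %in% c(130, 181, 182, 188),
    # drop rows where column_2 is 7; %in% returns FALSE for NA, so NA rows survive
    !column_2 %in% 7,
    # drop rows where column_1 is 130 and column_2 is missing
    !(column_1 %in% 130 & is.na(column_2))
  )

result

Using %in% instead of == matters here: column_1 == 130 evaluates to NA when column_1 is missing, and filter() drops rows whose condition is NA. The same behaviour explains the attempts in the question: case_when() returns NA for every row that matches none of its conditions, so without a final TRUE ~ TRUE those rows are silently removed.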

I would only add that structure(list()) may be needlessly verbose here unless it is done for another reason. Simpler would be:

data.frame(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA), 
           column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1))

# or

tibble::tibble(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA), 
               column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1))
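For what it is worth, structure(list(...)) is the form that dput() prints for a data frame, which may be why the question uses it. A quick check along the following lines should confirm that the two simpler constructors hold the same values and differ only in class. The names df and tb are made up for this sketch.

# Hypothetical names df and tb, used only for this comparison
df <- data.frame(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA),
                 column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1))

tb <- tibble::tibble(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA),
                     column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1))

# Same column values in both objects
identical(df$column_1, tb$column_1)
identical(df$column_2, tb$column_2)

# Only the class differs: a plain data.frame versus a tibble
class(df)
class(tb)

Either object works with the filter calls above; the tibble version matches the class of data_1 in the question.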
