How to implement a conditional filter in R dplyr filter
I have the following data with two columns and 15 rows:
data_1 <- structure(list(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130,
NA, 925, NA, 181, 182, 188, NA), column_2 = c(7, NA, 1, 1, 1,
3, 7, NA, 1, NA, 1, NA, 1, 1, 1)), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))
| | column_1 | column_2 |
|---|---|---|
| 1 | 120 | 7 |
| 2 | 130 | NA |
| 3 | NA | 1 |
| 4 | NA | 1 |
| 5 | NA | 1 |
| 6 | 130 | 3 |
| 7 | 182 | 7 |
| 8 | 130 | NA |
| 9 | NA | 1 |
| 10 | 925 | NA |
| 11 | NA | 1 |
| 12 | 181 | NA |
| 13 | 182 | 1 |
| 14 | 188 | 1 |
| 15 | NA | 1 |
I want to keep the rows where column_1 is NA, 130, 181, 182, or 188 (and column_2 is not 7). So far, this works with the following code:
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7)
Now I want to add an additional filter: if the value in column_1 is 130 and column_2 is NA, then remove the observation (rows 2 and 8 in data_1). How could I do this? What are the best ways to achieve this conditional filter? I have tried the following commands so far, none of which leads to the desired result:
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(case_when(column_1 == 130 ~ !is.na(column_2)))
The result here is that only the entry 130, 3 is kept.
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(case_when(column_1 == 130 ~ !is.na(column_2), TRUE ~ is.na(column_2)))
Now two entries remain: 130, 3 and 181, NA.
I have also tried the following two commands:
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% filter(if (column_2 == 130) !is.na(column_2))
data_1 %>% filter(is.na(column_1) | column_1 %in% c(130, 181, 182, 188), !column_2 %in% 7) %>% {if (column_2 == 130) filter(., !is.na(column_2))}
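A small sketch of why the case_when attempts above drop more rows than intended. The vectors x and y are hypothetical stand-ins for column_1 and column_2, not part of the original data. filter() keeps a row only when its condition is TRUE, so both FALSE and NA remove it. The two if attempts have a different problem: they test column_2 rather than column_1, and if() expects a single logical value rather than a whole column.

```r
library(dplyr)

# Hypothetical stand-ins for column_1 and column_2
x <- c(130, 181, NA)
y <- c(NA, NA, 1)

# First attempt: no fallback branch, so every element where x is not 130
# (or is NA) gets NA, and filter() would drop those rows.
no_fallback <- case_when(x == 130 ~ !is.na(y))
no_fallback
#> [1] FALSE    NA    NA

# Second attempt: the fallback is is.na(y), so all other rows are kept
# only when column_2 is NA.
wrong_fallback <- case_when(x == 130 ~ !is.na(y), TRUE ~ is.na(y))
wrong_fallback
#> [1] FALSE  TRUE FALSE

# A fallback of TRUE keeps every row the first branch does not decide.
keep_fallback <- case_when(x == 130 ~ !is.na(y), TRUE ~ TRUE)
keep_fallback
#> [1] FALSE  TRUE  TRUE
```

With the last form, chaining filter(case_when(column_1 == 130 ~ !is.na(column_2), TRUE ~ TRUE)) after the existing filter should remove only the rows where column_1 is 130 and column_2 is NA.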
Are you looking for something like this?
library(tidyverse)
data_1 |>
  filter(case_when(
    is.na(column_1) ~ T,
    column_1 == 130 & is.na(column_2) ~ F,
    column_2 == 7 ~ F,
    column_1 %in% c(130, 181, 182, 188) ~ T,
    T ~ F
  ))
#> # A tibble: 10 x 2
#> column_1 column_2
#> <dbl> <dbl>
#> 1 NA 1
#> 2 NA 1
#> 3 NA 1
#> 4 130 3
#> 5 NA 1
#> 6 NA 1
#> 7 181 NA
#> 8 182 1
#> 9 188 1
#> 10 NA 1
I just added all of your conditions to one big case_when. Make sure to map the statements to T and F so that the filter works correctly: when a condition is mapped to T the row is kept, and when it is mapped to F the row is removed. case_when checks the conditions in order and uses the first one that matches, so the rule for 130 with NA has to come before the general rule for 130, 181, 182, 188.
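As an alternative sketch (not the approach used in the answer above), the same rule can be written as one extra condition in the original filter() call, without case_when. The name result is only for illustration, and data_1 is rebuilt here so the example runs on its own.

```r
library(dplyr)

data_1 <- tibble(
  column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA),
  column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1)
)

result <- data_1 |>
  filter(
    # keep NA and the listed values in column_1
    is.na(column_1) | column_1 %in% c(130, 181, 182, 188),
    # drop rows where column_2 is 7
    !column_2 %in% 7,
    # drop rows where column_1 is 130 and column_2 is NA
    !(column_1 %in% 130 & is.na(column_2))
  )

nrow(result)
#> [1] 10
```

%in% is used instead of == in the last condition because %in% returns FALSE rather than NA when column_1 is missing, so the negated condition never becomes NA and no row is dropped by accident.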
I would only add that structure(list()) may be needlessly low-level here, unless it is done for another reason. Simpler would be:
data.frame(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA),
column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1))
# or
tibble::tibble(column_1 = c(120, 130, NA, NA, NA, 130, 182, 130, NA, 925, NA, 181, 182, 188, NA),
column_2 = c(7, NA, 1, 1, 1, 3, 7, NA, 1, NA, 1, NA, 1, 1, 1))