dplyr: Filter on multiple conditions while keeping NA values

I am aware of the existing questions about filtering on multiple conditions, which have very comprehensive answers (such as Q1 and Q2), and of those about removing NA values (Q3, Q4).

But I have a different question: how can I filter with dplyr, or even data.table, so that I keep both the NA values and the rows that satisfy a condition?

As an example, in the following I'd like to keep all rows where Var3 is < 5, plus the rows where Var3 is NA.

library(data.table)
library(dplyr)

 Var1<- seq(1:5)
 Var2<- c("s", "a", "d", NA, NA)
 Var3<- c(NA, NA, 2, 5, 2) 
 Var4<- c(NA, 5, 1, 3,4)
 DT <- data.table(Var1,Var2,Var3, Var4) 
 DT
   Var1 Var2 Var3 Var4
1:    1    s   NA   NA
2:    2    a   NA    5
3:    3    d    2    1
4:    4   NA    5    3
5:    5   NA    2    4

The expected result:

       Var1 Var2 Var3 Var4
    1:    1    s   NA   NA
    2:    2    a   NA    5
    3:    3    d    2    1
    4:    5   NA    2    4

I have tried the following, without success:

##Using dplyr::filter
 DT %>%  filter(!Var3 ==5)
  Var1 Var2 Var3 Var4
1    3    d    2    1
2    5 <NA>    2    4

# or

DT %>%  filter(Var3 <5 & is.na(Var3))
[1] Var1 Var2 Var3 Var4
<0 rows> (or 0-length row.names)

## using data.table 

 DT[DT[,.I[Var3 <5], Var1]$V1]
   Var1 Var2 Var3 Var4
1:   NA   NA   NA   NA
2:   NA   NA   NA   NA
3:    3    d    2    1
4:    5   NA    2    4

Any help with an explanation is highly appreciated!

I think this will work. Use | to indicate "or" between the filter conditions. dt2 is the expected output.

library(dplyr)

Var1 <- seq(1:5)
Var2 <- c("s", "a", "d", NA, NA)
Var3 <- c(NA, NA, 2, 5, 2) 
Var4 <- c(NA, 5, 1, 3, 4)

dt <- tibble(Var1, Var2, Var3, Var4)  # tibble() replaces the deprecated data_frame()

dt2 <- dt %>% filter(Var3 < 5 | is.na(Var3))
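The reason the extra `| is.na(Var3)` is needed can be seen by looking at the condition on its own. The following is a minimal sketch on the same toy data as the question (the names `cond`, `only_lt5` and `kept` are made up for illustration): comparing NA with a number gives NA, and filter() keeps only the rows where the condition is TRUE.

```r
library(dplyr)

# Same toy data as in the question.
dt <- tibble(
  Var1 = 1:5,
  Var2 = c("s", "a", "d", NA, NA),
  Var3 = c(NA, NA, 2, 5, 2),
  Var4 = c(NA, 5, 1, 3, 4)
)

# Comparing NA with a number gives NA, not FALSE.
cond <- dt$Var3 < 5
print(cond)  # NA NA TRUE FALSE TRUE

# filter() keeps only rows where the condition is TRUE,
# so the two NA rows are dropped here.
only_lt5 <- dt %>% filter(Var3 < 5)

# `| is.na(Var3)` turns those NA entries into TRUE (NA | TRUE is TRUE).
kept <- dt %>% filter(Var3 < 5 | is.na(Var3))
print(kept$Var1)  # 1 2 3 5
```

This also explains the second attempt in the question: `Var3 < 5 & is.na(Var3)` can never be TRUE, because a value cannot be both NA and comparable to 5.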

With data.table, we use the following logic to keep the rows where 'Var3' is less than 5 and not NA ( !is.na(Var3) ), or ( | ) where it is NA

DT[(Var3 < 5 & !is.na(Var3)) | is.na(Var3)]
#   Var1 Var2 Var3 Var4
#1:    1    s   NA   NA
#2:    2    a   NA    5
#3:    3    d    2    1
#4:    5   NA    2    4
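As a rough comparison, the sketch below rebuilds the question's table and contrasts three subsets. The names `lt5_only`, `lt5_or_na` and `full_form` are hypothetical and used only for illustration. It relies on data.table treating a logical NA in `i` as FALSE, which is why a bare `Var3 < 5` silently drops the NA rows.

```r
library(data.table)

# Same toy data as in the question.
DT <- data.table(
  Var1 = 1:5,
  Var2 = c("s", "a", "d", NA, NA),
  Var3 = c(NA, NA, 2, 5, 2),
  Var4 = c(NA, 5, 1, 3, 4)
)

# A logical NA in `i` is treated as FALSE, so only rows 3 and 5 survive.
lt5_only <- DT[Var3 < 5]

# Shorter form: NA | TRUE is TRUE, so the NA rows are kept as well.
lt5_or_na <- DT[Var3 < 5 | is.na(Var3)]

# The explicit form from the answer gives the same rows.
full_form <- DT[(Var3 < 5 & !is.na(Var3)) | is.na(Var3)]

print(lt5_or_na)
```

The attempt in the question returned all-NA rows because `.I[Var3 < 5]` produces NA row numbers, and subsetting by an NA index yields an NA row rather than dropping it.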

If we need dplyr, just use the same logic in filter

DT %>%
   filter((Var3 < 5 & !is.na(Var3)) | is.na(Var3))

As @ycw mentioned, the & !is.na(Var3) is not really needed here, because NA | TRUE evaluates to TRUE. But if we remove the | is.na(Var3) part, it becomes important, since it turns the NA results into FALSE

DT[, Var3 < 5 ]
#[1]    NA    NA  TRUE FALSE  TRUE

DT[, Var3 < 5  & !is.na(Var3)]
#[1] FALSE FALSE  TRUE FALSE  TRUE
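The two results above follow from R's three-valued logic. Here is a small base R sketch on the Var3 values alone; the variable names are made up for illustration.

```r
# The Var3 column from the question, as a plain vector.
x <- c(NA, NA, 2, 5, 2)

lt5  <- x < 5        # NA NA TRUE FALSE TRUE
isna <- is.na(x)     # TRUE TRUE FALSE FALSE FALSE

# NA & FALSE is FALSE, so the NA entries become FALSE.
stripped <- lt5 & !isna

# NA | TRUE is TRUE, so the NA entries become TRUE.
combined <- lt5 | isna

# The longer form from the answer gives the same logical vector.
both <- (lt5 & !isna) | isna

print(stripped)
print(combined)
print(identical(combined, both))
```

Only the combinations NA & TRUE and NA | FALSE stay NA, because in those cases the result really depends on the unknown value.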
