
How to filter data without losing NA rows using dplyr

How to subset data in R without losing NA rows?

The post above subsets using logical indexing. Is there a way to do it in dplyr?

Also, when does dplyr automatically drop NAs? In my experience, it removes NA rows when I filter out a specific string, e.g.:

b = a %>% filter(col != "str")

I would expect this not to exclude NA values, but it does. When I filter in a different way, however, NA values are not automatically excluded, e.g.:

b = a %>% filter(!grepl("str", col))

I would like to understand this behavior of filter. I would appreciate any help. Thank you!

The documentation for dplyr::filter says: "Unlike base subsetting, rows where the condition evaluates to NA are dropped."

NA != "str" evaluates to NA, so that row is dropped by filter.

grepl("str", NA) returns FALSE rather than NA, so !grepl("str", NA) is TRUE and the row is kept.
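The difference between the two conditions can be checked in base R without any data frame. This is only a small sketch; the variable names cmp, match and negated are made up for illustration.

```r
# A comparison involving NA propagates the missing value
cmp <- NA != "str"
is.na(cmp)        # TRUE: the condition itself is NA, so filter() drops the row

# grepl() reports "no match" (FALSE) for NA input instead of returning NA
match <- grepl("str", NA)
negated <- !match
negated           # TRUE: the condition is TRUE, so filter() keeps the row
```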

If you want filter to keep NA rows, test for them explicitly: filter(is.na(col) | col != "str")
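As a rough illustration of the two behaviors side by side, here is a minimal sketch on a three-row data frame. The data and the names dropped and kept are hypothetical, chosen only for this example.

```r
library(dplyr)

# Hypothetical example data: an ordinary value, a missing value, and the value to exclude
a <- data.frame(col = c("hello", NA, "str"))

# Plain comparison: the condition is NA for the NA row, so that row is dropped too
dropped <- a %>% filter(col != "str")

# Explicit is.na() check: the NA row is kept, only "str" is removed
kept <- a %>% filter(is.na(col) | col != "str")

nrow(dropped)  # 1
nrow(kept)     # 2
```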

If you want to keep the rows where the filter condition itself evaluates to NA, you can turn those NAs into TRUE with replace_na from tidyr.

library(dplyr)
library(tidyr)
a <- data.frame(col = c("hello", NA, "str"))
a %>% filter((col != "str") %>% replace_na(TRUE))
