How to subset data in R without losing NA rows?
The post above subsets using logical indexing. Is there a way to do it in dplyr?
Also, when does dplyr automatically drop NAs? In my experience, it removes NA rows when I filter out a specific string, e.g.:
b = a %>% filter(col != "str")
I would have expected this not to exclude NA values, but it does. Yet when I use a different form of filtering, it does not automatically exclude NA, e.g.:
b = a %>% filter(!grepl("str", col))
I would like to understand this feature of filter. I would appreciate any help. Thank you!
The documentation for dplyr::filter says: "Unlike base subsetting, rows where the condition evaluates to NA are dropped."
NA != "str" evaluates to NA, so that row is dropped by filter.
!grepl("str", NA) returns TRUE, so that row is kept.
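The difference between the two conditions can be checked in base R alone. This is a minimal sketch; the vector x and its values are made up for illustration and are not from the question's data.

```r
# Illustrative vector (assumed sample values)
x <- c("hello", NA, "str")

# Comparison operators propagate NA
cmp <- x != "str"         # TRUE NA FALSE

# grepl() returns FALSE for NA input, so its negation is TRUE
pat <- !grepl("str", x)   # TRUE TRUE FALSE

print(cmp)
print(pat)
```

Because filter keeps only rows where the condition is TRUE, the NA in cmp is what causes the row to disappear, while pat contains no NA at all.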
If you want filter to keep NA, you could do filter(is.na(col) | col != "str").
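As a rough side-by-side check, assuming dplyr is installed and using a small made-up data frame with the question's names a and col, the two filters could be compared like this:

```r
library(dplyr)

# Illustrative data (assumed sample values)
a <- data.frame(col = c("hello", NA, "str"))

# Condition is NA for the NA row, so filter() drops it along with "str"
dropped <- a %>% filter(col != "str")

# TRUE | NA is TRUE, so the NA row survives; only "str" is removed
kept <- a %>% filter(is.na(col) | col != "str")

print(nrow(dropped))
print(nrow(kept))
```

The explicit is.na(col) test works because the logical OR of TRUE and NA is TRUE, so the condition never evaluates to NA for missing values.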
If you want to keep NAs created by the filter condition, you can simply turn the condition's NAs into TRUEs using replace_na from tidyr.
library(dplyr)
library(tidyr)
a <- data.frame(col = c("hello", NA, "str"))
a %>% filter((col != "str") %>% replace_na(TRUE))