I am trying to filter the nycflights13::flights data, but am struggling with how to do it exactly. I want to remove the rows that have a dep_time value but NA for arr_time. However, the code below does the opposite: it keeps only the flights that have a dep_time and NA for arr_time, whereas I want that subset removed from the full dataset.
#filter(flights,!is.na(dep_time), is.na(arr_time))
I found that the following works in base R, but I want to learn how to do it with dplyr's filter if possible:
#flights[-which(!is.na(flights$dep_time) & is.na(flights$arr_time)), ]
Thanks for your help.
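Unlike base subsetting, dplyr's filter only lets you specify which rows to keep, not which to drop, so you have to take the complement of your predicate. You want to drop the rows where !is.na(dep_time) & is.na(arr_time); by De Morgan's law, the complement of that condition is is.na(dep_time) | !is.na(arr_time), so the call becomes: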
flights %>% filter(is.na(dep_time) | !is.na(arr_time))
You can check this against your base R condition:
all.equal(
flights[-which(!is.na(flights$dep_time) & is.na(flights$arr_time)), ],
flights[which(is.na(flights$dep_time) | !is.na(flights$arr_time)), ]
)
# [1] TRUE
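The same equivalence can be checked on a small hand-made table, which makes it easier to see which rows survive. This is only a sketch: the toy tibble and its id column are made up for illustration and are not part of nycflights13. It also shows that wrapping the original "drop" condition in !( ... ) gives the same result as the De Morgan form, if you find that easier to read.

```r
library(dplyr)

# Hypothetical toy data standing in for nycflights13::flights
toy <- tibble(
  id       = 1:4,
  dep_time = c(517L, 533L, NA, 554L),
  arr_time = c(830L, NA, NA, 812L)
)

# Rows to drop: dep_time is present but arr_time is missing (only id 2 here).

# Option 1: negate the whole "drop" condition
kept_negated <- toy %>% filter(!(!is.na(dep_time) & is.na(arr_time)))

# Option 2: the De Morgan form of the same condition
kept_demorgan <- toy %>% filter(is.na(dep_time) | !is.na(arr_time))

# Both options keep the same rows
isTRUE(all.equal(kept_negated, kept_demorgan))

kept_demorgan$id
# [1] 1 3 4
```

Note that the row with both times missing (id 3) is kept, because the condition only removes rows that have a dep_time but no arr_time.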