How to do reverse filter using dplyr

I am trying to filter the nycflights13::flights data, but am struggling with how to do it exactly.

I want to remove the rows that have a dep_time value but NA for arr_time. However, the code below does the opposite: it keeps only the flights with a dep_time and an NA arr_time, whereas I want that subset removed from the full dataset.

filter(flights, !is.na(dep_time), is.na(arr_time))

I found that this works in base R, but I would like to learn how to do it with dplyr's filter if possible:

flights[-which(!is.na(flights$dep_time) & is.na(flights$arr_time)), ]

Thanks for your help.

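Unlike base subsetting, dplyr's filter only lets you specify which rows to keep, not which to drop, so you have to filter on the complement of your predicate. The rows you want to drop satisfy !is.na(dep_time) & is.na(arr_time). By De Morgan's law, the negation of A & B is !A | !B, so the condition for the rows to keep is: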

flights %>% filter(is.na(dep_time) | !is.na(arr_time))

You can check this against your base R condition:

all.equal(
  flights[-which(!is.na(flights$dep_time) & is.na(flights$arr_time)), ],
  flights[which(is.na(flights$dep_time) | !is.na(flights$arr_time)), ]
)
# [1] TRUE
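
If rewriting the predicate by hand feels error-prone, it should also be possible to wrap the original "drop" condition in a single negation and let R do the rest. The sketch below uses a small made-up tibble (toy and its values are hypothetical stand-ins for two columns of flights, not real data) to compare both forms:

```r
library(dplyr)

# Hypothetical toy data mimicking two columns of nycflights13::flights
toy <- tibble(
  flight   = 1:4,
  dep_time = c(517L, NA, 533L, 542L),
  arr_time = c(830L, NA, NA, 923L)
)

# Option 1: negate the whole "rows to drop" predicate
kept_negated <- toy %>%
  filter(!(!is.na(dep_time) & is.na(arr_time)))

# Option 2: the De Morgan form of the same condition
kept_demorgan <- toy %>%
  filter(is.na(dep_time) | !is.na(arr_time))

# Both forms should keep the same rows; only flight 3
# (departed, no arrival recorded) is dropped
all.equal(kept_negated, kept_demorgan)
kept_negated$flight
```

Because is.na() always returns TRUE or FALSE, neither predicate can evaluate to NA here, so filter does not silently drop extra rows.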
