How to do a reverse filter using dplyr

Question

I am trying to filter the nycflights13::flights data, but I am struggling to work out exactly how.

I want to remove the rows that have a value for dep_time but NA for arr_time. However, the code below does the opposite: it returns only the flights with a dep_time and NA for arr_time, whereas I want that subset removed from the full dataset.

filter(flights, !is.na(dep_time), is.na(arr_time))

I found that the following works in base R, but I would like to learn how to do it with dplyr's filter, if possible:

flights[-which(!is.na(flights$dep_time) & is.na(flights$arr_time)), ]

Thanks for your help.

Answer 1

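Unlike base subsetting with negative indices, dplyr's filter only lets you specify which rows to keep, not which to drop. So you have to take the complement of your predicate. The rows you want to drop satisfy !is.na(dep_time) & is.na(arr_time); by De Morgan's law, the negation of that condition is is.na(dep_time) | !is.na(arr_time), so your filter should be: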

flights %>% filter(is.na(dep_time) | !is.na(arr_time))
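
The same complement can also be written by negating the whole condition with !( ... ), which avoids applying De Morgan's law by hand. A minimal sketch, assuming dplyr is installed; the toy data frame below is made up for illustration and only stands in for flights:

```r
library(dplyr)

# Hypothetical stand-in for flights: four rows covering every NA combination
toy <- tibble(
  id       = 1:4,
  dep_time = c(517L, 533L, NA, NA),
  arr_time = c(830L, NA, 850L, NA)
)

# Rows to drop: dep_time present but arr_time missing (row 2 only)

# Form 1: negate the whole "drop" condition
kept_negated <- toy %>% filter(!(!is.na(dep_time) & is.na(arr_time)))

# Form 2: the De Morgan expansion of the same condition
kept_demorgan <- toy %>% filter(is.na(dep_time) | !is.na(arr_time))

# Both forms keep rows 1, 3 and 4 and drop row 2
kept_negated$id
kept_demorgan$id
```

Negation is safe here because is.na() never returns NA itself. With a condition that can evaluate to NA, filter() drops those rows, so the negated and original filters would not be exact complements.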

You can check that the two conditions are equivalent using your base R code:

all.equal(
  flights[-which(!is.na(flights$dep_time) & is.na(flights$arr_time)), ],
  flights[which(is.na(flights$dep_time) | !is.na(flights$arr_time)), ]
)
# [1] TRUE
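
One caveat about the -which(...) idiom used in this check: when no row matches the condition, which() returns an empty vector, and the negative index then selects zero rows instead of all of them. Negating a logical vector does not have this problem. A small sketch with a made-up data frame (toy and to_drop are hypothetical names, not part of nycflights13):

```r
# Hypothetical data in which no row has dep_time present and arr_time missing
toy <- data.frame(
  dep_time = c(517L, NA),
  arr_time = c(830L, 850L)
)

# Logical vector marking the rows to drop; every element is FALSE here
to_drop <- !is.na(toy$dep_time) & is.na(toy$arr_time)

# which(to_drop) is empty, so the negative index is empty too:
# this returns zero rows rather than the full data frame
rows_with_which <- nrow(toy[-which(to_drop), ])

# Negating the logical vector keeps every row, as intended
rows_with_logical <- nrow(toy[!to_drop, ])
```

With the flights data the condition does match some rows, so the check above is unaffected; the logical form is simply the more defensive choice in general.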

solution1
4 ACCEPTED 2017-03-19 23:59:44
