Is there a more efficient way of using dplyr filter to remove rows from a dataframe?

Question

I have a large dataframe from which I want to remove some rows: all rows in procedure 2 where the subject ID is "4".

An example (cut-down) dataset is here: http://pastebin.com/raw/Dz6xxgM3

My dplyr filter call is:

library(dplyr)
df <- read.table("http://pastebin.com/raw/Dz6xxgM3")
filter(df,
  proc == "1" | proc == "3" | proc == "4" | proc == "5" | (proc == "2" & subject != "4")
)
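In case the pastebin file is unavailable, here is a minimal sketch on a hypothetical stand-in data frame. It assumes only the two columns the filter uses, proc and subject, stored as integers (the real file's column types and other columns are unknown). R coerces the integers to character when comparing them against "1", "2" and so on, so the question's filter runs unchanged.

```r
library(dplyr)

# Hypothetical stand-in for the pastebin data: every combination of
# procedure 1-5 and subject 1-4 (20 rows). Only the column names come
# from the question.
df <- data.frame(
  proc    = rep(1:5, times = 4),
  subject = rep(1:4, each = 5)
)

# The question's filter: keep everything except subject 4 in procedure 2
kept <- filter(df,
  proc == "1" | proc == "3" | proc == "4" | proc == "5" |
    (proc == "2" & subject != "4")
)

nrow(df)    # 20
nrow(kept)  # 19
```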

This works but seems kludgy: I have to spell out every other procedure with | as well as the proc == "2" condition.

Is there a more elegant or efficient way to delete the rows for subject 4 in procedure 2?

Cheers, Pete

Answer 1

We can use %in% instead of == to check for multiple values in the 'proc' column.

df %>%
  filter(proc %in% c(1, 3:5) | (proc == 2 & subject != 4))
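A quick way to check that the %in% version selects the same rows as the question's chain of == tests is to run both on the same hypothetical toy data (procedures 1-5, subjects 1-4, integer columns; not the real dataset) and compare the results:

```r
library(dplyr)

# Hypothetical toy data: every combination of procedure 1-5 and subject 1-4
df <- data.frame(
  proc    = rep(1:5, times = 4),
  subject = rep(1:4, each = 5)
)

# %in% tests membership in a set, so one call replaces four `==` comparisons
via_in <- df %>%
  filter(proc %in% c(1, 3:5) | (proc == 2 & subject != 4))

# Long-hand version from the question, for comparison
via_or <- df %>%
  filter(proc == 1 | proc == 3 | proc == 4 | proc == 5 |
           (proc == 2 & subject != 4))

identical(via_in, via_or)  # TRUE
nrow(via_in)               # 19
```

Note that the set c(1, 3:5) is hard-coded, so rows from a new procedure (say, 6) would be silently dropped by this filter.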

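Alternatively, you can negate a single condition that describes only the rows to drop:

df %>% filter(!(subject == '4' & proc == '2'))

This avoids listing the procedures you want to keep, so it still works if more procedures are added later.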
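The negated form can be sketched on the same kind of hypothetical toy data, here with one extra procedure-2 row whose subject is missing. One caveat (standard filter() behaviour, not something raised in the answer): rows where the condition evaluates to NA are dropped, so that extra row disappears unless NA is handled explicitly.

```r
library(dplyr)

# Hypothetical toy data: procedures 1-5 x subjects 1-4, plus one
# procedure-2 row with a missing subject (21 rows in total)
df <- data.frame(
  proc    = c(rep(1:5, times = 4), 2),
  subject = c(rep(1:4, each = 5), NA)
)

# Describe only the rows to drop, then negate
negated <- df %>% filter(!(subject == 4 & proc == 2))

# De Morgan equivalent: !(a & b) is the same as !a | !b
demorgan <- df %>% filter(subject != 4 | proc != 2)

# filter() drops rows where the condition is NA. For the proc-2 row with a
# missing subject, (NA == 4 & 2 == 2) is NA, so that row is dropped too.
# Treat NA as "not subject 4" if such rows should be kept:
keep_na <- df %>% filter(!(subject == 4 & proc == 2) | is.na(subject))

nrow(df)       # 21
nrow(negated)  # 19
nrow(keep_na)  # 20
```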

Answer 1: score 5, accepted (2016-01-20 16:38:05)
