简体   繁体   中英

Filter rows dataframe based on condition in different dataframe using dplyr

I have two dataframes: df1 and df2 (see example below). df1 contains a numeric start and end value per character id. df2 contains multiple events per character id, including a numeric time value.

library(dplyr)

df1 <- data_frame(id = c("A", "B"),
                  start = c(2, 3),
                  end = c(5, 9))

df2 <- data_frame(id = c(rep("A", 4), rep("B", 4)),
                  time = c(1, 3, 5, 8, 2, 3, 4, 10),
                  keep = c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE))

I am trying to filter events in df2 using dplyr based on the time value being equal to or in between the start and end values per id in df1. So the filter is "df2$time >= df1$start & df2$time <= df1$end" which should be executed for each id separatly. I've added a logical column "keep" to show which rows I want to keep in df2.

How can I execute this filter for each id? Preferably using dplyr. This should be the endresult:

df2 %>%
  filter(keep == TRUE)

Any help is much appreciated!

What about something like this, with dplyr :

  df1 %>% 
  left_join(df2) %>%                       #joining to have one dataset
  filter(time <= end, time >= start) %>%   # filter, you can use <, > in case
  select(-c(2,3))                          # remove useless column if necessary

# A tibble: 4 x 3
  id     time keep 
  <chr> <dbl> <lgl>
1 A         3 TRUE 
2 A         5 TRUE 
3 B         3 TRUE 
4 B         4 TRUE 

Using the non-equi join capability of :

# load the package and convert the dataframes to data.tables
library(data.table)
setDT(df1)
setDT(df2)

# non-equi join
df2[df1
    , on = .(id, time >= start, time <= end)
    , .(id, time = x.time)]

which gives:

  id time 1: A 3 2: A 5 3: B 3 4: B 4 

What this does:

  • setDT() converts a data.frame to a data.table
  • You join df1 with df2 . by using on = .(id, time >= start, time <= end) you join on the exact match of id while at the same time time has to higher or equal to start from df1 and lower or equal to end from df1 .
  • When doing a non-equi join, will return the time -columns from df1 ( start and end ) as time and time for the cases when the join-conditions are met ( see also here ). To get an idea of what I mean, you can just do df2[df1, on = .(id, time >= start, time <= end)] .
  • By using .(id, time = x.time) you get the desired columns back. x.time refers to the time -column from the x-data.table which is df2 .

This also works

df2$start <- df1[match(df2$id, df1$id),"start"]
df2$end <- df1[match(df2$id, df1$id),"end"]
df2$keep <- df2$time>df2$start& df2$time<df2$end
result <- df2 %>%  filter(keep)
result

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM