Filter dates on a grouped dataset using dplyr

Assume I have the following dataset:

library(dplyr)

name <- c("b", "a", "a", "b","b","a", "b", "c",  "c",  "c",  "c", "a")
class <- c(0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1)
date <- c("10-06-2018", "11-06-2018", "12-06-2018", "13-06-2018", "14-06-2018", "15-06-2018", "16-06-2018","17-06-2018", "18-06-2018", "19-06-2018", "20-06-2018", "21-06-2018")
dates <- as.Date(date, "%d-%m-%Y")  # the strings use "-" separators, not "/"
df <- data.frame(name, class, dates)

# sort by name, then chronologically within each name
df <- df %>%
  arrange(name, dates)

I want to filter the dataset so that, for every name group, I keep the row with the earliest class 0 date and the row with the earliest class 1 date that comes after that class 0 date. In this case I would have:

df.new <- df[c(2,3,5,6,9,11), ]

There might be a neater solution, but a workaround is the following:

# split into two data frames
# find the min date for class == 0 in each name group
df0 <- df %>%
 filter(class == 0) %>%
 group_by(name) %>%
 summarise(dates0 = min(dates))

# find min date of class == 1 that is coming after class == 0
# and join the two dataframes
df1 <- df %>%
 filter(class == 1) %>%
 select(-class) %>%
 left_join(df0, by = 'name')

# keep only the class 1 dates that fall after the first class 0 date,
# then take the earliest of them per name
df1 <- df1 %>%
 filter(dates > dates0) %>%
 group_by(name) %>%
 summarise(dates = min(dates)) %>%
 mutate(class = 1)

# combine the two dataframes into one with the correct dates
df <- df0 %>%
 mutate(class = 0) %>%
 rename(dates = dates0) %>%
 bind_rows(df1) %>%
 group_by(name) %>%
 arrange(dates) %>%
 ungroup() %>%
 arrange(name)
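
The same result can probably be reached in a single grouped pipeline, without splitting and re-joining. The following is only a sketch under a few assumptions: dplyr 1.0.0 or later is installed (for slice_min()), the dates are stored in a Date column called dates, and the helper column first0 and the output name result are illustrative names that do not appear in the question.

```r
library(dplyr)

# Same sample data as in the question, with dates stored as Date objects
# (10 June 2018 through 21 June 2018, one per row).
df <- data.frame(
  name  = c("b", "a", "a", "b", "b", "a", "b", "c", "c", "c", "c", "a"),
  class = c(0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1),
  dates = as.Date("2018-06-10") + 0:11
)

result <- df %>%
  group_by(name) %>%
  # first0 (hypothetical helper column): earliest class 0 date per name,
  # or NA when a name has no class 0 row at all
  mutate(first0 = if (any(class == 0)) min(dates[class == 0]) else as.Date(NA)) %>%
  # keep all class 0 rows, plus class 1 rows dated after first0
  filter(class == 0 | dates > first0) %>%
  # within each (name, class) pair, keep the earliest remaining row
  group_by(name, class) %>%
  slice_min(dates, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-first0) %>%
  arrange(name, dates)

print(result)
```

On the sample data this should keep six rows, one class 0 row and one class 1 row per name, matching df[c(2,3,5,6,9,11), ] from the question. A name with no class 0 row is dropped entirely, because the comparison against a missing first0 evaluates to NA and filter() discards such rows.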
