Applying a function iteratively in a grouped dplyr dataframe to create a column in R

Suppose I'm given the following input dataframe:

ID  Date
1   20th May, 2020
1   21st May, 2020
1   28th May, 2020
1   29th May, 2020
2   20th May, 2020
2   1st June, 2020

I want to generate the following dataframe:

ID  Date            Delta
1   20th May, 2020      0
1   21st May, 2020      1
1   28th May, 2020      7
1   29th May, 2020      1
2   20th May, 2020      0
2   1st June, 2020     12

The idea is to group by ID first. Then, within each ID, I iterate over the dates and subtract the previous date from the current one. The first date in each group has no previous date, so its delta is 0.

I have been using dplyr, but I am not sure how to do this per group or how to do it iteratively.

My goal is to filter the deltas, keeping 0 and anything larger than 7, but each delta must follow the 'preceding date' logic within a specific ID.

library(dplyr)
dat %>%
  mutate(Date = as.Date(gsub("[a-z]{2} ", " ", Date), format = "%d %b, %Y")) %>%
  group_by(ID) %>%
  mutate(Delta = c(0, diff(Date))) %>%
  ungroup()
# # A tibble: 6 x 3
#      ID Date       Delta
#   <dbl> <date>     <dbl>
# 1     1 2020-05-20     0
# 2     1 2020-05-21     1
# 3     1 2020-05-28     7
# 4     1 2020-05-29     1
# 5     2 2020-05-20     0
# 6     2 2020-06-01    12

Steps:

  1. remove the ordinal suffix (the two letters after the day number), so that we can
  2. convert the strings to proper Date-class objects, then
  3. diff them within ID groups, with 0 for the first row of each group.
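
The same three steps can be sketched one at a time in base R, which makes the intermediate values easy to inspect. This is only an illustration, not the pipeline above: the names id, raw, no_ordinal, dates and delta are made up for the example, the regex is a slightly stricter variant that only strips st/nd/rd/th after leading digits, and the locale call assumes English month names are wanted.

```r
# Month names are parsed according to LC_TIME; "C" gives English names.
invisible(Sys.setlocale("LC_TIME", "C"))

# Hypothetical vectors mirroring the question's data.
id  <- c(1, 1, 1, 1, 2, 2)
raw <- c("20th May, 2020", "21st May, 2020", "28th May, 2020",
         "29th May, 2020", "20th May, 2020", "1st June, 2020")

# Step 1: drop the ordinal suffix that follows the day number.
no_ordinal <- sub("^([0-9]+)(st|nd|rd|th)", "\\1", trimws(raw))

# Step 2: parse into Date objects (%b also accepts full month names on input).
dates <- as.Date(no_ordinal, format = "%d %b, %Y")

# Step 3: difference from the previous date within each id, 0 for the first row.
delta <- ave(as.numeric(dates), id, FUN = function(x) c(0, diff(x)))

data.frame(ID = id, Date = dates, Delta = delta)
```

ave() applies the function per group and returns the results in the original row order, which plays the role of group_by() plus mutate() here.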

Data

dat <- structure(list(ID = c(1, 1, 1, 1, 2, 2), Date = c("  20th May, 2020", "  21st May, 2020", "  28th May, 2020", "  29th May, 2020", "  20th May, 2020", "  1st June, 2020")), class = "data.frame", row.names = c(NA, -6L))

Similar logic to @r2evans's answer, but with different functions.

library(dplyr)
library(lubridate)

df %>%
  mutate(Date = dmy(Date)) %>%
  group_by(ID) %>%
  mutate(Delta = as.integer(Date - lag(Date, default = first(Date)))) %>%
  ungroup()

#     ID Date       Delta
#  <int> <date>     <int>
#1     1 2020-05-20     0
#2     1 2020-05-21     1
#3     1 2020-05-28     7
#4     1 2020-05-29     1
#5     2 2020-05-20     0
#6     2 2020-06-01    12
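
The question's stated goal, keeping deltas of 0 and anything larger than 7, is not shown in either answer. A minimal sketch of that extra step follows, assuming the dates are already parsed and that "larger than 7" is strict, so a delta of exactly 7 is dropped. The names parsed and kept are hypothetical, and the arrange() call is an added safeguard in case rows are not already in date order.

```r
library(dplyr)

# Hypothetical input: the same rows as above, with Date already of class Date.
parsed <- data.frame(
  ID   = c(1L, 1L, 1L, 1L, 2L, 2L),
  Date = as.Date(c("2020-05-20", "2020-05-21", "2020-05-28",
                   "2020-05-29", "2020-05-20", "2020-06-01"))
)

kept <- parsed %>%
  group_by(ID) %>%
  # Sort within each ID so "previous row" really means "previous date".
  arrange(Date, .by_group = TRUE) %>%
  mutate(Delta = as.integer(Date - lag(Date, default = first(Date)))) %>%
  ungroup() %>%
  # Keep the first row of each group (Delta 0) and gaps of more than 7 days.
  filter(Delta == 0 | Delta > 7)

kept
```

The deltas are computed against the immediately preceding date before any rows are removed, which is the 'preceding date' logic the question asks for. If the deltas should instead be measured from the last retained row, a different approach would be needed.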

data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L), Date = c("20th May, 2020", 
"21st May, 2020", "28th May, 2020", "29th May, 2020", "20th May, 2020", 
"1st June, 2020")), class = "data.frame", row.names = c(NA, -6L))
