Applying a function iteratively in a grouped dplyr dataframe to create a column in R
Suppose I'm given the following input dataframe:
ID Date
1 20th May, 2020
1 21st May, 2020
1 28th May, 2020
1 29th May, 2020
2 20th May, 2020
2 1st June, 2020
I want to generate the following dataframe:
ID Date Delta
1 20th May, 2020 0
1 21st May, 2020 1
1 28th May, 2020 7
1 29th May, 2020 1
2 20th May, 2020 0
2 1st June, 2020 12
The idea is to first group by id. Then, within each id, I iterate over the dates and subtract the previous date from the current one, except for the first date, which has no previous date and gets a Delta of 0.
I have been using dplyr, but I am not sure how to do this per group or how to do it iteratively.
My goal is to filter the deltas and retain 0 and anything larger than 7, but the filter must follow the 'preceding date' logic within a specific id.
library(dplyr)
dat %>%
mutate(Date = as.Date(gsub("[a-z]{2} ", " ", Date), format = "%d %b, %Y")) %>%
group_by(ID) %>%
mutate(Delta = c(0, diff(Date))) %>%
ungroup()
# # A tibble: 6 x 3
# ID Date Delta
# <dbl> <date> <dbl>
# 1 1 2020-05-20 0
# 2 1 2020-05-21 1
# 3 1 2020-05-28 7
# 4 1 2020-05-29 1
# 5 2 2020-05-20 0
# 6 2 2020-06-01 12
Steps: convert the Date column to Date-class objects, then diff them within ID groups.
Data
dat <- structure(list(ID = c(1, 1, 1, 1, 2, 2), Date = c(" 20th May, 2020", " 21st May, 2020", " 28th May, 2020", " 29th May, 2020", " 20th May, 2020", " 1st June, 2020")), class = "data.frame", row.names = c(NA, -6L))
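The question also asks to keep only deltas of 0 and deltas larger than 7, which the code above does not show. A minimal sketch of that step follows, assuming the deltas are computed on the full data first and filtered afterwards (they are not recomputed against the previous retained row), and assuming an English locale so that the month names parse. The name res is only illustrative.

```r
library(dplyr)

# Sample data from the question (leading spaces omitted)
dat <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2),
  Date = c("20th May, 2020", "21st May, 2020", "28th May, 2020",
           "29th May, 2020", "20th May, 2020", "1st June, 2020")
)

res <- dat %>%
  # Drop the ordinal suffix ("th", "st", ...) and parse to Date
  mutate(Date = as.Date(gsub("[a-z]{2} ", " ", Date), format = "%d %b, %Y")) %>%
  group_by(ID) %>%
  # Days since the previous row within each ID; 0 for the first row
  mutate(Delta = c(0, diff(Date))) %>%
  ungroup() %>%
  # Assumption: "larger than 7" is strict, so a delta of exactly 7 is dropped
  filter(Delta == 0 | Delta > 7)
```

If a delta of exactly 7 should be retained, the condition would be Delta >= 7 instead.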
Similar logic to @r2evans's answer, but with different functions.
library(dplyr)
library(lubridate)
df %>%
mutate(Date = dmy(Date)) %>%
group_by(ID) %>%
mutate(Delta = as.integer(Date - lag(Date, default = first(Date)))) %>%
ungroup()
# ID Date Delta
# <int> <date> <int>
#1 1 2020-05-20 0
#2 1 2020-05-21 1
#3 1 2020-05-28 7
#4 1 2020-05-29 1
#5 2 2020-05-20 0
#6 2 2020-06-01 12
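To see why the first row of each group comes out as 0, it may help to look at lag() on a single group. This is a small sketch on a made-up vector d (not part of the original data frame): with default = first(d), the first element is compared with itself, so its difference is zero.

```r
library(dplyr)

# Hypothetical dates standing in for one ID group
d <- as.Date(c("2020-05-20", "2020-05-21", "2020-05-28"))

# Shift the vector by one position; the gap at the start is filled
# with the first date instead of NA
prev <- dplyr::lag(d, default = first(d))

# Date - Date gives a difftime in days; as.integer() drops the units
delta <- as.integer(d - prev)
```

Without the default argument, the first element of prev would be NA and so would the first Delta.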
Data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L), Date = c("20th May, 2020",
"21st May, 2020", "28th May, 2020", "29th May, 2020", "20th May, 2020",
"1st June, 2020")), class = "data.frame", row.names = c(NA, -6L))