Calculate difference between dates for different actions in R
I have a data frame like this:
d <- data.frame(Id = c(101, 101, 101, 101, 103, 103, 103),
                Action = c('hire', 'terminate', 'promoted', 'promoted', 'hire', 'promoted', 'terminate'),
                date = c('02/22/2017', '12/11/2020', '11/11/2018', '03/22/2019', '02/23/2016', '01/12/2018', '03/21/2019'))
I want to create a new column time_spent that calculates the days between the hire date and the terminate date, and between the hire date and the promoted date.
library(dplyr)
library(lubridate)

d$date <- mdy(d$date)
d %>%
  group_by(Id) %>%
  summarise(time_spent = as.numeric(difftime(date[Action == 'terminate'], date[Action == 'hire'], units = 'days'))) %>%
  inner_join(d, by = 'Id')
The code above calculates the time, but only between hire and terminate. How can we do the same for hire and promoted?
Also, for Id 101 the promotion happens twice, so there are two different dates. If we apply the above code to hire and promoted, it only calculates the days between hire and the first occurrence of the promoted Action for Id 101, not all the promoted Actions that happened for Id 101.
Maybe this will help. Instead of summarise and inner_join, you can use mutate and have the new column time_spent be the time difference between that row's date and the date when the person was hired.
library(tidyverse)
d %>%
mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
arrange(Id, date) %>%
group_by(Id) %>%
mutate(time_spent = difftime(date, date[Action == "hire"], units = "days"))
Output
Id Action date time_spent
<dbl> <chr> <date> <drtn>
1 101 hire 2017-02-22 0 days
2 101 promoted 2018-11-11 627 days
3 101 promoted 2019-03-22 758 days
4 101 terminate 2020-12-11 1388 days
5 103 hire 2016-02-23 0 days
6 103 promoted 2018-01-12 689 days
7 103 terminate 2019-03-21 1122 days
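For comparison, the same per-Id "days since hire" column can be computed in base R without dplyr. This is a minimal sketch, not the answer's method: it uses ave() to copy each Id's hire date onto every row of that Id, and it assumes each Id has at most one "hire" row (if there were several, the earliest one after sorting would be used). The intermediate name hire_num is hypothetical.

```r
# Example data from the question
d <- data.frame(
  Id = c(101, 101, 101, 101, 103, 103, 103),
  Action = c("hire", "terminate", "promoted", "promoted", "hire", "promoted", "terminate"),
  date = c("02/22/2017", "12/11/2020", "11/11/2018", "03/22/2019",
           "02/23/2016", "01/12/2018", "03/21/2019")
)
d$date <- as.Date(d$date, format = "%m/%d/%Y")

# Sort by Id, then date (the base R equivalent of arrange(Id, date))
d <- d[order(d$Id, d$date), ]

# Hire date as a day count on "hire" rows, NA on all other rows
hire_num <- ifelse(d$Action == "hire", as.numeric(d$date), NA)

# Spread each Id's first non-NA hire value to all rows of that Id
# (stays NA if the Id has no "hire" row)
hire_num <- ave(hire_num, d$Id, FUN = function(x) x[!is.na(x)][1])

# Days since hire for every row, so each promotion gets its own value
d$time_spent <- as.numeric(d$date) - hire_num
print(d)
```

Both promotion rows for Id 101 get their own value (627 and 758 days), the same numbers as in the dplyr output.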
Edit: If you want to keep rows with NA when no "hire" date is available, you can filter to include only the Ids that have any "hire" action and then rejoin the data. Just make sure the data frame has date in the correct format first.
d$date <- as.Date(d$date, format = "%m/%d/%Y")
d %>%
arrange(Id, date) %>%
group_by(Id) %>%
filter(any(Action == "hire")) %>%
mutate(time_spent = difftime(date, date[Action == "hire"], units = "days")) %>%
  right_join(d, by = c("Id", "Action", "date"))
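If the goal is only to get NA for Ids without a "hire" row, a lookup with match() can replace the filter and rejoin. This is a sketch assuming each Id has at most one "hire" row; the extra row for Id 104 is hypothetical data added only to show the NA case, and hires and hire_date are illustrative names.

```r
# Example data from the question plus one hypothetical row (Id 104 has no "hire" record)
d <- data.frame(
  Id = c(101, 101, 101, 101, 103, 103, 103, 104),
  Action = c("hire", "terminate", "promoted", "promoted",
             "hire", "promoted", "terminate", "promoted"),
  date = c("02/22/2017", "12/11/2020", "11/11/2018", "03/22/2019",
           "02/23/2016", "01/12/2018", "03/21/2019", "05/01/2019")
)
d$date <- as.Date(d$date, format = "%m/%d/%Y")

# One hire date per Id (assumes at most one "hire" row per Id)
hires <- d[d$Action == "hire", c("Id", "date")]

# Look up each row's hire date; match() returns NA for Ids with no hire row
hire_date <- hires$date[match(d$Id, hires$Id)]

# Days since hire; NA propagates for the Id without a hire date
d$time_spent <- as.numeric(difftime(d$date, hire_date, units = "days"))
print(d)
```

Because match() yields NA for an Id with no hire row, the subtraction gives NA for that row directly, no join is needed, and the rows keep their original order.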
Data
d <- structure(list(Id = c(101, 101, 101, 101, 103, 103, 103), Action = c("hire",
"terminate", "promoted", "promoted", "hire", "promoted", "terminate"
), date = c("02/22/2017", "12/11/2020", "11/11/2018", "03/22/2019",
"02/23/2016", "01/12/2018", "03/21/2019")), class = "data.frame", row.names = c(NA,
-7L))