Calculate difference between dates for different actions in R

I have a data frame like this:

d=data.frame('Id'=c(101,101,101,101,103,103,103),
             'Action'=c('hire','terminate','promoted','promoted','hire','promoted','terminate'),
             'date'=c('02/22/2017','12/11/2020','11/11/2018','03/22/2019','02/23/2016','01/12/2018','03/21/2019'))

I want to create a new column, time_spent, which holds the number of days between the hire date and the terminate date, and between the hire date and each promoted date.

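library(dplyr)
library(lubridate)

# Parse the month/day/year strings into Date objects
d$date <- mdy(d$date)

d %>%
    group_by(Id) %>%
    # Days between the hire row and the terminate row of each Id
    summarise(time_spent = as.numeric(difftime(date[Action == 'terminate'], date[Action == 'hire'], units = 'days'))) %>%
    inner_join(d, by = 'Id')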

The code above calculates the time, but only between hire and terminate. How can we do the same for hire and promoted?

Also, for Id 101 the promotion happens twice, so there are two different promoted dates. If we apply the code above to hire and promoted, it only calculates the days between hire and the first occurrence of the promoted Action for Id 101, not for every promoted Action recorded for Id 101.

This might be helpful. Instead of summarise and inner_join, you can use mutate so that the new column time_spent is the difference between each row's date and the date on which that person was hired.

library(tidyverse)

d %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y")) %>%
  arrange(Id, date) %>%
  group_by(Id) %>%
  mutate(time_spent = difftime(date, date[Action == "hire"], units = "days"))

Output

     Id Action    date       time_spent
  <dbl> <chr>     <date>     <drtn>    
1   101 hire      2017-02-22    0 days 
2   101 promoted  2018-11-11  627 days 
3   101 promoted  2019-03-22  758 days 
4   101 terminate 2020-12-11 1388 days 
5   103 hire      2016-02-23    0 days 
6   103 promoted  2018-01-12  689 days 
7   103 terminate 2019-03-21 1122 days
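
If time_spent is needed as a plain number rather than a difftime, or if dplyr is not available, the same idea can be sketched in base R. This is only an illustration of the approach above, not part of the original answer: it looks up each Id's hire date with match() and subtracts it from every row. The helper name hire is an assumption introduced here.

```r
# Sample data from the question
d <- data.frame(
  Id = c(101, 101, 101, 101, 103, 103, 103),
  Action = c("hire", "terminate", "promoted", "promoted",
             "hire", "promoted", "terminate"),
  date = c("02/22/2017", "12/11/2020", "11/11/2018", "03/22/2019",
           "02/23/2016", "01/12/2018", "03/21/2019")
)
d$date <- as.Date(d$date, format = "%m/%d/%Y")

# One row per hire event; match() picks the first hire row for each Id
hire <- d[d$Action == "hire", c("Id", "date")]
hire_date <- hire$date[match(d$Id, hire$Id)]

# Numeric number of days since that Id's hire date
d$time_spent <- as.numeric(difftime(d$date, hire_date, units = "days"))

# Sort by Id and date to mirror the arrange() step
d <- d[order(d$Id, d$date), ]
print(d)
```

The values should agree with the output shown above. An Id without any hire row gets NA here, because match() returns NA for it.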

Edit: If you want time_spent to be NA when an Id has no "hire" date, you can filter to keep only the Ids that have any "hire" row, compute the difference, and then right-join the result back to the original data. Just make sure the date column is already in Date format first, so that the join keys match.

d$date <- as.Date(d$date, format = "%m/%d/%Y")

d %>%
  arrange(Id, date) %>%
  group_by(Id) %>%
  filter(any(Action == "hire")) %>%
  mutate(time_spent = difftime(date, date[Action == "hire"], units = "days")) %>%
  right_join(d)
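
A possible shortcut, offered as a sketch rather than as the answer's method: indexing the hire date with [1] yields NA when a group has no "hire" row, so the filter and right_join steps may not be needed. The data frame d2 and the Id 105 rows below are hypothetical, made up only to show the NA case.

```r
library(dplyr)

# Hypothetical data: Id 105 has no "hire" row
d2 <- data.frame(
  Id = c(101, 101, 105, 105),
  Action = c("hire", "terminate", "promoted", "terminate"),
  date = as.Date(c("2017-02-22", "2020-12-11", "2018-05-01", "2019-06-01"))
)

res <- d2 %>%
  arrange(Id, date) %>%
  group_by(Id) %>%
  # date[Action == "hire"] is empty for Id 105, and [1] turns that into NA
  mutate(time_spent = difftime(date, date[Action == "hire"][1], units = "days")) %>%
  ungroup()

print(res)
```

If an Id has more than one "hire" row, [1] uses the earliest one, because the rows are sorted by date before grouping.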

Data

d <- structure(list(Id = c(101, 101, 101, 101, 103, 103, 103), Action = c("hire", 
"terminate", "promoted", "promoted", "hire", "promoted", "terminate"
), date = c("02/22/2017", "12/11/2020", "11/11/2018", "03/22/2019", 
"02/23/2016", "01/12/2018", "03/21/2019")), class = "data.frame", row.names = c(NA, 
-7L))
