
Calculating the difference between two dates by group with caveat

The data looks like this:

df <- data.frame(
  id = c(283,994,294,294,1001,1001), 
  stint = c(1,1,1,2,1,2), 
  admit = c("2010-2-3","2011-2-4","2011-3-4","2012-4-1","2016-1-2","2017-2-3"),
  release = c("2011-2-3","2011-2-28","2011-4-1","2014-6-6","2017-2-1","2018-3-1")
)

Okay, so bear with me, because I'm finding this kind of hard to articulate. I need to calculate, by id, the difference between the release date of the first stint and the admit date of the second stint. That difference, which I'm calling the "exposure", should look like this for the sample above:

exposure = c(NA, NA, 366, NA, 2, NA)  # 2011-04-01 to 2012-04-01 spans 29 Feb 2012, hence 366 rather than 365

So NA is returned if an id has only one stint. If there is more than one stint, the exposure period is calculated from the previous release date and the current admit date. For example, the exposure for stint 3 would be the admit date of stint 3 minus the release date of stint 2.
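As a quick sanity check of the arithmetic, subtracting two Date objects in base R gives the gap in days. The sketch below uses the stint 1 release and stint 2 admit dates for id 294 and id 1001 from the sample; the variable names `gap_294` and `gap_1001` are only illustrative.

```r
# id 294: admit of stint 2 minus release of stint 1
gap_294 <- as.integer(as.Date("2012-4-1") - as.Date("2011-4-1"))

# id 1001: admit of stint 2 minus release of stint 1
gap_1001 <- as.integer(as.Date("2017-2-3") - as.Date("2017-2-1"))

print(c(gap_294, gap_1001))
```

The first gap spans 29 February 2012, so it comes out as 366 days rather than 365.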

Here is a dplyr approach. For each id, find the value of admit where stint is 2 and the value of release where stint is 1, take the difference, and write it into the first entry of exposure for that group. If an id has no second stint, match() returns NA and the exposure stays NA.

library(dplyr)

df %>% 
  mutate(
    across(c(admit, release), as.Date), 
    exposure = NA_integer_
  ) %>% 
  group_by(id) %>% 
  mutate(exposure = replace(
    exposure, 1L, 
    as.integer(admit[match(2, stint)] - release[match(1, stint)])
  ))
  

Output

# A tibble: 6 x 5
# Groups:   id [4]
     id stint admit      release    exposure
  <dbl> <dbl> <date>     <date>        <int>
1   283     1 2010-02-03 2011-02-03       NA
2   994     1 2011-02-04 2011-02-28       NA
3   294     1 2011-03-04 2011-04-01      366
4   294     2 2012-04-01 2014-06-06       NA
5  1001     1 2016-01-02 2017-02-01        2
6  1001     2 2017-02-03 2018-03-01       NA
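The approach above only looks at stints 1 and 2. If an id can have three or more stints, one possible generalization (a sketch, not part of the original answer) is to subtract each release date from the admit date of the next stint within the same id, using dplyr's `lead()`. The name `result` is just illustrative, and the sample data is repeated so the block runs on its own.

```r
library(dplyr)

df <- data.frame(
  id = c(283, 994, 294, 294, 1001, 1001),
  stint = c(1, 1, 1, 2, 1, 2),
  admit = c("2010-2-3", "2011-2-4", "2011-3-4", "2012-4-1", "2016-1-2", "2017-2-3"),
  release = c("2011-2-3", "2011-2-28", "2011-4-1", "2014-6-6", "2017-2-1", "2018-3-1")
)

result <- df %>%
  mutate(across(c(admit, release), as.Date)) %>%
  group_by(id) %>%
  # lead(admit, order_by = stint) is the admit date of the next stint for this id;
  # the last stint of each id has no successor, so its exposure is NA
  mutate(exposure = as.integer(lead(admit, order_by = stint) - release)) %>%
  ungroup()

print(result)
```

For the sample data this should give the same exposure column as above, and each additional stint would simply add one more non-NA value on the preceding row.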

You want to calculate the exposure when stint == 2 and return NA otherwise, which can be done with ifelse. The release date has to come from the previous row, which can be done with lag. That, however, attaches each exposure value to the row where stint == 2, whereas you want it attached to the row of the previous release used in the calculation. So drop the first element of the result and append an NA at the end, which shifts every value up by one row.

  df %>% 
    mutate(across(c(admit, release), as.Date), 
           exposure = c(ifelse(stint == 2, admit - lag(release), NA)[-1], NA))

Which yields:

    id stint      admit    release exposure
1  283     1 2010-02-03 2011-02-03       NA
2  994     1 2011-02-04 2011-02-28       NA
3  294     1 2011-03-04 2011-04-01      366
4  294     2 2012-04-01 2014-06-06       NA
5 1001     1 2016-01-02 2017-02-01        2
6 1001     2 2017-02-03 2018-03-01       NA
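The one-liner above is not grouped by id, so it relies on each stint 2 row sitting directly after the stint 1 row of the same id. If that row order cannot be assumed, one option is to sort first and compare neighbouring rows explicitly. Here is a base R sketch of that idea; `next_admit` and `same_id` are helper names made up for the example, and note that the result comes back sorted by id and stint rather than in the original row order.

```r
df <- data.frame(
  id = c(283, 994, 294, 294, 1001, 1001),
  stint = c(1, 1, 1, 2, 1, 2),
  admit = c("2010-2-3", "2011-2-4", "2011-3-4", "2012-4-1", "2016-1-2", "2017-2-3"),
  release = c("2011-2-3", "2011-2-28", "2011-4-1", "2014-6-6", "2017-2-1", "2018-3-1")
)

df$admit <- as.Date(df$admit)
df$release <- as.Date(df$release)

# Sort so that consecutive stints of the same id are adjacent
df <- df[order(df$id, df$stint), ]
n <- nrow(df)

# Admit date of the following row, with NA for the last row
next_admit <- c(df$admit[-1], NA)

# TRUE where the following row belongs to the same id
same_id <- c(df$id[-1] == df$id[-n], FALSE)

# Days from this release to the next admit, kept only within an id
df$exposure <- as.integer(next_admit - df$release)
df$exposure[!same_id] <- NA

print(df)
```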
