How to find the first non-NA leading or lagging value?

Question

I have rows grouped by ID and I want to calculate how much time passes until the next event occurs (if it does occur for that ID).

Here is example code:

year <- c(2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018)
id <- c(rep("A", times = 4), rep("B", times = 4), rep("C", times = 4))
event_date <- c(NA, 2016, NA, 2018, NA, NA, NA, NA, 2015, NA, NA, 2018)

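# data.frame() keeps year and event_date numeric; as.data.frame(cbind(...))
# would coerce every column to character, so "event_date - year" would fail
df <- data.frame(id, year, event_date)
df
   id year event_date
1   A 2015         NA
2   A 2016       2016
3   A 2017         NA
4   A 2018       2018
5   B 2015         NA
6   B 2016         NA
7   B 2017         NA
8   B 2018         NA
9   C 2015       2015
10  C 2016         NA
11  C 2017         NA
12  C 2018       2018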

Here is what I want the output to look like:

      id  year event_date  years_till_next_event
    1   A 2015       <NA>   1
    2   A 2016       2016   0
    3   A 2017       <NA>   1
    4   A 2018       2018   0
    5   B 2015       <NA>   <NA>
    6   B 2016       <NA>   <NA>
    7   B 2017       <NA>   <NA>
    8   B 2018       <NA>   <NA>
    9   C 2015       2015   0
    10  C 2016       <NA>   2
    11  C 2017       <NA>   1
    12  C 2018       2018   0

ID B never has an event, so its value should stay NA. For the other IDs, I want the difference between the next non-NA event_date (the row's own event_date, if it has one) and year.

I want to calculate years_till_next_event such that: 1) if a row has an event_date, the result is event_date - year; 2) if not, the result is the first non-NA leading event_date minus year. I'm having difficulty with the second part of the logic, keeping in mind that, for a given ID, the event may never occur or may occur every year.
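For comparison, here is a minimal base-R sketch of the same two-part rule, assuming year and event_date are numeric and rows are sorted by year within each id. The helper nocb() ("next observation carried backward") is a hypothetical name, not a base R function: it replaces each NA with the next non-NA value, and ave() applies it separately within each id. A row that has its own event_date keeps it, so part 1 of the rule falls out of part 2 automatically.

```r
# Sample data from the question, built with data.frame() so the columns stay numeric
year <- c(2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018)
id <- c(rep("A", times = 4), rep("B", times = 4), rep("C", times = 4))
event_date <- c(NA, 2016, NA, 2018, NA, NA, NA, NA, 2015, NA, NA, 2018)
df <- data.frame(id, year, event_date)

# Hypothetical helper: replace each NA with the next non-NA value after it.
# Trailing NAs (no later event) stay NA.
nocb <- function(x) {
  # Walk from the second-to-last element back to the first
  for (i in rev(seq_along(x))[-1]) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

# Apply nocb() within each id, then subtract the current year
df$years_till_next_event <- ave(df$event_date, df$id, FUN = nocb) - df$year
df$years_till_next_event
# 1 0 1 0 NA NA NA NA 0 2 1 0
```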

Answer 1

Using zoo with dplyr

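library(dplyr)
library(zoo)

df %>%
  group_by(id) %>%
  # na.locf0(fromLast = TRUE) fills each NA with the next non-NA event_date in the group
  mutate(years_till_next_event = na.locf0(event_date, fromLast = TRUE) - year) %>%
  ungroup()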
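The title also mentions lagging values. Carrying values forward instead of backward (what na.locf0() does when fromLast is left at its default, FALSE) gives the most recent earlier event, so year minus that value is the number of years since the last event. A minimal base-R sketch, again assuming numeric columns sorted by year within each id; locf() is a hypothetical helper name:

```r
year <- c(2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018)
id <- c(rep("A", times = 4), rep("B", times = 4), rep("C", times = 4))
event_date <- c(NA, 2016, NA, 2018, NA, NA, NA, NA, 2015, NA, NA, 2018)
df <- data.frame(id, year, event_date)

# Hypothetical helper: replace each NA with the most recent non-NA value before it.
# Leading NAs (no earlier event) stay NA.
locf <- function(x) {
  # Walk forward from the second element
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Years elapsed since the previous event (0 in an event year)
df$years_since_last_event <- df$year - ave(df$event_date, df$id, FUN = locf)
df$years_since_last_event
# NA 0 1 0 NA NA NA NA 0 1 2 0
```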

Answer 2

Here is a data.table option:

library(data.table)

# nafill(type = "nocb") carries the next non-NA event_date backward within each id
setDT(df)[, years_till_next_event := nafill(event_date, type = "nocb") - year, by = id]

which gives

    id year event_date years_till_next_event
 1:  A 2015         NA                     1
 2:  A 2016       2016                     0
 3:  A 2017         NA                     1
 4:  A 2018       2018                     0
 5:  B 2015         NA                    NA
 6:  B 2016         NA                    NA
 7:  B 2017         NA                    NA
 8:  B 2018         NA                    NA
 9:  C 2015       2015                     0
10:  C 2016         NA                     2
11:  C 2017         NA                     1
12:  C 2018       2018                     0

Answer 3

Within each id, you can create a new column that holds the row number wherever event_date is not NA, fill the NA values upward from the next non-NA value, and then subtract the current row number.

library(dplyr)

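df %>%
  group_by(id) %>%
  # Row number of each event row; NA where there is no event
  mutate(years_till_next_event = replace(row_number(), is.na(event_date), NA)) %>%
  # Fill each NA upward from the next event row within the id
  tidyr::fill(years_till_next_event, .direction = 'up') %>%
  # Number of rows between the current row and the next event row
  mutate(years_till_next_event = years_till_next_event - row_number()) %>%
  ungroup()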

#    id     year event_date years_till_next_event
#   <chr> <dbl>      <dbl>                 <int>
# 1 A      2015         NA                     1
# 2 A      2016       2016                     0
# 3 A      2017         NA                     1
# 4 A      2018       2018                     0
# 5 B      2015         NA                    NA
# 6 B      2016         NA                    NA
# 7 B      2017         NA                    NA
# 8 B      2018         NA                    NA
# 9 C      2015       2015                     0
#10 C      2016         NA                     2
#11 C      2017         NA                     1
#12 C      2018       2018                     0

data

df <- data.frame(id, year, event_date)
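The row-number approach measures distance in rows, so it equals event_date - year only when every id has exactly one row per consecutive year. The base-R sketch below reproduces the same idea on hypothetical data where id D has no 2016 row, and compares it with a year-based difference; nocb() is a hypothetical helper that fills each NA with the next non-NA value.

```r
# Hypothetical data: id "D" has no row for 2016
gap <- data.frame(id = "D", year = c(2015, 2017, 2018), event_date = c(NA, NA, 2018))

# Hypothetical helper: next observation carried backward
nocb <- function(x) {
  for (i in rev(seq_along(x))[-1]) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

# Row-number version: distance in rows to the next event row
rn <- ave(gap$year, gap$id, FUN = seq_along)
event_rn <- ifelse(is.na(gap$event_date), NA, rn)
by_rows <- ave(event_rn, gap$id, FUN = nocb) - rn

# Year version: distance in years to the next event
by_years <- ave(gap$event_date, gap$id, FUN = nocb) - gap$year

by_rows   # 2 1 0
by_years  # 3 1 0
```

With a missing year, only the year-based version reports the 3 years between 2015 and 2018.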

3 answers

Answer 1: 2 (2021-03-02 00:46:48)
Answer 2: 1 (2021-03-02 00:44:02)
Answer 3: 0 (2021-03-02 02:42:27)