

How to find the first non-NA leading or lagging value?

I have rows grouped by ID, and I want to calculate how much time passes until the next event occurs (if it occurs at all for that ID).

Here is some example code:

year <- c(2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018)
id <- c(rep("A", times = 4), rep("B", times = 4), rep("C", times = 4))
event_date <- c(NA, 2016, NA, 2018, NA, NA, NA, NA, 2015, NA, NA, 2018)

# Note: cbind() builds a character matrix, so year and event_date become character columns here
df <- as.data.frame(cbind(id, year, event_date))
df
 id year event_date
1   A 2015       <NA>
2   A 2016       2016
3   A 2017       <NA>
4   A 2018       2018
5   B 2015       <NA>
6   B 2016       <NA>
7   B 2017       <NA>
8   B 2018       <NA>
9   C 2015       2015
10  C 2016       <NA>
11  C 2017       <NA>
12  C 2018       2018

Here is what I want the output to look like:

      id  year event_date  years_till_next_event
    1   A 2015       <NA>   1
    2   A 2016       2016   0
    3   A 2017       <NA>   1
    4   A 2018       2018   0
    5   B 2015       <NA>   <NA>
    6   B 2016       <NA>   <NA>
    7   B 2017       <NA>   <NA>
    8   B 2018       <NA>   <NA>
    9   C 2015       2015   0
    10  C 2016       <NA>   2
    11  C 2017       <NA>   1
    12  C 2018       2018   0

Person B never has the event, so nothing is calculated for B (the result stays NA). For the others, I want the difference between the next non-NA event_date within the same ID (the current row's own event_date if it has one) and the year.

I want to calculate years_till_next_event such that:
1) if a row has an event_date, the result is event_date - year;
2) if not, the result is the first non-NA leading event_date - year.
I'm having difficulty with the second part of the logic, keeping in mind that, within an ID, the event may never occur or may occur every year.
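One way to express step 2 without extra packages is to fill each missing event_date with the next non-NA value in the same id (a "next observation carried backward" fill) and then subtract year. This is a minimal base R sketch; `next_non_na` is a hypothetical helper written for this illustration, and it assumes the rows within each id are already sorted by year and that event_date and year are numeric.

```r
# Example data from the question, built with data.frame() so the columns stay numeric
year       <- c(2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018)
id         <- c(rep("A", times = 4), rep("B", times = 4), rep("C", times = 4))
event_date <- c(NA, 2016, NA, 2018, NA, NA, NA, NA, 2015, NA, NA, 2018)
df <- data.frame(id, year, event_date)

# Hypothetical helper: replace each NA with the next non-NA value after it
# (next observation carried backward); trailing NAs with no later value stay NA.
next_non_na <- function(x) {
  out   <- x
  carry <- x[NA_integer_]  # an NA of the same type as x
  for (i in rev(seq_along(x))) {
    if (!is.na(x[i])) carry <- x[i]
    out[i] <- carry
  }
  out
}

# Apply the fill separately within each id (rows assumed sorted by year), then subtract year
df$years_till_next_event <- ave(df$event_date, df$id, FUN = next_non_na) - df$year
df$years_till_next_event
```

For the example data this gives 1 0 1 0 for A, all NA for B, and 0 2 1 0 for C, matching the desired output. The explicit loop keeps the logic easy to follow; the zoo and data.table answers do the same backward fill with a single library call.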

Using zoo with dplyr:

library(dplyr)
library(zoo)

df %>%
  group_by(id) %>%
  # na.locf0(fromLast = TRUE) carries the next non-NA event_date backward within each id
  mutate(years_till_next_event = na.locf0(event_date, fromLast = TRUE) - year)

Here is a data.table option:

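library(data.table)
# type = "nocb" means "next observation carried backward"; nafill() needs numeric columns
setDT(df)[, years_till_next_event := nafill(event_date, type = "nocb") - year, by = id]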

which gives

    id year event_date years_till_next_event
 1:  A 2015         NA                     1
 2:  A 2016       2016                     0
 3:  A 2017         NA                     1
 4:  A 2018       2018                     0
 5:  B 2015         NA                    NA
 6:  B 2016         NA                    NA
 7:  B 2017         NA                    NA
 8:  B 2018         NA                    NA
 9:  C 2015       2015                     0
10:  C 2016         NA                     2
11:  C 2017         NA                     1
12:  C 2018       2018                     0

You can create a new column that holds the row number within each id wherever event_date is not NA, fill the NA values upward from the next non-NA value, and subtract the current row number.

library(dplyr)

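df %>%
  group_by(id) %>%
  # row number within the id where an event occurred, NA elsewhere
  mutate(years_till_next_event = replace(row_number(), is.na(event_date), NA)) %>%
  # fill each NA upward with the row number of the next event
  tidyr::fill(years_till_next_event, .direction = 'up') %>%
  # distance in rows from the current row to that event
  mutate(years_till_next_event = years_till_next_event - row_number()) %>%
  ungroup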

#    id     year event_date years_till_next_event
#   <chr> <dbl>      <dbl>                 <int>
# 1 A      2015         NA                     1
# 2 A      2016       2016                     0
# 3 A      2017         NA                     1
# 4 A      2018       2018                     0
# 5 B      2015         NA                    NA
# 6 B      2016         NA                    NA
# 7 B      2017         NA                    NA
# 8 B      2018         NA                    NA
# 9 C      2015       2015                     0
#10 C      2016         NA                     2
#11 C      2017         NA                     1
#12 C      2018       2018                     0
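A caveat on this approach: subtracting row numbers counts rows, which equals the number of years only when every id has exactly one row per consecutive year, as in the example data. The base R sketch below uses a hypothetical id with no 2017 row to show where the two diverge. `next_pos` is a hypothetical helper that finds, for each row, the position of the next non-NA event with a reversed cumulative minimum instead of `tidyr::fill()`.

```r
# Hypothetical data for one id that has no row for 2017,
# so row positions and years no longer line up
year       <- c(2015, 2016, 2018)
event_date <- c(NA, NA, 2018)

# Hypothetical helper: position of the first non-NA value at or after each row.
# NA rows get Inf, so a cumulative minimum taken from the end picks the nearest
# later non-NA position; rows with no later event become NA.
next_pos <- function(x) {
  p <- ifelse(is.na(x), Inf, seq_along(x))
  p <- rev(cummin(rev(p)))
  p[is.infinite(p)] <- NA
  p
}

nxt <- next_pos(event_date)

# Row-number difference, as in the answer above
row_based  <- nxt - seq_along(event_date)
# Year difference, looking up the event_date at that position
year_based <- event_date[nxt] - year

row_based
year_based
```

Here row_based is 2 1 0 while year_based is 3 2 0: the row count misses the absent 2017. If gaps are possible, subtract `year` from the filled event_date (as the zoo and data.table answers do) rather than subtracting row numbers.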

Data:

df <- data.frame(id, year, event_date)
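If you would rather keep the question's `as.data.frame(cbind(...))` construction, note that `cbind()` with a character `id` produces a character matrix, so `year` and `event_date` arrive as character columns (or as factors in R versions before 4.0) and `event_date - year` does not work. A minimal fix, assuming those two columns should be numeric, is to convert them back; going through `as.character()` first keeps this safe when the columns are factors.

```r
year       <- c(2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018)
id         <- c(rep("A", times = 4), rep("B", times = 4), rep("C", times = 4))
event_date <- c(NA, 2016, NA, 2018, NA, NA, NA, NA, 2015, NA, NA, 2018)

# The question's construction: every column comes out as character (or factor)
df <- as.data.frame(cbind(id, year, event_date))

# Convert back to numeric; as.character() first avoids getting factor codes in R < 4.0
df$year       <- as.numeric(as.character(df$year))
df$event_date <- as.numeric(as.character(df$event_date))

str(df)
```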
