How to find first non-NA leading or lagging value?
I have rows grouped by ID, and I want to calculate how much time passes until the next event occurs (if one occurs for that ID).
Here is example code:
year <- c(2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018, 2015, 2016, 2017, 2018)
id <- c(rep("A", times = 4), rep("B", times = 4), rep("C", times = 4))
event_date <- c(NA, 2016, NA, 2018, NA, NA, NA, NA, 2015, NA, NA, 2018)
df<- as.data.frame(cbind(id, year, event_date))
df
   id year event_date
1   A 2015       <NA>
2   A 2016       2016
3   A 2017       <NA>
4   A 2018       2018
5   B 2015       <NA>
6   B 2016       <NA>
7   B 2017       <NA>
8   B 2018       <NA>
9   C 2015       2015
10  C 2016       <NA>
11  C 2017       <NA>
12  C 2018       2018
Here is what I want the output to look like:
   id year event_date years_till_next_event
1   A 2015       <NA>                     1
2   A 2016       2016                     0
3   A 2017       <NA>                     1
4   A 2018       2018                     0
5   B 2015       <NA>                  <NA>
6   B 2016       <NA>                  <NA>
7   B 2017       <NA>                  <NA>
8   B 2018       <NA>                  <NA>
9   C 2015       2015                     0
10  C 2016       <NA>                     2
11  C 2017       <NA>                     1
12  C 2018       2018                     0
Person B does not have the event, so nothing is calculated for B. For the others, I want to calculate the difference between the leading event_date (ignoring NAs, if one exists) and the year.
I want to calculate years_till_next_event such that 1) if a row has an event_date, it is event_date - year; 2) if not, it is the first non-NA leading event_date minus year. I'm having difficulty with the second part of the logic, keeping in mind that, for a given ID, the event could occur every year or not at all.
Using zoo with dplyr:
library(dplyr)
library(zoo)
df %>%
  group_by(id) %>%
  mutate(years_till_next_event = na.locf0(event_date, fromLast = TRUE) - year)
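To see what the backward fill does within a single group, here is a minimal base R sketch of the same idea. The helper name nocb is made up for illustration and is not part of zoo; the vectors are group C from the question.

```r
# Hypothetical helper (not from zoo): replace each NA with the next non-NA value.
nocb <- function(x) {
  nxt <- NA                          # next non-NA value seen, scanning from the end
  for (i in rev(seq_along(x))) {
    if (is.na(x[i])) x[i] <- nxt else nxt <- x[i]
  }
  x
}

event_date <- c(2015, NA, NA, 2018)  # group C from the question
year       <- c(2015, 2016, 2017, 2018)

filled <- nocb(event_date)           # 2015 2018 2018 2018
filled - year                        # 0 2 1 0
```

Trailing NAs, and groups with no event at all, stay NA because there is no later value to carry back.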
Here is a data.table option:
library(data.table)
setDT(df)[, years_till_next_event := nafill(event_date, type = "nocb") - year, id]
which gives
    id year event_date years_till_next_event
 1:  A 2015         NA                     1
 2:  A 2016       2016                     0
 3:  A 2017         NA                     1
 4:  A 2018       2018                     0
 5:  B 2015         NA                    NA
 6:  B 2016         NA                    NA
 7:  B 2017         NA                    NA
 8:  B 2018         NA                    NA
 9:  C 2015       2015                     0
10:  C 2016         NA                     2
11:  C 2017         NA                     1
12:  C 2018       2018                     0
You can create a new column that holds the row number within each id wherever event_date is not NA, fill the NA values upward from the next non-NA value, and subtract the current row number from it.
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(years_till_next_event = replace(row_number(), is.na(event_date), NA)) %>%
  tidyr::fill(years_till_next_event, .direction = 'up') %>%
  mutate(years_till_next_event = years_till_next_event - row_number()) %>%
  ungroup()
#    id     year event_date years_till_next_event
#    <chr> <dbl>      <dbl>                 <int>
#  1 A      2015         NA                     1
#  2 A      2016       2016                     0
#  3 A      2017         NA                     1
#  4 A      2018       2018                     0
#  5 B      2015         NA                    NA
#  6 B      2016         NA                    NA
#  7 B      2017         NA                    NA
#  8 B      2018         NA                    NA
#  9 C      2015       2015                     0
# 10 C      2016         NA                     2
# 11 C      2017         NA                     1
# 12 C      2018       2018                     0
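One caveat that may be worth checking on real data: this approach counts rows rather than years, so it matches a year-based difference only when each id has one row per consecutive year, as the example data does. The base R sketch below uses made-up data with gaps (not from the question) to compare the two counts; the variable names are illustrative.

```r
# Toy group with gaps between years (made-up data, not from the question).
year       <- c(2015, 2017, 2020)
event_date <- c(NA, NA, 2020)

# Index of the next row (at or after row i) that has an event, or NA if none.
next_idx <- sapply(seq_along(event_date), function(i) {
  j <- which(!is.na(event_date) & seq_along(event_date) >= i)
  if (length(j) > 0) j[1] else NA_integer_
})

# What a row-number difference counts: rows until the next event.
rows_till_next_event <- next_idx - seq_along(event_date)

# Difference in calendar years to the next event.
years_till_next_event <- event_date[next_idx] - year

rows_till_next_event
years_till_next_event
```

With these gaps the row count is 2, 1, 0 while the year difference is 5, 3, 0, so subtracting year from the filled event_date is the safer choice when years can be missing.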
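Data (built with data.frame() rather than as.data.frame(cbind(...)), so that year and event_date stay numeric):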
df <- data.frame(id, year, event_date)
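A quick check of the column types shows why the construction matters. This is a sketch on a cut-down, two-row version of the question's vectors; the names bad and good are only for illustration.

```r
id         <- c("A", "A")
year       <- c(2015, 2016)
event_date <- c(NA, 2016)

# cbind() builds a character matrix first, so no column stays numeric.
bad  <- as.data.frame(cbind(id, year, event_date))

# data.frame() keeps each vector's own type.
good <- data.frame(id, year, event_date)

sapply(good, class)
class(bad$year)          # character on R >= 4.0, factor on older versions
good$event_date - good$year
```

Subtracting year from event_date only works on the second form, which is why the answers rely on it.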