Fill missing values with a calculated next value

I have a data frame with a date column with some missing values:

my_df <- data.frame(date = as.Date(c("2020-07-01", NA, NA, NA, "2022-07-01", "2023-07-01")))
my_df
#         date
# 1 2020-07-01
# 2       <NA>
# 3       <NA>
# 4       <NA>
# 5 2022-07-01 # NAs to be replaced with 2022-07-01 minus one year
# 6 2023-07-01

I want to fill in the NA dates thus:

  • fill with the next non-NA value, upwards
  • subtract 1 year from the filled values.

Desired result:

data.frame(date = as.Date(c("2020-07-01", "2021-07-01", "2021-07-01", "2021-07-01", "2022-07-01", "2023-07-01")))

#         date
# 1 2020-07-01
# 2 2021-07-01 
# 3 2021-07-01 
# 4 2021-07-01 
# 5 2022-07-01
# 6 2023-07-01

I like tidyverse so I'm hoping to use fill() and something like %m-% years(1) from lubridate. My attempt:

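library(dplyr)
library(tidyr)
library(lubridate)

my_df <- my_df %>%
  mutate(date2 = date %m-% years(1)) %>%   # candidate value: each date minus one year
  fill(date2, .direction = "up") %>%       # carry the candidates upwards into the NA rows
  mutate(date = if_else(is.na(date), date2, date)) %>%
  select(-date2)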

This seems to work, but is there a more direct method?

An option with zoo:

library(dplyr)
library(lubridate)
library(zoo)

my_df %>% mutate(date = na.locf(date, fromLast = TRUE) %m-% years(1 * is.na(date)))

Output:

        date
1 2020-07-01
2 2021-07-01
3 2021-07-01
4 2021-07-01
5 2022-07-01
6 2023-07-01
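
To see why the one-liner gives this result, it may help to split it into its three parts. The following is only a sketch of the same idea; the intermediate names filled, n_years and result are made up for illustration and are not part of the answer above.

```r
library(lubridate)
library(zoo)

my_df <- data.frame(date = as.Date(c("2020-07-01", NA, NA, NA, "2022-07-01", "2023-07-01")))

# Step 1: replace each NA with the next non-NA value below it
# ("last observation carried forward", run from the end of the vector).
filled <- na.locf(my_df$date, fromLast = TRUE)
# "2020-07-01" "2022-07-01" "2022-07-01" "2022-07-01" "2022-07-01" "2023-07-01"

# Step 2: number of years to subtract per row:
# 1 where the original value was missing, 0 where it was present.
n_years <- 1 * is.na(my_df$date)
# 0 1 1 1 0 0

# Step 3: subtract that many years, so only the filled-in rows are shifted.
result <- filled %m-% years(n_years)
result
```

One thing to watch for: na.locf() has na.rm = TRUE by default, so with fromLast = TRUE any NAs at the very end of the column (which have no later value to borrow) are dropped and the result no longer lines up with the rows. If the data can end in NA, passing na.rm = FALSE keeps the vector at its original length.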
