
Replace NA in a POSIXct series by adjacent values

I have a data frame like this (but with many more rows):

  individ_id           date_time               begin           end
1: NOS_4214433 2017-11-22 09:01:49 2017-11-21 11:54:59 2017-11-22 09:07:27
2: NOS_4214433 2017-11-22 09:06:49 2017-11-21 11:54:59 2017-11-22 09:07:27
3: NOS_4214433 2017-11-22 09:11:49                <NA>                <NA>
4: NOS_4214433 2017-11-22 09:16:49                <NA>                <NA>
5: NOS_4214433 2018-01-24 12:12:18 2018-01-24 12:08:28 2018-01-25 09:33:10

and I want to fill the NAs in the begin and end columns. Within each run of NAs, begin should take the date_time of the first NA row and end should take the date_time of the last NA row, like this:

    individ_id           date_time               begin                 end
1: NOS_4214433 2017-11-22 09:01:49 2017-11-21 11:54:59 2017-11-22 09:07:27
2: NOS_4214433 2017-11-22 09:06:49 2017-11-21 11:54:59 2017-11-22 09:07:27
3: NOS_4214433 2017-11-22 09:11:49 2017-11-22 09:11:49 2017-11-22 09:16:49
4: NOS_4214433 2017-11-22 09:16:49 2017-11-22 09:11:49 2017-11-22 09:16:49
5: NOS_4214433 2018-01-24 12:12:18 2018-01-24 12:08:28 2018-01-25 09:33:10

All the date-time columns are POSIXct and I want to keep them that way. Does anyone have an idea how to solve this?

I believe this solves your problem:

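library(tidyr)  # provides fill()

# 1 where the value is NA, 0 otherwise
na_inds_begin <- as.numeric(is.na(df$begin))
na_inds_end <- as.numeric(is.na(df$end))

# +1 marks a step into an NA run, -1 a step out of one
na_diffs_lead <- c(0, diff(na_inds_begin))
na_diffs_lag <- c(diff(na_inds_end), 0)

# First NA of each run in begin; row 1 has no predecessor, so set it explicitly
first_nas <- na_inds_begin == 1 & na_diffs_lead > 0
first_nas[1] <- na_inds_begin[1] == 1

# Last NA of each run in end; the final row has no successor, so set it explicitly
last_nas <- na_inds_end == 1 & na_diffs_lag < 0
last_nas[length(last_nas)] <- na_inds_end[length(na_inds_end)] == 1

# Seed the run boundaries with the matching date_time values
df$begin[first_nas] <- df$date_time[first_nas]
df$end[last_nas] <- df$date_time[last_nas]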

df <-
  df %>%
  fill(begin, .direction = "down") %>%
  fill(end, .direction = "up")

First, we find the first NA in each run of NAs in begin, and the last NA in each run of NAs in end. We also need to handle the cases where the first element of begin or the last element of end is NA. Then we replace only those elements with the corresponding date_time values. Finally, we fill the rest of each run downward for begin and upward for end.
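
To make the boundary detection described above easier to follow, here is a small sketch that applies the same diff() logic to a toy NA pattern instead of the real columns. The vector is_na and the names run_start and run_end are made up for illustration; the pattern deliberately begins with an NA run so that the special handling of the first element is visible.

```r
# Toy NA pattern (illustrative only): two runs of NA, one at the very start
is_na <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)
x <- as.numeric(is_na)

# Start of a run: NA here and the previous element was not NA
run_start <- x == 1 & c(0, diff(x)) > 0
run_start[1] <- is_na[1]              # position 1 has no predecessor

# End of a run: NA here and the next element is not NA
run_end <- x == 1 & c(diff(x), 0) < 0
run_end[length(x)] <- is_na[length(x)]  # last position has no successor

which(run_start)  # 1 4
which(run_end)    # 2 6
```

Only these boundary positions receive a date_time value directly; every other NA in a run is filled from its boundary by fill().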

This is the result:

> df
# A tibble: 5 x 4
  individ_id  date_time           begin               end                
  <chr>       <dttm>              <dttm>              <dttm>             
1 NOS_4214433 2017-11-22 09:01:49 2017-11-21 11:54:59 2017-11-22 09:07:27
2 NOS_4214433 2017-11-22 09:06:49 2017-11-21 11:54:59 2017-11-22 09:07:27
3 NOS_4214433 2017-11-22 09:11:49 2017-11-22 09:11:49 2017-11-22 09:16:49
4 NOS_4214433 2017-11-22 09:16:49 2017-11-22 09:11:49 2017-11-22 09:16:49
5 NOS_4214433 2018-01-24 12:12:18 2018-01-24 12:08:28 2018-01-25 09:33:10

Edit: I updated the example code to be robust to the case where begin and end have different NA indices or the first/last elements are NA.
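
For comparison, here is a base R sketch that reaches the same result with rle() instead of diff() and fill(). It is an alternative, not the code from the answer above. The helper run_bounds is hypothetical, and the "UTC" time zone used to rebuild the question's data is an assumption. Like the answer, it ignores individ_id, so it assumes NA runs never span two individuals.

```r
# Rebuild the example from the question (tz = "UTC" is an assumption)
ts <- function(x) as.POSIXct(x, tz = "UTC")
df <- data.frame(
  individ_id = "NOS_4214433",
  date_time  = ts(c("2017-11-22 09:01:49", "2017-11-22 09:06:49",
                    "2017-11-22 09:11:49", "2017-11-22 09:16:49",
                    "2018-01-24 12:12:18")),
  begin      = ts(c("2017-11-21 11:54:59", "2017-11-21 11:54:59", NA, NA,
                    "2018-01-24 12:08:28")),
  end        = ts(c("2017-11-22 09:07:27", "2017-11-22 09:07:27", NA, NA,
                    "2018-01-25 09:33:10"))
)

# Hypothetical helper: for every position, the index at which its run
# of identical values (NA or not NA) starts and ends
run_bounds <- function(is_na) {
  r <- rle(is_na)
  last  <- cumsum(r$lengths)
  first <- last - r$lengths + 1
  list(first = rep(first, r$lengths), last = rep(last, r$lengths))
}

na_b <- is.na(df$begin)
na_e <- is.na(df$end)

# begin takes date_time at the start of its NA run, end at the end of its run
df$begin[na_b] <- df$date_time[run_bounds(na_b)$first[na_b]]
df$end[na_e]   <- df$date_time[run_bounds(na_e)$last[na_e]]
```

Because rle() treats a run at the first or last row like any other, no special case is needed for the edges, and the columns stay POSIXct since the replacement values are taken from date_time.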
