Replace NA in a POSIXct series with adjacent values

I have a data frame like this (but with many more rows):

  individ_id           date_time               begin           end
1: NOS_4214433 2017-11-22 09:01:49 2017-11-21 11:54:59 2017-11-22 09:07:27
2: NOS_4214433 2017-11-22 09:06:49 2017-11-21 11:54:59 2017-11-22 09:07:27
3: NOS_4214433 2017-11-22 09:11:49                <NA>                <NA>
4: NOS_4214433 2017-11-22 09:16:49                <NA>                <NA>
5: NOS_4214433 2018-01-24 12:12:18 2018-01-24 12:08:28 2018-01-25 09:33:10

I want to fill the NA values in the begin and end columns using date_time: within each run of NA rows, begin should take the date_time of the first row of the run and end should take the date_time of the last row of the run, like this:

    individ_id           date_time               begin                 end
1: NOS_4214433 2017-11-22 09:01:49 2017-11-21 11:54:59 2017-11-22 09:07:27
2: NOS_4214433 2017-11-22 09:06:49 2017-11-21 11:54:59 2017-11-22 09:07:27
3: NOS_4214433 2017-11-22 09:11:49 2017-11-22 09:11:49 2017-11-22 09:16:49
4: NOS_4214433 2017-11-22 09:16:49 2017-11-22 09:11:49 2017-11-22 09:16:49
5: NOS_4214433 2018-01-24 12:12:18 2018-01-24 12:08:28 2018-01-25 09:33:10

All the date-time data are in POSIXct format and I want to keep it that way. Does anyone have an idea how to solve this?

I believe this solves your problem:

library(tidyr)  # provides fill()

# 1 where the value is NA, 0 otherwise
na_inds_begin <- as.numeric(is.na(df$begin))
na_inds_end <- as.numeric(is.na(df$end))

# Change in NA status relative to the previous row (begin) and the next row (end)
na_diffs_lead <- c(0, diff(na_inds_begin))
na_diffs_lag <- c(diff(na_inds_end), 0)

# First NA of each run in begin; a leading NA also starts a run
first_nas <- na_inds_begin == 1 & na_diffs_lead > 0
first_nas[1] <- na_inds_begin[1] == 1

# Last NA of each run in end; a trailing NA also ends a run
last_nas <- na_inds_end == 1 & na_diffs_lag < 0
last_nas[length(last_nas)] <- na_inds_end[length(na_inds_end)] == 1

# Seed the run edges with date_time (the columns stay POSIXct)
df$begin[first_nas] <- df$date_time[first_nas]
df$end[last_nas] <- df$date_time[last_nas]

# Propagate the seeded values through the rest of each run
df <-
  df %>%
  fill(begin, .direction = "down") %>%
  fill(end, .direction = "up")

First, we find the first NA in each run of NAs in begin, and the last NA in each run of NAs in end. We also need to handle the cases where the first element of begin or the last element of end is NA. Then we replace only those elements with the corresponding date_time values. Finally, we fill the rest of each run downward for begin and upward for end.
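To make the run-edge detection easier to see, here is a minimal sketch on a toy logical vector. It condenses the same diff() idea into two lines and is not the exact code from the answer; the names is_na, run_start and run_end are made up for illustration.

```r
# Toy NA pattern: TRUE marks an NA. Runs of NAs sit at positions 1, 3-4 and 7.
is_na <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
x <- as.numeric(is_na)

# Start of a run: NA here, and the previous element is not NA (or there is none)
run_start <- is_na & c(TRUE, diff(x) > 0)

# End of a run: NA here, and the next element is not NA (or there is none)
run_end <- is_na & c(diff(x) < 0, TRUE)

which(run_start)  # 1 3 7
which(run_end)    # 1 4 7
```

Padding with TRUE at the boundary plays the same role as the explicit first_nas[1] and last_nas[length(last_nas)] assignments: a leading NA counts as a run start and a trailing NA counts as a run end.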

This is the result:

> df
# A tibble: 5 x 4
  individ_id  date_time           begin               end                
  <chr>       <dttm>              <dttm>              <dttm>             
1 NOS_4214433 2017-11-22 09:01:49 2017-11-21 11:54:59 2017-11-22 09:07:27
2 NOS_4214433 2017-11-22 09:06:49 2017-11-21 11:54:59 2017-11-22 09:07:27
3 NOS_4214433 2017-11-22 09:11:49 2017-11-22 09:11:49 2017-11-22 09:16:49
4 NOS_4214433 2017-11-22 09:16:49 2017-11-22 09:11:49 2017-11-22 09:16:49
5 NOS_4214433 2018-01-24 12:12:18 2018-01-24 12:08:28 2018-01-25 09:33:10

Edit: I updated the example code to be robust to the case where begin and end have different NA indices or the first/last elements are NA.
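If you would rather not depend on tidyr, the same result can probably be reached in base R alone. The following is only a sketch under a few assumptions: the sample data is rebuilt by hand from the question, the time zone "UTC" is a guess (the question does not state one), and fill_na_runs is a hypothetical helper name, not an existing function. Instead of seeding and filling, it numbers the NA runs with cumsum() and looks up the edge row of each run directly.

```r
# Sample data from the question; the time zone "UTC" is an assumption.
to_time <- function(x) as.POSIXct(x, tz = "UTC")
df <- data.frame(
  individ_id = "NOS_4214433",
  date_time = to_time(c("2017-11-22 09:01:49", "2017-11-22 09:06:49",
                        "2017-11-22 09:11:49", "2017-11-22 09:16:49",
                        "2018-01-24 12:12:18")),
  begin = to_time(c("2017-11-21 11:54:59", "2017-11-21 11:54:59", NA, NA,
                    "2018-01-24 12:08:28")),
  end = to_time(c("2017-11-22 09:07:27", "2017-11-22 09:07:27", NA, NA,
                  "2018-01-25 09:33:10"))
)

# Hypothetical helper: for each run of NAs in `target`, take `source` at the
# first (edge = "first") or last (edge = "last") row of that run.
fill_na_runs <- function(target, source, edge = c("first", "last")) {
  edge <- match.arg(edge)
  is_na <- is.na(target)
  if (!any(is_na)) return(target)
  x <- as.numeric(is_na)
  run_start <- is_na & c(TRUE, diff(x) > 0)
  run_end <- is_na & c(diff(x) < 0, TRUE)
  run_id <- cumsum(run_start)[is_na]          # which run each NA row belongs to
  edge_rows <- if (edge == "first") which(run_start) else which(run_end)
  target[is_na] <- source[edge_rows[run_id]]  # subassignment keeps the POSIXct class
  target
}

df$begin <- fill_na_runs(df$begin, df$date_time, "first")
df$end <- fill_na_runs(df$end, df$date_time, "last")
df
```

Because each column is processed on its own, begin and end may have different NA positions, and both columns remain POSIXct throughout.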
