Replace NA values if last and next non-NA value are the same

I am trying to fill missing data based on whether the last non-NA value before a gap and the next non-NA value after it are the same. For example, this is the dummy dataset:

df <- data.frame(ID = c(rep(1, 6), rep(2, 6), rep(3, 6), rep(4, 6), rep(5, 6), rep(6, 6), 
                    rep(7, 6), rep(8, 6), rep(9, 6), rep(10, 6)), 
             with_missing = c("a", "a", NA, NA, "a", "a", 
                              "a", "a", NA, "b", "b", "b", 
                              "a", NA, NA, NA, "c", "c", 
                              "b", NA, "a", "a", "a", "a", 
                              "a", NA, NA, NA, NA, "a", 
                              "a", "a", NA, "b", "a", "a", 
                              "a", "a", NA, NA, "a", "a", 
                              "a", "a", NA, "b", "b", "b", 
                              "a", NA, NA, NA, "c", "c", 
                              "b", NA, "a", "a", "a", "a"),
             desired_result = c("a", "a", "a", "a", "a", "a", 
                                "a", "a", NA, "b", "b", "b", 
                                "a", NA, NA, NA, "c", "c", 
                                "b", NA, "a", "a", "a", "a", 
                                "a", "a", "a", "a", "a", "a", 
                                "a", "b", "b", "b", "a", "a", 
                                "a", "a", "a", "a", "a", "a", 
                                "a", "a", NA, "b", "b", "b", 
                                "a", NA, NA, NA, "c", "c", 
                                "b", NA, "a", "a", "a", "a")) 

So if there is a gap of four rows, for example, but the values before and after the gap are the same, then I want the gap to be filled with that value; if the values before and after the gap differ, I don't want to fill it. In addition, this needs to be done within each group defined by the ID variable.

I've tried na.locf but I can't work out how to add in the condition of "if they're the same before and after NA".

Thanks.

You can fill forwards and backwards, then set the rows where the two fills don't match to NA.

library(zoo)
library(dplyr)

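df %>% 
  mutate_if(is.factor, as.character) %>%   # compare strings, not factor codes
  group_by(ID) %>%
  # na.rm = FALSE keeps leading/trailing NAs, so each fill has the group's length
  mutate(result = na.locf(with_missing, fromLast = TRUE, na.rm = FALSE),  # backward fill
         # keep the value only where the forward fill agrees with the backward fill
         result = ifelse(result == na.locf(with_missing, na.rm = FALSE), result, NA))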

#    ID with_missing desired_result result
# 1   1            a              a      a
# 2   1            a              a      a
# 3   1         <NA>              a      a
# 4   1         <NA>              a      a
# 5   1            a              a      a
# 6   1            a              a      a
# 7   2            a              a      a
# 8   2            a              a      a
# 9   2         <NA>           <NA>   <NA>
# 10  2            b              b      b
# 11  2            b              b      b
# 12  2            b              b      b
# 13  3            a              a      a
# 14  3         <NA>           <NA>   <NA>
# 15  3         <NA>           <NA>   <NA>
# 16  3         <NA>           <NA>   <NA>
# 17  3            c              c      c
# 18  3            c              c      c
# 19  4            b              b      b
# 20  4         <NA>           <NA>   <NA>
# 21  4            a              a      a
# 22  4            a              a      a
# 23  4            a              a      a
# 24  4            a              a      a
# 25  5            a              a      a
# 26  5         <NA>              a      a
# 27  5         <NA>              a      a
# 28  5         <NA>              a      a
# 29  5         <NA>              a      a
# 30  5            a              a      a
# 31  6            a              a      a
# 32  6            a              b      a
# 33  6         <NA>              b   <NA>
# 34  6            b              b      b
# 35  6            a              a      a
# 36  6            a              a      a
# 37  7            a              a      a
# 38  7            a              a      a
# 39  7         <NA>              a      a
# 40  7         <NA>              a      a
# 41  7            a              a      a
# 42  7            a              a      a
# 43  8            a              a      a
# 44  8            a              a      a
# 45  8         <NA>           <NA>   <NA>
# 46  8            b              b      b
# 47  8            b              b      b
# 48  8            b              b      b
# 49  9            a              a      a
# 50  9         <NA>           <NA>   <NA>
# 51  9         <NA>           <NA>   <NA>
# 52  9         <NA>           <NA>   <NA>
# 53  9            c              c      c
# 54  9            c              c      c
# 55 10            b              b      b
# 56 10         <NA>           <NA>   <NA>
# 57 10            a              a      a
# 58 10            a              a      a
# 59 10            a              a      a
# 60 10            a              a      a
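
If you would rather not depend on zoo or dplyr, the same "fill both ways, keep only where they agree" idea can be sketched in base R. This is only an illustration: the helper name fill_if_same and the small demo data frame are made up here and are not part of the question. A gap at the start or end of a group has a neighbour on only one side, so it is left as NA.

```r
# Hypothetical helper: fill NA runs only when the nearest non-NA values
# on both sides of the gap are identical. Base R only.
fill_if_same <- function(x) {
  x <- as.character(x)

  # Last observation carried forward; leading NAs stay NA.
  locf <- function(v) {
    # Index of the most recent non-NA element seen so far (0 if none yet).
    idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))
    c(NA, v)[idx + 1]
  }

  forward  <- locf(x)            # fill from the left
  backward <- rev(locf(rev(x)))  # fill from the right

  # TRUE only where both fills exist and agree.
  agree <- !is.na(forward) & !is.na(backward) & forward == backward
  forward[!agree] <- NA
  forward
}

# Small made-up example: the gap in group 1 has "a" on both sides,
# the gaps in group 2 do not have matching neighbours.
demo <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 2, 2),
  with_missing = c("a", NA, NA, "a", "a", NA, "b", NA),
  stringsAsFactors = FALSE
)

# ave() applies the helper within each ID group and keeps the row order.
demo$result <- ave(demo$with_missing, demo$ID, FUN = fill_if_same)
print(demo)
```

Only the gap in group 1 is filled here; both NAs in group 2 stay NA.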
