简体   繁体   中英

Fill empty cells between two values in column with last non empty cell and next non empty cell in R

I need to loop over IDs in a dataframe to fill NA values in a column by attributing empty cells evenly between the last and first filled entry outside of the NA cells.

ID     Value    X     Y
1          A      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1          B      x     y
2          C      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2          D      x     y

Which should be filled to this:

ID     Value    X     Y
1          A      x     y 
1          A      x     y 
1          A      x     y 
1          B      x     y 
1          B      x     y 
1          B      x     y 
1          B      x     y
2          C      x     y
2          C      x     y
2          C      x     y
2          D      x     y
2          D      x     y
2          D      x     y

In case of 2n NA values between observations, n is attributed to the last and n to the next. In case of 2n+1 values, n is attributed to the last and n+1 to the next.

I know I need to use na.locf from the zoo package which works well with a large database for filling in empty values based on the last non-empty cell, along with the fromLast argument to perform "next observation carried backwards". I cannot however structure a loop to account for an even or odd number of NA values, and use both of these together.

Using the tidyverse package,

> library(tidyr)
> library(dplyr)
> df %>% dplyr::group_by(test$id) %>% fill(Value, .direction ="downup") %>% dplyr::ungroup()

This fills in NA values in both directions but does not account for different border values for NA cells in a group.

Define interp which replaces each successive non-NA with successive integers, applies na.appro9x, rounds and replaces the resulting integers with the original values.

library(zoo)

interp <- function(x) {
   x0 <- ifelse(is.na(x), NA, cumsum(!is.na(x)))
   xx <- na.approx(x0, rule = 2)
   na.omit(x)[round(xx)]
}
transform(DF, Value = interp(Value))

giving:

   ID Value X Y
1   1     A x y
2   1     A x y
3   1     A x y
4   1     B x y
5   1     B x y
6   1     B x y
7   1     B x y
8   2     C x y
9   2     C x y
10  2     C x y
11  2     D x y
12  2     D x y
13  2     D x y

Note

It is assumed that the input is the following, shown in reproducible form.

Lines <- "ID     Value    X     Y
1          A      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1         NA      x     y 
1          B      x     y
2          C      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2         NA      x     y
2          D      x     y"
DF <- read.table(text = Lines, header = TRUE)

I guess the simplest way is to use the function: na.locf: Last Observation Carried Forward if we are in zoo/time-series.

see: https://www.rdocumentation.org/packages/zoo/versions/1.8-9/topics/na.locf

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM