Linear Interpolation over a vector in R

I have a vector of numbers containing NAs, and I want to write a function that linearly interpolates the NAs from the numbers before and after them. Sometimes the function will need to generate one number, sometimes several.

x <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)

As a first step, is this even what you would use to linearly interpolate the first two NAs between 1 and 4?

approx(x = c(1,4), n = 2, method="linear")$x

Once you have a function, can you apply it to the entire vector, where n (fixed at 2 in the example above) is the number of NAs to be filled and a and b are the values on either side of the NAs?

interpolate <- function(a, b, n) {
  approx(x = c(a, b), n = n, method="linear")$x
}

If this is even the right approach, how do I put it all together and apply it to the entire vector? How would others approach this problem? Any help appreciated!

Please find my solution written as the following four functions:

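linear_int_NA <- function(vec)
{
  # BM_list[[k]][j] is TRUE when vec[j], ..., vec[j + k - 1] are all NA
  BM_list = Reduce(f = function(x, y) bitmagic(x), x = 0:(length(vec)-2), init = is.na(vec), accumulate = TRUE)
  names(BM_list) = 1:(length(BM_list))
  # Longest run lengths first; drop run lengths that never occur
  BM_proc = Filter(length, lapply(rev(BM_list), function(x) which(x)))
  blocks = readBM(BM_proc)
  replace_x_lint(vec, blocks)
}

bitmagic <- function(bin)
{
  # AND each element with its right-hand neighbour (result is one element shorter)
  bin[-length(bin)] & bin[-1]
}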

readBM <- function(BM_proc)
{
  # Each row of blocks holds the (first, last) index of one maximal run of NAs
  blocks = matrix(NA, nrow = 0, ncol = 2)
  while(length(BM_proc) > 0)
  {
    # Take one start position at a time so that runs of equal length stay separate
    start = BM_proc[[1]][1]
    row = c(start, start + as.numeric(names(BM_proc)[1]) - 1)
    blocks = rbind(blocks, row)
    # Drop every start position inside the block just recorded;
    # lapply keeps the names and, unlike sapply, never simplifies to a matrix
    BM_proc = lapply(BM_proc, function(x) x[!(x %in% row[1]:row[2])])
    BM_proc = Filter(length, BM_proc)
  }
  return(blocks)
}

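replace_x_lint <- function(vec, blocks)
{
  # seq_len() makes both loops no-ops when the vector contains no NA blocks
  l_int = lapply(seq_len(nrow(blocks)), function(x) {
    n_na = blocks[x,2] - blocks[x,1] + 1
    # Interpolate n_na + 2 evenly spaced points between the two neighbours,
    # then drop both endpoints, which are the known values themselves
    approx(x = vec[c(blocks[x,1] - 1, blocks[x,2] + 1)],
           n = n_na + 2, method="linear")$y[-c(1, n_na + 2)]
  })
  for(i in seq_len(nrow(blocks)))
  {
    vec[blocks[i,1]:blocks[i,2]] <- l_int[[i]]
  }
  return(vec)
}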

To see them in action, for example:

vec <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)
vec                # 1    NA   NA   4    5    6    NA   7    8    NA  NA   NA   NA   13   14   15
linear_int_NA(vec) # 1.0  2.0  3.0  4.0  5.0  6.0  6.5  7.0  8.0  9.0 10.0 11.0 12.0 13.0 14.0 15.0
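
As a cross-check on the output above, the same fill can be sketched with a single call to base R's approx(), using the element positions as x-coordinates. This is an alternative to the four functions, not part of them, and the names idx and filled are only illustrative.

```r
vec <- c(1, NA, NA, 4, 5, 6, NA, 7, 8, NA, NA, NA, NA, 13, 14, 15)
idx <- seq_along(vec)

# approx() discards the (index, NA) pairs itself, then interpolates linearly
# at every position in xout; known positions get their original values back
filled <- approx(x = idx, y = vec, xout = idx, method = "linear")$y
print(filled)
```

If the two approaches ever disagree on a vector whose first and last elements are known, that would point to a bug in one of them.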

This looks like the following:

plot(1:length(vec), linear_int_NA(vec), col = as.factor(is.na(vec)), pch = 19)

[Figure: output visualization]

The logic behind it is:

  1. Call is.na(vec) to convert the vector into a binary chain (1 for NA, 0 otherwise).
  2. Apply bitmagic repeatedly: after k applications, a 1 at position j means that the k + 1 elements starting at j are all NA. Reading the resulting list from the longest run length downwards gives the positions of the maximal chains of 1s, which correspond to the maximal chains of NAs (so-called blocks).
  3. Having identified the blocks, apply linear interpolation between the values on either side of each block to fill in the NAs.
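
Steps 1 and 2 can be seen on a tiny input. The sketch below repeats the bitmagic definition so that it runs on its own; the vector small and the names na_mask, runs2 and runs3 are made up for illustration.

```r
bitmagic <- function(bin) bin[-length(bin)] & bin[-1]

small <- c(1, NA, NA, 4)
na_mask <- is.na(small)      # FALSE TRUE TRUE FALSE
runs2 <- bitmagic(na_mask)   # FALSE TRUE FALSE: two NAs in a row start at position 2
runs3 <- bitmagic(runs2)     # FALSE FALSE: no run of three NAs anywhere

which(runs2)                 # 2, so the block spans positions 2 to 2 + 2 - 1 = 3
```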

Also, bear in mind that this function will fail if the first or last element of the vector is NA (linear interpolation needs a known value on both sides of the gap).
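
For vectors that do start or end with NA, one possible workaround is to call base R's approx() directly; this is not something the functions above do. With the default rule = 1 the leading and trailing NAs are simply left as NA, while rule = 2 fills them with the nearest known value. The vector edge and the names keep_na and carry are made up for this sketch.

```r
edge <- c(NA, 2, NA, 4, NA)
idx <- seq_along(edge)

# rule = 1 (the default): positions outside the known range stay NA
keep_na <- approx(idx, edge, xout = idx, rule = 1)$y

# rule = 2: positions outside the known range take the nearest known value
carry <- approx(idx, edge, xout = idx, rule = 2)$y

print(keep_na)  # NA  2  3  4 NA
print(carry)    #  2  2  3  4  4
```

Whether carrying the nearest value outwards is acceptable depends on the data; leaving the ends as NA is the more conservative choice.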

Please cite if you employ the function in an academic environment. I am very happy to answer any questions you have about the code. Best,

Ventrilocus.

