
Impute missing values with a rolling mean in R

I am new to R and struggling with a problem.

I need a function that imputes the missing values in a vector with the mean of the elements within a window of a given size.

This window moves with each missing value: if an NA is at position 30 and my window size is 10, the mean should be computed over x[20:40]. So for each NA found, the window mean will be different.

I have been trying this:

impute.to.window.mean <- function(x, window) {

  na.idx <- is.na(x)  #find missing values in x

  for (na in na.idx) {

    y <- (x[na]-window):(x[na]+window)
    na.idx[na] <- mean(y, na.rm = TRUE)
  }

  return(x)
}

but it is not correct, and I don't know how to continue.

You might want to consider using the imputeTS package. Here is an example of filling in values with a simple moving average and k = 4 (the window extends 4 observations on each side of the missing value):

x <- rnorm(100)
x[c(7, 21, 33)] <- NA

imputeTS::na_ma(x, k = 4, weighting = "simple")  # named na.ma() in imputeTS versions before 3.0

Using zoo::rollapply, this can be done in one statement. We have used a window of length 5 (2 on either side of the current point) for this example:

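library(zoo)

x <- replace(1:20, c(4, 6, 10, 15), NA) # test data

# Pad both ends with two NAs so every original element is the centre of a
# width-5 window; a missing centre (w[3]) becomes the mean of the non-NA values.
rollapply(c(NA, NA, x, NA, NA), 5,
    function(w) if (is.na(w[3])) mean(w, na.rm = TRUE) else w[3])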

giving:

 [1]  1.000000  2.000000  3.000000  3.333333  5.000000  6.666667  7.000000
 [8]  8.000000  9.000000 10.000000 11.000000 12.000000 13.000000 14.000000
[15] 15.000000 16.000000 17.000000 18.000000 19.000000 20.000000

With base R:

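df <- data.frame(x = sample(c(1:10, NA), 1000, replace = TRUE))
window <- 10

# Keep each value that is present; replace an NA with the mean of the values from
# its own row to `window` rows ahead (clipped at the last row so no row is skipped).
imputed <- sapply(seq_len(nrow(df)), function(i)
  if (is.na(df[i, "x"])) mean(df[i:min(nrow(df), i + window), "x"], na.rm = TRUE) else df[i, "x"])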

The only difference is that this version looks forward: an NA is replaced by the mean of the window values that follow it, not of a window centred on it. You can alter that to your specifications.
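If you want the window centred on each NA instead, one possible base-R variant is sketched below. The helper name impute_centered is hypothetical (not from the original answer); it looks window positions to each side, clips the range at both ends of the vector, and always averages the original, unimputed values:

```r
# Hypothetical helper: replace each NA with the mean of the non-missing values
# within `window` positions on either side of it (original values only).
impute_centered <- function(v, window) {
  n <- length(v)
  vapply(seq_len(n), function(i) {
    if (!is.na(v[i])) return(as.numeric(v[i]))  # keep observed values
    lo <- max(1L, i - window)                    # clip window at the start
    hi <- min(n, i + window)                     # clip window at the end
    mean(v[lo:hi], na.rm = TRUE)
  }, numeric(1))
}

x <- replace(1:20, c(4, 6, 10, 15), NA)
impute_centered(x, 2)
```

With window = 2 on this test vector it should reproduce the zoo::rollapply output shown earlier (3.333333, 6.666667, 10 and 15 at the NA positions). For the data frame above you would call it as impute_centered(df$x, window).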

Your indexing is a little off:

impute.to.window.mean <- function(x, window) {
  na.idx <- which(is.na(x))  #find missing values in x

  for (na in na.idx) {
    y <- sort(x[(na - window):(na + window)])
    x[na] <- mean(y)
  }

  return(x)
}

Walk through an example:

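# R >= 3.6.0 changed the default sampler; this line reproduces the pre-3.6.0
# values shown below (R warns that the old sampler is non-uniform)
RNGkind(sample.kind = "Rounding")
set.seed(1)
x <- sample(10)
na <- 6
x[na] <- NA
x
# [1]  3  4  5  7  2 NA  9  6 10  1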

window <- 3L

I used sort because it drops the NAs in one step; you want the mean of this vector, which holds all the non-missing values that fall in the window:

sort(x[(na - window):(na + window)])
# [1]  2  5  6  7  9 10

mean(sort(x[(na - window):(na + window)]))
# [1] 6.5
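sort() is not the only way to drop the NAs: passing na.rm = TRUE to mean() gives the same value without the sorting step. A small check, with the example vector typed in directly so it does not depend on the random-number generator:

```r
x <- c(3, 4, 5, 7, 2, NA, 9, 6, 10, 1)  # the example vector from above
na <- 6
window <- 3L

win <- x[(na - window):(na + window)]   # positions 3..9, includes the NA
mean(sort(win))                         # sort() silently drops the NA
# [1] 6.5
mean(win, na.rm = TRUE)                 # same value, no sorting needed
# [1] 6.5
```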

Now test your function:

impute.to.window.mean(x, window)
# [1]  3.0  4.0  5.0  7.0  2.0  6.5  9.0  6.0 10.0  1.0

Edit

Actually, you should probably use

y <- sort(x[pmax(1L, (na - window)):pmin(length(x), (na + window))])

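instead, to cover the case where an NA occurs near the start, say at position 2, and your window is > 1. Then na - window is negative, and R does not allow negative and positive subscripts to be mixed. For example, with a window of 10 on the 10-element x: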
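To see why the clipping matters, compare the raw and clipped index ranges for an NA at position 2 with window = 3 in a vector of length 10 (the values of n, na and window here are just for illustration):

```r
n <- 10L
na <- 2L
window <- 3L
x <- c(3, 4, 5, 7, 2, NA, 9, 6, 10, 1)

(na - window):(na + window)                 # -1 0 1 2 3 4 5: negative and positive mixed
pmax(1L, na - window):pmin(n, na + window)  # 1 2 3 4 5: clipped to valid positions

try(x[(na - window):(na + window)])         # fails: subscripts of mixed sign
x[pmax(1L, na - window):pmin(n, na + window)]
```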

## current version
impute.to.window.mean(x, 10)
# Error in x[(na - window):(na + window)] : 
#   only 0's may be mixed with negative subscripts

## version with pmax/pmin
impute.to.window.mean(x, 10)
# [1]  3.000000  4.000000  5.000000  7.000000  2.000000  5.222222  9.000000  6.000000 10.000000  1.000000

mean(sort(x))
# [1] 5.222222

impute.to.window.mean <- function(x, window) {
  na.idx <- which(is.na(x))  #find missing values in x

  for (na in na.idx) {
    # y <- sort(x[(na - window):(na + window)])
    y <- sort(x[pmax(1L, (na - window)):pmin(length(x), (na + window))])
    x[na] <- mean(y)
  }

  return(x)
}
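One design point worth knowing: because the loop writes back into x, a value imputed for one NA can fall inside the window of a later NA. If you would rather every window use only the original observations, one option is to read from an untouched copy. The name impute_from_original is hypothetical; this is a sketch, not part of the answer above:

```r
# Hypothetical variant of impute.to.window.mean(): windows are taken from the
# untouched input, so earlier imputations never influence later ones.
impute_from_original <- function(x, window) {
  orig <- x                                    # read-only copy of the input
  for (na in which(is.na(orig))) {
    idx <- pmax(1L, na - window):pmin(length(orig), na + window)
    x[na] <- mean(orig[idx], na.rm = TRUE)     # average original values only
  }
  x
}

impute_from_original(c(1, NA, NA, 10), 1)
# [1]  1  1 10 10
```

For the same input, the in-place impute.to.window.mean(c(1, NA, NA, 10), 1) gives 5.5 for the third element instead of 10, because the freshly imputed 1 at position 2 is already inside its window. Which behaviour is right depends on your data.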

The "Caret" package's preProcess function has a method called "knnImpute" that does exactly that. "Caret" 包的 preProcess 函数有一个名为 "knnImpute" 的方法,正是这样做的。 Give it a go.搏一搏。

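Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM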