Replacing NA with mean using loop in R

I have to solve this problem using a loop in R (I am aware that it is much easier to do without loops, but it is for school...).

So I have a vector with NAs like this:

trades<-sample(1:500,150,T)
trades<-trades[order(trades)]
trades[sample(10:140,25)]<-NA

and I have to create a for loop that replaces each NA with the mean of the 2 numbers before the NA and the 2 numbers after it.

I am able to do this with a loop like this:

for (i in 1:length(trades)) {
  if (is.na(trades[i])==T) {

      trades[i] <- mean(c(trades[c(i-1:2)], trades[c(i+1:2)]), na.rm = T)
     }
  }
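One detail in the loop above is easy to misread: in R the colon operator binds tighter than binary minus and plus, so `i-1:2` means `i - (1:2)`, not `(i-1):2`. A quick check with an arbitrary `i` (the variable names here are only for illustration):

```r
i <- 10

# `:` is evaluated first, so these are the two positions on each side of i
before <- i - 1:2    # 9 8
after  <- i + 1:2    # 11 12

# With explicit parentheses the meaning changes: a sequence from 9 down to 2
wrong <- (i - 1):2

print(before)
print(after)
print(length(wrong)) # 8
```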

But there is another part to the homework. If there is an NA within the 2 previous or 2 following numbers, then you have to replace the NA with the mean of the 4 previous and 4 following numbers (I presume with the NAs removed). But I just am not able to crack it... I get the best results with this loop:

for (i in 1:length(trades)) {
  if (is.na(trades[i])==T && is.na(trades[c(i-1:2)]==T || is.na(trades[c(i+1:2)]==T))) {
   trades[i] <- mean(c(trades[c(i-1:4)], trades[c(i+1:4)]), na.rm = T)
  }else if (is.na(trades[i])==T){
    trades[i] <- mean(c(trades[c(i-1:2)], trades[c(i+1:2)]))
  }

}

But it still misses some NAs.

Thank you for your help in advance.

We can use na.approx from zoo:

library(zoo)
na.approx(trades)
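Note that na.approx fills gaps by linear interpolation between the nearest non-missing values, which is not the same rule as the 2-or-4-neighbour mean asked for in the question. A minimal base-R sketch of the interpolation idea, using stats::approx as a stand-in (this is an illustration on a made-up vector, not zoo's own code):

```r
# Made-up example vector with one single gap and one double gap
x <- c(10, NA, 30, 40, NA, NA, 100)
idx <- seq_along(x)

# approx() drops the NA points and interpolates linearly between the known ones
filled <- approx(idx, x, xout = idx)$y

print(filled)  # 10 20 30 40 60 80 100
```

The double gap is filled with 60 and 80, values that lie on the straight line between 40 and 100, whereas the homework rule would average the surrounding values instead.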

So it seems that posting to StackOverflow helped me solve the problem.

trades<-sample(1:500,25,T)
trades<-trades[order(trades)]
trades[sample(1:25,5)]<-NA

which gives us:

[1]  NA  20  24  30  NA  77 188 217 238 252 264 273 296  NA 326 346 362 368  NA  NA 432 451 465 465 490

and if you run this loop:

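for (i in 1:length(trades)) {
  if (is.na(trades[i])) {
    # The two values that follow position i
    test1 <- trades[i + 1:2]
    if (any(is.na(test1))) {
      # An NA follows: use 4 values on each side
      # (abs() keeps the "before" indices non-negative near the start)
      test2 <- c(trades[abs(i - 1:4)], trades[i + 1:4])
      trades[i] <- round(mean(test2, na.rm = TRUE), 0)
    } else {
      # Otherwise use 2 values on each side
      test3 <- c(trades[abs(i - 1:2)], trades[i + 1:2])
      trades[i] <- round(mean(test3, na.rm = TRUE), 0)
    }
  }
}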

it changes the NAs to this:

[1]  22  20  24  30  80  77 188 217 238 252 264 273 296 310 326 346 362 368 387 410 432 451 465 465 490

So it works pretty much as expected.

Thank you for all your help.
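The loop above only tests the two following values for NA, not the two previous ones. A sketch that checks both sides and clips the window at the ends of the vector might look like the following. The function name `fill_na_loop` and its arguments are hypothetical, and two choices are assumptions rather than part of the assignment: neighbours are read from the original vector (so already-filled values do not feed into later means), and near the edges the mean simply uses whatever neighbours exist.

```r
# Hypothetical helper: fill each NA with the mean of its neighbours.
# Uses `narrow` neighbours per side, widening to `wide` per side when
# any value in the narrow window is itself NA.
fill_na_loop <- function(x, narrow = 2, wide = 4) {
  n <- length(x)
  out <- x

  # Values within k positions of i (excluding i), clipped to 1..n
  window <- function(i, k) {
    idx <- setdiff(max(1, i - k):min(n, i + k), i)
    x[idx]
  }

  for (i in seq_len(n)) {
    if (is.na(x[i])) {
      near <- window(i, narrow)
      if (anyNA(near)) {
        # A close neighbour is missing: widen the window and drop NAs
        out[i] <- mean(window(i, wide), na.rm = TRUE)
      } else {
        out[i] <- mean(near)
      }
    }
  }
  out
}

# Made-up example: one isolated NA and two adjacent NAs
x <- c(10, 20, NA, 40, 50, 60, NA, NA, 90, 100, 110, 120)
filled <- fill_na_loop(x)
print(filled)
```

Here position 3 gets the mean of 10, 20, 40 and 50, while positions 7 and 8 each see an NA in their narrow window and fall back to the wide one.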

Here is another solution using a loop. I shortened some of the code by using lead and lag from dplyr. First, two recursive functions calculate the sums of the leading and lagging values. Then conditional statements determine whether any of those values are missing. Lastly, the missing data are filled using either the output of the recursive functions or the mean of the previous and following 4 values (with NAs removed). I would note that this is not the way I would normally approach this problem, but I tried it with a loop as requested.

library(dplyr)

r.lag <- function(x, n){
  if (n == 1) return(lag(x = x, n = 1))
  else return( lag(x = x, n = n) +  r.lag(x = x, n = n-1))
}

r.lead <- function(x, n){
  if (n == 1) return(lead(x = x, n = 1))
  else return( lead(x = x, n = n) +  r.lead(x = x, n = n-1))
}

lead.vec <- r.lead(trades, 2)
lag.vec <- r.lag(trades, 2)

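# Pre-allocate a numeric result vector
output <- numeric(length(trades))
for (i in seq_along(trades)) {
  if (!is.na(trades[[i]])) {
    # Value present: copy it through
    output[[i]] <- trades[[i]]
  } else if (!is.na(lead.vec[[i]]) && !is.na(lag.vec[[i]])) {
    # Both neighbours on each side are present: mean of those 4 values
    output[[i]] <- (lead.vec[[i]] + lag.vec[[i]]) / 4
  } else {
    # Some close neighbour is NA: mean of the 4 values on each side, NAs dropped.
    # [[ ]] errors if i - 4 < 1 or i + 4 > length(trades); the sample data
    # only has NAs at positions 10..140, so this does not occur there.
    output[[i]] <- mean(
      c(trades[[i-4]], trades[[i-3]], trades[[i-2]], trades[[i-1]],
        trades[[i+4]], trades[[i+3]], trades[[i+2]], trades[[i+1]]),
      na.rm = TRUE
    )
  }
}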

tibble(
  original = trades,
  filled = output
)
#> # A tibble: 150 x 2
#>    original filled
#>       <int>  <dbl>
#>  1        7      7
#>  2        7      7
#>  3       12     12
#>  4       18     18
#>  5       30     30
#>  6       31     31
#>  7       36     36
#>  8       NA     40
#>  9       43     43
#> 10       50     50
#> # … with 140 more rows
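For reference, the recursive helpers above amount to "sum of the previous n values" and "sum of the next n values" at each position. A base-R sketch of the same idea for n = 2, without dplyr; `shift_back` and `shift_fwd` are hypothetical stand-ins for lag and lead on a plain vector, and the input is made up:

```r
# Hypothetical stand-ins for dplyr::lag / dplyr::lead on a plain vector
shift_back <- function(x, n) c(rep(NA, n), head(x, -n))
shift_fwd  <- function(x, n) c(tail(x, -n), rep(NA, n))

x <- c(1, 2, 3, NA, 5, 6, 7)

# Sum of the two previous / two following values at every position;
# the result is NA wherever one of those values is missing or out of range
prev2 <- shift_back(x, 1) + shift_back(x, 2)
next2 <- shift_fwd(x, 1) + shift_fwd(x, 2)

# Fill value for the NA at position 4: mean of 2, 3, 5 and 6
fill4 <- (prev2[4] + next2[4]) / 4
print(fill4)  # 4
```

A non-NA sum on both sides is exactly the condition the loop uses to decide that the 2-neighbour mean is safe to apply.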
