如何用邻居填充矩阵中的na？ R

Question

this my first question and I hope to collaborate in the community.这是我的第一个问题，我希望在社区中进行合作。 I am in a project in which I must fill the NA values with the average of their neighbors from a matrix of ncol = 10 and nrow = 10. I have developed the following code however it is very computationally inefficient:我在一个项目中，我必须用 ncol = 10 和 nrow = 10 的矩阵中的邻居的平均值填充 NA 值。我开发了以下代码，但是它的计算效率非常低：

Code代码

get_neighbor <- function(matrix, x=1,y=1){

  z <- complex(real = rep(1:nrow(matrix), ncol(matrix)),
               imaginary = rep(1:ncol(matrix), each = nrow(matrix)))
  
  lookup <- lapply(seq_along(z), function(x){
    # calcular la distantancia 
    dist <- which(abs(z - z[x]) < 2)
    # sacar el elemento x del vecindario 
    dist[which(dist != x)]
  })
  index <- (y-1)*(nrow(matrix))+x
  matrix[lookup[[index]]]
  
}

nn_mean <- function(a){
  if(sum(is.na(a))!=ncol(a)*nrow(a)){
    C <- permutations(2, 2, c(1,dim(a)[1]), repeats.allowed = T)
    Borders <- data.frame(matrix(data = 0, ncol = 2, nrow = nrow(a)*2 + ncol(a)*2 - 4))
    Borders[1:nrow(a), 1] <- 1:nrow(a); Borders[1:nrow(a), 2] <- 1
    for(i in 2:(ncol(a)-1)){
      Borders[i + nrow(a) - 1, 2] <- i; Borders[i + 2*(nrow(a) - 1) - 1, 2] <- i
      Borders[i + nrow(a) - 1, 1] <- 1; Borders[i + 2*(nrow(a) - 1) - 1, 1] <- nrow(a)
    }
    Borders[1:ncol(a) + 3*(nrow(a))-4, 2] <- ncol(a)
    Borders[1:ncol(a) + 3*(nrow(a))-4, 1] <- 1:ncol(a)
    
    id <- which(is.na(a), arr.ind = T)
    id <- data.frame(cbind(id, rep(0, nrow(id))))
    
    while(nrow(id)!=0){
      
      for(i in 1:nrow(id)){
        id[i,3] <- sum(is.na(get_neighbor(a, id[i, 1], id[i, 2])))
      }
      
      max_na <- max(id[, 3])
      for(i in 1:(nrow(a)*2 + ncol(a)*2 - 4)){
        if(is.na(a[Borders[i, 1], Borders[i, 2]]) & sum(is.na(get_neighbor(a, Borders[i, 1], Borders[i, 2]))) == 5){
          index <- which(id[,1] == Borders[i, 1] & id[,2] == Borders[i, 2])
          id[index, 3] <- max_na +1
        }
      }
      
      for(i in 1:4){
        if(is.na(a[C[i,1], C[i,2]]) & sum(is.na(get_neighbor(a, C[i, 1], C[i, 2]))) == 3){
          index <- which(id[,1] == C[i, 1] & id[,2] == C[i, 2])
          id[index, 3] <- max_na +1
        }
      }
      
      id <- id[order(id[,3]),]
      index <- which(id[,3]== min(id[,3]))
      for(i in 1:length(index)){
        a[id[i, 1], id[i, 2]] <- mean(get_neighbor(a, id[i, 1], id[i, 2]), na.rm = T)
        if(is.nan(a[id[i, 1], id[i, 2]])){a[id[i, 1], id[i, 2]] <- NA}
      }
      #print(a)
      id <- which(is.na(a), arr.ind = T)
      id <- data.frame(cbind(id, rep(0, nrow(id))))
      
    }
  }
  return(a)
}

example例子

a <- matrix(data = runif(100, 0, 10), ncol = 10, nrow = 10)
a[a<2] <- NA  

a
          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
 [1,] 2.313512       NA 5.311104 2.832978 9.917106 2.734799 7.309386       NA 4.794476 6.479147
 [2,] 8.855676 7.555101 8.369477 6.346744 7.727896       NA 9.019421 5.061894 9.116066 6.732293
 [3,] 2.948539 7.440258 6.918414 2.155361 3.511407 5.601253       NA 6.561557 9.543535 4.082592
 [4,] 8.455382 9.169974       NA 4.978224 6.202393       NA 9.435753 9.411371       NA 2.128417
 [5,] 7.744456 3.333072 6.975128 5.876849 4.044768 2.948399 5.067653       NA 6.039412 7.350782
 [6,] 8.793417 9.683755 8.053603 7.406450 6.348171 3.122946 9.378282 5.808363 7.923061 6.415419
 [7,] 4.759612 3.431247 4.123641 6.899569 4.464683 6.588431 5.985248 7.962148 6.668238 4.503556
 [8,] 5.992242       NA 7.099657 6.446650       NA 8.448873 5.884961       NA 2.209453 8.103988
 [9,] 6.383036       NA       NA 5.499157 6.972433 3.129470 3.284383 9.150565 8.484186 4.672878
[10,]       NA       NA 4.258936       NA 9.015525       NA       NA       NA       NA 6.639832

nn_mean(a)

          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
 [1,] 2.313512 6.480974 5.311104 2.832978 9.917106 2.734799 7.309386 7.060248 4.794476 6.479147
 [2,] 8.855676 7.555101 8.369477 6.346744 7.727896 6.545895 9.019421 5.061894 9.116066 6.732293
 [3,] 2.948539 7.440258 6.918414 2.155361 3.511407 5.601253 7.111993 6.561557 9.543535 4.082592
 [4,] 8.455382 9.169974 5.855910 4.978224 6.202393 5.258804 9.435753 9.411371 6.587278 2.128417
 [5,] 7.744456 3.333072 6.975128 5.876849 4.044768 2.948399 5.067653 7.580556 6.039412 7.350782
 [6,] 8.793417 9.683755 8.053603 7.406450 6.348171 3.122946 9.378282 5.808363 7.923061 6.415419
 [7,] 4.759612 3.431247 4.123641 6.899569 4.464683 6.588431 5.985248 7.962148 6.668238 4.503556
 [8,] 5.992242 5.298239 7.099657 6.446650 6.056158 8.448873 5.884961 6.203648 2.209453 8.103988
 [9,] 6.383036 5.902524 5.834195 5.499157 6.972433 3.129470 3.284383 9.150565 8.484186 4.672878
[10,] 6.383036 5.731883 4.258936 6.436513 9.015525 5.600453 5.291218 6.689444 7.236865 6.639832

some idea or a function that is efficient?一些想法或有效的function？

Answer 1

This can be written in a even short and fast way in R:这可以在 R 中以更短且快速的方式编写：

nn_impute <- function(dat){
  idx <- which(is.na(dat), TRUE)
  impute <- function(x){
    y <- expand.grid(x[1] + c(-1,0,1), x[2] + c(-1,0,1))
    z <- !(y == 0 | y > nrow(dat) | y> ncol(dat))
    mean(dat[as.matrix(y[rowSums(z) == 2,])], na.rm = TRUE)
  }
  dat[idx] <- apply(idx, 1, impute)
  dat
}

nn_impute(a) ## Returns the filled in values

This code is around 38X faster than the provided code此代码比提供的代码快约 38 倍

Answer 2

I kept your code above, apart from nn_mean function, I defined an nn_mean2 function.我将您的代码保留在上面，除了 nn_mean function 之外，我还定义了一个 nn_mean2 function。 It works about 20x times faster than yours, but gives different results.它的工作速度比你的快 20 倍，但结果不同。 Since I don't know why you wrote your function the way you did, ie your requirements, I can't tell why my approach is not suitable.由于我不知道您为什么按照您的方式编写 function，即您的要求，所以我不知道为什么我的方法不适合。 but it is way faster.但它要快得多。 It naively uses your get_neighbour definition, averages the found neighbour values and substitutes them into the holes.它天真地使用您的 get_neighbour 定义，平均找到的邻居值并将它们替换到孔中。 You must be needing to do something else or our results would have matched I would have thought.您一定需要做其他事情，否则我们的结果会符合我的预期。 Here it is for consideration这里供参考



nn_mean2 <- function(a){
res2 <- a
# get the missings
list_of_missing <- which(is.na(a))
list_of_missing_df <- data.frame(which(is.na(a),arr.ind = TRUE))
list_of_missing_df$missing_fills <- purrr::map_dbl(seq_len(nrow(list_of_missing_df)),
    ~{
      mean(get_neighbor(a,
                   x=list_of_missing_df$row[.x],
                   y=list_of_missing_df$col[.x]),
           na.rm=TRUE)
    })
res2[list_of_missing] <- list_of_missing_df$missing_fills
res2
}
res2 <- nn_mean2(a)
microbenchmark::microbenchmark(n1 = nn_mean(a),
            n2=nn_mean2(a))
# A tibble: 2 x 13
  expression      min   median `itr/sec` mem_al~1 gc/se~2 n_itr  n_gc total~3 result memory    
  <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:by>   <dbl> <int> <dbl> <bch:t> <list> <list>    
1 n1            380ms    425ms      2.35 107.34MB    4.71     2     4   850ms <NULL> <Rprofmem>
2 n2             17ms   20.1ms     39.9    6.91MB    5.99    20     3   501ms <NULL> <Rprofmem>
# see the difference in values though 
res1 - res2

如何用邻居填充矩阵中的na？ R

问题描述

Code代码

example例子

2 个解决方案

解决方案1
2 已采纳 2022-08-22 19:24:37

解决方案2
0 2022-08-22 19:16:15

如何用邻居填充矩阵中的na？ R

问题描述

Code代码

example例子

2 个解决方案

解决方案1 2 已采纳 2022-08-22 19:24:37

解决方案2 0 2022-08-22 19:16:15

解决方案1
2 已采纳 2022-08-22 19:24:37

解决方案2
0 2022-08-22 19:16:15