Identify at least N contiguous cells meeting a specific criterion in a grid

Question

I have an X by Y grid whose cells contain 1 if a certain criterion is met and 0 if it is not. I want to identify features in the grid made up of at least N contiguous cells containing a 1. Cells count as contiguous if they are adjacent side by side or diagonally. I made a picture to illustrate the problem (see link), with N = 5. For clarity, the 0s are not marked; they are in the unmarked cells. Red 1s belong to features I want to identify, and black 1s do not. The desired result is as shown in the picture, but with all the black 1s changed to 0s.

I use R, so solutions in that language would be much appreciated, but I'll happily settle for others. I couldn't find anything specific in the R libraries (such as rgeos), but maybe I'm missing something. Any help appreciated, thanks!

Illustration of the feature identification problem for N = 5

Here is a small reproducible example:

input.mat <- structure(c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
                         0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
                         1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 
                         0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
                         1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                         0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 
                         0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 
                         1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 
                         1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 
                         0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
                         0L, 1L, 1L, 1L), .Dim = c(15L, 15L), .Dimnames = list(NULL, NULL))

input.mat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
 [1,]    1    1    0    0    0    0    0    0    0     0     0     0     1     0     0
 [2,]    1    1    0    0    1    1    1    0    0     1     0     0     0     1     0
 [3,]    0    0    1    0    0    0    0    0    0     1     1     0     1     0     1
 [4,]    0    0    0    1    0    0    0    0    0     0     0     0     0     1     0
 [5,]    0    0    0    0    0    0    0    0    0     0     0     1     0     0     0
 [6,]    1    0    0    0    0    0    0    0    0     0     1     0     1     1     0
 [7,]    1    1    0    0    0    0    0    0    0     0     0     1     0     0     0
 [8,]    1    1    0    0    0    0    0    0    0     0     0     0     0     0     0
 [9,]    1    0    0    0    0    1    0    1    0     0     0     1     1     1     0
[10,]    0    0    0    0    0    0    0    0    0     0     0     1     1     1     0
[11,]    0    0    1    0    1    0    0    0    0     0     0     0     0     0     1
[12,]    0    0    0    1    0    0    0    0    0     1     0     0     0     0     0
[13,]    0    0    1    0    1    0    0    0    1     0     0     0     0     0     1
[14,]    0    0    0    0    0    0    0    0    1     0     0     0     0     0     1
[15,]    1    1    1    1    1    0    0    0    1     1     0     0     0     0     1

output.mat <- structure(c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 
                          0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 
                          0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 
                          1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 
                          1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L, 
                          0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
                          0L, 0L, 0L, 0L), .Dim = c(15L, 15L), .Dimnames = list(NULL, NULL))

output.mat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
 [1,]    1    1    0    0    0    0    0    0    0     0     0     0     1     0     0
 [2,]    1    1    0    0    0    0    0    0    0     0     0     0     0     1     0
 [3,]    0    0    1    0    0    0    0    0    0     0     0     0     1     0     1
 [4,]    0    0    0    1    0    0    0    0    0     0     0     0     0     1     0
 [5,]    0    0    0    0    0    0    0    0    0     0     0     1     0     0     0
 [6,]    1    0    0    0    0    0    0    0    0     0     1     0     1     1     0
 [7,]    1    1    0    0    0    0    0    0    0     0     0     1     0     0     0
 [8,]    1    1    0    0    0    0    0    0    0     0     0     0     0     0     0
 [9,]    1    0    0    0    0    0    0    0    0     0     0     1     1     1     0
[10,]    0    0    0    0    0    0    0    0    0     0     0     1     1     1     0
[11,]    0    0    1    0    1    0    0    0    0     0     0     0     0     0     1
[12,]    0    0    0    1    0    0    0    0    0     1     0     0     0     0     0
[13,]    0    0    1    0    1    0    0    0    1     0     0     0     0     0     0
[14,]    0    0    0    0    0    0    0    0    1     0     0     0     0     0     0
[15,]    1    1    1    1    1    0    0    0    1     1     0     0     0     0     0

Created on 2021-05-27 by the reprex package (v2.0.0)

Answer 1

Using terra functions:

Convert the matrix to a raster (rast).
Identify patches of 1s surrounded by zeros (zeroAsNA = TRUE).
Also consider diagonal neighbours when defining contiguity (directions = 8).
Count the number of cells in each patch (freq).
Check which patches have a count of < 5.
At these indices, set the cells to NA.
Coerce the raster to a matrix and check which values are NA.
At these indices, set the original matrix values to 0.

library(terra)

m = input.mat
p = patches(rast(input.mat), directions = 8, zeroAsNA = TRUE)
p[p %in% which(freq(p)[ , "count"] < 5)] = NA
m[is.na(as.matrix(p, wide = TRUE))] = 0

all.equal(m, output.mat)
# [1] TRUE

Patches in the original input.mat (plot(p)):

After removal of patches with < 5 cells:

Related posts: Combining polygons and calculating their area (i.e. number of cells) in R; Obtaining connected components in R
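The count-and-filter steps described in this answer can be illustrated in base R on a tiny example. This is only a sketch: the `labels` matrix below is hand-written and merely stands in for a patch-id raster (it is not produced by terra), and the threshold `N = 3` is chosen to suit the toy data.

```r
# Hypothetical label matrix: 0 = background, 1..k = patch id.
# It stands in for the output of a patch-labelling step.
labels <- matrix(c(1, 1, 0, 2,
                   1, 0, 0, 2,
                   0, 0, 0, 0,
                   3, 3, 3, 3), nrow = 4, byrow = TRUE)
N <- 3

# Count the cells of each patch id (background cells are excluded first).
counts <- tabulate(labels[labels > 0])

# Patch ids with fewer than N cells.
small <- which(counts < N)

# Keep a 1 only where the cell belongs to a sufficiently large patch.
filtered <- (labels > 0 & !(labels %in% small)) * 1L
filtered
```

As in the answer, `which()` on the counts only yields valid patch ids because the ids are assumed to run consecutively from 1.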

Answer 2

With a data.table non-equi join to find neighbouring points, and igraph:

library(igraph)
library(data.table)

# index of pixels fulfilling criteria
idx <- which(input.mat==1)

# Coordinates of pixels
coord <- data.table(arrayInd(idx,dim(input.mat)))
setnames(coord,c("x","y"))
coord[,c('xmin','xmax','ymin','ymax'):=.(x-1,x+1,y-1,y+1)]

# Find neighbours indices
neighbours <- coord[coord,.(x.x,x.y,i.x,i.y),on=.(x>=xmin,x<=xmax,y>=ymin,y<=ymax)][!(i.x==x.x&i.y==x.y)][
  ,.(start = nrow(input.mat)*(x.y-1)+x.x,
     end   = nrow(input.mat)*(i.y-1)+i.x)]

g <- graph_from_data_frame(neighbours)
g
#> IGRAPH 503ba64 DN-- 53 120 -- 
#> + attr: name (v/c)
#> + edges from 503ba64 (vertex names):
#>  [1] 2  ->1   16 ->1   17 ->1   1  ->2   16 ->2   17 ->2   7  ->6   22 ->6  
#>  [9] 6  ->7   8  ->7   22 ->7   23 ->7   7  ->8   9  ->8   22 ->8   23 ->8  
#> [17] 8  ->9   23 ->9   30 ->15  1  ->16  2  ->16  17 ->16  1  ->17  2  ->17 
#> [25] 16 ->17  33 ->17  6  ->22  7  ->22  8  ->22  23 ->22  7  ->23  8  ->23 
#> [33] 9  ->23  22 ->23  15 ->30  45 ->30  17 ->33  49 ->33  57 ->41  57 ->43 
#> [41] 30 ->45  60 ->45  33 ->49  41 ->57  43 ->57  71 ->57  73 ->57  45 ->60 
#> [49] 75 ->60  77 ->62  57 ->71  57 ->73  60 ->75  62 ->77  92 ->77  77 ->92 
#> [57] 134->133 147->133 133->134 135->134 150->134 134->135 150->135 138->137
#> + ... omitted several edges

# Find clusters (connected components); components() replaces the deprecated clusters()
clust <- components(g)

# Minimum size
kept <- clust$membership[clust$membership %in% which(clust$csize >= 5)]

idx_kept <- as.numeric(names(kept)) 

M <- input.mat*0
M[idx_kept]<-1
M
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#>  [1,]    1    1    0    0    0    0    0    0    0     0     0     0     1
#>  [2,]    1    1    0    0    0    0    0    0    0     0     0     0     0
#>  [3,]    0    0    1    0    0    0    0    0    0     0     0     0     1
#>  [4,]    0    0    0    1    0    0    0    0    0     0     0     0     0
#>  [5,]    0    0    0    0    0    0    0    0    0     0     0     1     0
#>  [6,]    1    0    0    0    0    0    0    0    0     0     1     0     1
#>  [7,]    1    1    0    0    0    0    0    0    0     0     0     1     0
#>  [8,]    1    1    0    0    0    0    0    0    0     0     0     0     0
#>  [9,]    1    0    0    0    0    0    0    0    0     0     0     1     1
#> [10,]    0    0    0    0    0    0    0    0    0     0     0     1     1
#> [11,]    0    0    1    0    1    0    0    0    0     0     0     0     0
#> [12,]    0    0    0    1    0    0    0    0    0     1     0     0     0
#> [13,]    0    0    1    0    1    0    0    0    1     0     0     0     0
#> [14,]    0    0    0    0    0    0    0    0    1     0     0     0     0
#> [15,]    1    1    1    1    1    0    0    0    1     1     0     0     0
#>       [,14] [,15]
#>  [1,]     0     0
#>  [2,]     1     0
#>  [3,]     0     1
#>  [4,]     1     0
#>  [5,]     0     0
#>  [6,]     1     0
#>  [7,]     0     0
#>  [8,]     0     0
#>  [9,]     1     0
#> [10,]     1     0
#> [11,]     0     1
#> [12,]     0     0
#> [13,]     0     0
#> [14,]     0     0
#> [15,]     0     0

all.equal(output.mat,M)
#[1] TRUE
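The `start` and `end` columns in this answer turn (row, column) pairs into column-major linear indices via `nrow * (col - 1) + row`, where the answer's `x` is the row and `y` is the column. A minimal sketch of that conversion, using made-up coordinates on a 15 by 15 grid (the names `rc`, `lin` and `back` are illustrative only):

```r
m <- matrix(0, nrow = 15, ncol = 15)

# Hypothetical (row, col) positions
rc <- cbind(row = c(2, 15, 7), col = c(1, 1, 3))

# Column-major linear index, the same formula as in the answer
lin <- nrow(m) * (rc[, "col"] - 1) + rc[, "row"]
lin

# arrayInd() inverts the formula and recovers the (row, col) pairs
back <- arrayInd(lin, dim(m))
back
```

Because the vertex names of the graph are these linear indices, they can be used directly in `M[idx_kept] <- 1`.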

Answer 3

Here is base R code for clustering 2D points:

# compute distance from point `x` to point set `S`
fdist <- function(x, S) {
  if (length(S) == 0) {
    return(0)
  }
  v <- x - S
  pmax(abs(Re(v)), abs(Im(v)))
}

# assign groups based on distance
fgrp <- function(x, clst) {
  for (k in seq_along(clst)) {
    if (any(fdist(x, clst[[k]]) < 2)) {
      clst[[k]] <- c(clst[[k]], x)
      return(clst)
    }
  }
}

# represent 2D points as complex numbers (row + column * 1i)
p <- c(which(input.mat == 1, arr.ind = TRUE) %*% c(1, 1i))
# initialize cluster list
clst <- list()
while (length(p) > 0) {
  idxrm <- c()
  for (k in seq_along(p)) {
    clst_new <- fgrp(p[k], clst)
    if (sum(lengths(clst_new)) > sum(lengths(clst))) {
      idxrm <- c(idxrm, k)
      clst <- clst_new
    }
  }
  if (length(idxrm) == 0) {
    clst <- c(clst, list(p[1]))
  } else {
    p <- p[-idxrm]
  }
}

# keep the points of clusters that contain at least N cells
N <- 5
Z <- do.call(
  c,
  Filter(
    function(x) length(x) >= N,
    Map(
      unique,
      clst
    )
  )
)

# produce output matrix
output.mat <- input.mat * 0
output.mat[cbind(Re(Z), Im(Z))] <- 1

and you will obtain

> output.mat
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 [1,]    1    1    0    0    0    0    0    0    0     0     0     0     1
 [2,]    1    1    0    0    0    0    0    0    0     0     0     0     0
 [3,]    0    0    1    0    0    0    0    0    0     0     0     0     1
 [4,]    0    0    0    1    0    0    0    0    0     0     0     0     0
 [5,]    0    0    0    0    0    0    0    0    0     0     0     1     0
 [6,]    1    0    0    0    0    0    0    0    0     0     1     0     1
 [7,]    1    1    0    0    0    0    0    0    0     0     0     1     0
 [8,]    1    1    0    0    0    0    0    0    0     0     0     0     0
 [9,]    1    0    0    0    0    0    0    0    0     0     0     1     1
[10,]    0    0    0    0    0    0    0    0    0     0     0     1     1
[11,]    0    0    1    0    1    0    0    0    0     0     0     0     0
[12,]    0    0    0    1    0    0    0    0    0     1     0     0     0
[13,]    0    0    1    0    1    0    0    0    1     0     0     0     0
[14,]    0    0    0    0    0    0    0    0    1     0     0     0     0
[15,]    1    1    1    1    1    0    0    0    1     1     0     0     0
      [,14] [,15]
 [1,]     0     0
 [2,]     1     0
 [3,]     0     1
 [4,]     1     0
 [5,]     0     0
 [6,]     1     0
 [7,]     0     0
 [8,]     0     0
 [9,]     1     0
[10,]     1     0
[11,]     0     1
[12,]     0     0
[13,]     0     0
[14,]     0     0
[15,]     0     0

Ideas

Find the positions of the 1s, i.e., their row-column indices.
For each point, check whether it falls within any existing cluster. If yes, the point is assigned to that cluster. Otherwise, a new cluster is created with this point.
The process terminates when all points have been checked.
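For comparison, the same grow-a-cluster idea can be written as a breadth-first flood fill over linear indices. This is a sketch of an alternative technique, not the answer's complex-number method; `keep_large_patches` is a hypothetical helper name, and the small test matrix is made up for illustration.

```r
# Keep only 1s that belong to 8-connected groups of at least N cells.
keep_large_patches <- function(mat, N) {
  nr <- nrow(mat)
  nc <- ncol(mat)
  label <- matrix(0L, nr, nc)   # 0 = not visited yet
  out <- mat * 0

  # Row/column offsets of the 8 neighbours (side by side and diagonal)
  off <- as.matrix(expand.grid(dr = -1:1, dc = -1:1))
  off <- off[!(off[, 1] == 0 & off[, 2] == 0), , drop = FALSE]

  id <- 0L
  for (start in which(mat == 1)) {
    if (label[start] != 0L) next          # already part of a patch
    id <- id + 1L
    label[start] <- id
    queue <- start                        # linear indices of this patch
    head <- 1L
    while (head <= length(queue)) {
      cur <- queue[head]
      head <- head + 1L
      ri <- (cur - 1L) %% nr + 1L         # row of the current cell
      ci <- (cur - 1L) %/% nr + 1L        # column of the current cell
      rr <- ri + off[, 1]
      cc <- ci + off[, 2]
      ok <- rr >= 1 & rr <= nr & cc >= 1 & cc <= nc
      nb <- (cc[ok] - 1L) * nr + rr[ok]   # in-grid neighbours
      nb <- nb[mat[nb] == 1 & label[nb] == 0L]
      label[nb] <- id                     # mark before queueing
      queue <- c(queue, nb)
    }
    if (length(queue) >= N) out[queue] <- 1
  }
  out
}

# Made-up example: two diagonal/vertical pairs and one isolated cell
small <- matrix(c(1, 0, 0, 1,
                  0, 1, 0, 1,
                  0, 0, 0, 0,
                  1, 0, 0, 0), nrow = 4, byrow = TRUE)
res2 <- keep_large_patches(small, N = 2)
res3 <- keep_large_patches(small, N = 3)
```

Under these assumptions the function could also be called as `keep_large_patches(input.mat, 5)` on the question's data. Each cell is visited once, so the cost grows roughly linearly with the number of 1s.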

3 answers

Answer 1: 7, 2022-06-24 22:16:36
Answer 2: 3, 2022-06-24 13:43:33
Answer 3: 2, accepted, 2021-05-28 12:08:46
