Identify at least N contiguous cells that match a certain criterion, in a grid
I have an X by Y grid with cells containing 1 if a certain criterion is met or 0 if it is not. Now I want to identify features in the grid where there are at least N contiguous cells containing a 1. Contiguous cells can be adjacent side by side, or adjacent diagonally.
I made a picture to illustrate the problem (see link), with N = 5. For clarity I omitted marking the 0s; they are in the unmarked cells. Red 1s belong to features I want to identify, and black 1s do not. The desired result would be as shown in the picture, but with all the black 1s changed to 0s.
I use R, so solutions using that language would be greatly appreciated, but I'll happily settle for others. I couldn't find anything in the R libraries (such as rgeos) specifically, but maybe I'm missing something.
Any help appreciated, thanks!
Here is a small reproducible example:
input.mat <- structure(c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 1L, 1L), .Dim = c(15L, 15L), .Dimnames = list(NULL, NULL))
input.mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
[2,] 1 1 0 0 1 1 1 0 0 1 0 0 0 1 0
[3,] 0 0 1 0 0 0 0 0 0 1 1 0 1 0 1
[4,] 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
[5,] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[6,] 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0
[7,] 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0
[8,] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
[9,] 1 0 0 0 0 1 0 1 0 0 0 1 1 1 0
[10,] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
[11,] 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
[12,] 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
[13,] 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1
[14,] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
[15,] 1 1 1 1 1 0 0 0 1 1 0 0 0 0 1
output.mat <- structure(c(1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L), .Dim = c(15L, 15L), .Dimnames = list(NULL, NULL))
output.mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
[2,] 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0
[3,] 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1
[4,] 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0
[5,] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
[6,] 1 0 0 0 0 0 0 0 0 0 1 0 1 1 0
[7,] 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0
[8,] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
[9,] 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0
[10,] 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
[11,] 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1
[12,] 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
[13,] 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0
[14,] 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
[15,] 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0
Created on 2021-05-27 by the reprex package (v2.0.0)
Using terra functions:
Convert the matrix to a raster (rast).
Identify patches of 1s surrounded by zeros (zeroAsNA = TRUE). Also consider diagonal neighbors when defining contiguity (directions = 8).
Count the number of cells in each patch (freq).
Check which patches have a count of < 5. At these indices, set cells to NA.
Coerce the raster to a matrix and check which values are NA. At these indices, set the original matrix values to 0.
library(terra)
m = input.mat
p = patches(rast(input.mat), directions = 8, zeroAsNA = TRUE)
p[p %in% which(freq(p)[ , "count"] < 5)] = NA
m[is.na(as.matrix(p, wide = TRUE))] = 0
all.equal(m, output.mat)
# [1] TRUE
Figure: patches in the original input.mat, from plot(p).
Figure: the same raster after removal of patches with < 5 cells.
Related posts: Combining polygons and calculating their area (i.e. number of cells) in R; Obtaining connected components in R
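The terra steps above hard-code the threshold 5 and select small patches by row position in the freq() table. As a rough sketch only, the same idea can be wrapped in a function that takes the minimum size as an argument and matches patches by their ID. The helper name keep_large_patches and the toy matrix are made up for illustration; the terra calls are the ones already used in this answer, and the counting is done with base R's table().

```r
library(terra)

# Hypothetical helper (not part of terra): zero out 8-connected
# patches of 1s that contain fewer than `n` cells.
keep_large_patches <- function(mat, n = 5) {
  p  <- patches(rast(mat), directions = 8, zeroAsNA = TRUE)
  pm <- as.matrix(p, wide = TRUE)   # patch ID per cell, NA where mat is 0
  sizes <- table(pm)                # number of cells per patch ID (NAs dropped)
  small <- as.numeric(names(sizes)[sizes < n])
  out <- mat
  out[is.na(pm) | pm %in% small] <- 0
  out
}

# Toy grid: one patch of 3 cells (top left) and one of 2 cells (right edge)
toy <- matrix(c(1, 1, 0, 0,
                1, 0, 0, 1,
                0, 0, 0, 1,
                0, 0, 0, 0), nrow = 4, byrow = TRUE)
res <- keep_large_patches(toy, n = 3)
res
```

With n = 3 the three-cell patch should survive and the two-cell patch should be set to 0. Matching on patch IDs rather than on row numbers avoids relying on the IDs being consecutive.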
With a data.table non-equi join to find neighbouring points, and igraph:
library(igraph)
library(data.table)
# index of pixels fulfilling criteria
idx <- which(input.mat==1)
# Coordinates of pixels
coord <- data.table(arrayInd(idx,dim(input.mat)))
setnames(coord,c("x","y"))
coord[,c('xmin','xmax','ymin','ymax'):=.(x-1,x+1,y-1,y+1)]
# Find neighbours indices
neighbours <- coord[coord,.(x.x,x.y,i.x,i.y),on=.(x>=xmin,x<=xmax,y>=ymin,y<=ymax)][!(i.x==x.x&i.y==x.y)][
,.(start = nrow(input.mat)*(x.y-1)+x.x,
end = nrow(input.mat)*(i.y-1)+i.x)]
g <- graph_from_data_frame(neighbours)
g
#> IGRAPH 503ba64 DN-- 53 120 --
#> + attr: name (v/c)
#> + edges from 503ba64 (vertex names):
#> [1] 2 ->1 16 ->1 17 ->1 1 ->2 16 ->2 17 ->2 7 ->6 22 ->6
#> [9] 6 ->7 8 ->7 22 ->7 23 ->7 7 ->8 9 ->8 22 ->8 23 ->8
#> [17] 8 ->9 23 ->9 30 ->15 1 ->16 2 ->16 17 ->16 1 ->17 2 ->17
#> [25] 16 ->17 33 ->17 6 ->22 7 ->22 8 ->22 23 ->22 7 ->23 8 ->23
#> [33] 9 ->23 22 ->23 15 ->30 45 ->30 17 ->33 49 ->33 57 ->41 57 ->43
#> [41] 30 ->45 60 ->45 33 ->49 41 ->57 43 ->57 71 ->57 73 ->57 45 ->60
#> [49] 75 ->60 77 ->62 57 ->71 57 ->73 60 ->75 62 ->77 92 ->77 77 ->92
#> [57] 134->133 147->133 133->134 135->134 150->134 134->135 150->135 138->137
#> + ... omitted several edges
# Find clusters (connected components); components() replaces the deprecated clusters()
clust <- components(g)
# Minimum size
kept <- clust$membership[clust$membership %in% which(clust$csize >= 5)]
idx_kept <- as.numeric(names(kept))
M <- input.mat*0
M[idx_kept]<-1
M
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
#> [1,] 1 1 0 0 0 0 0 0 0 0 0 0 1
#> [2,] 1 1 0 0 0 0 0 0 0 0 0 0 0
#> [3,] 0 0 1 0 0 0 0 0 0 0 0 0 1
#> [4,] 0 0 0 1 0 0 0 0 0 0 0 0 0
#> [5,] 0 0 0 0 0 0 0 0 0 0 0 1 0
#> [6,] 1 0 0 0 0 0 0 0 0 0 1 0 1
#> [7,] 1 1 0 0 0 0 0 0 0 0 0 1 0
#> [8,] 1 1 0 0 0 0 0 0 0 0 0 0 0
#> [9,] 1 0 0 0 0 0 0 0 0 0 0 1 1
#> [10,] 0 0 0 0 0 0 0 0 0 0 0 1 1
#> [11,] 0 0 1 0 1 0 0 0 0 0 0 0 0
#> [12,] 0 0 0 1 0 0 0 0 0 1 0 0 0
#> [13,] 0 0 1 0 1 0 0 0 1 0 0 0 0
#> [14,] 0 0 0 0 0 0 0 0 1 0 0 0 0
#> [15,] 1 1 1 1 1 0 0 0 1 1 0 0 0
#> [,14] [,15]
#> [1,] 0 0
#> [2,] 1 0
#> [3,] 0 1
#> [4,] 1 0
#> [5,] 0 0
#> [6,] 1 0
#> [7,] 0 0
#> [8,] 0 0
#> [9,] 1 0
#> [10,] 1 0
#> [11,] 0 1
#> [12,] 0 0
#> [13,] 0 0
#> [14,] 0 0
#> [15,] 0 0
all.equal(output.mat,M)
#[1] TRUE
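The start and end columns in the join above turn (row, column) pairs back into linear indices with nrow * (column - 1) + row, where the answer's x is the row and y is the column. A small base R illustration of that mapping and of its inverse, arrayInd(), assuming a 15 x 15 matrix like input.mat (the variable names are only for this example):

```r
m <- matrix(0L, nrow = 15, ncol = 15)

# Linear (column-major) index of row 3, column 2
r  <- 3
cc <- 2
lin <- nrow(m) * (cc - 1) + r   # 15 * 1 + 3 = 18

# arrayInd() goes the other way: linear index -> (row, column)
rc <- arrayInd(lin, dim(m))     # 1 x 2 matrix holding 3 and 2

# Indexing with the linear index touches the same cell as m[r, cc]
m[lin] <- 1L
m[r, cc]
```

This is why the vertex names of the graph can be used directly as indices in M[idx_kept] <- 1.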
Here is base R code for clustering 2D points:
# compute distance from point `x` to point set `S`
fdist <- function(x, S) {
if (length(S) == 0) {
return(0)
}
v <- x - S
pmax(abs(Re(v)), abs(Im(v)))
}
# assign groups based on distance
fgrp <- function(x, clst) {
for (k in seq_along(clst)) {
if (any(fdist(x, clst[[k]]) < 2)) {
clst[[k]] <- c(clst[[k]], x)
return(clst)
}
}
}
# use complex number represent 2D points
p <- c(which(input.mat == 1, arr.ind = TRUE) %*% c(1, 1i))
# initialize cluster list
clst <- list()
while (length(p) > 0) {
idxrm <- c()
for (k in seq_along(p)) {
clst_new <- fgrp(p[k], clst)
if (sum(lengths(clst_new)) > sum(lengths(clst))) {
idxrm <- c(idxrm, k)
clst <- clst_new
}
}
if (length(idxrm) == 0) {
clst <- c(clst, list(p[1]))
} else {
p <- p[-idxrm]
}
}
# keep points that follows the contiguous pattern
N <- 5
Z <- do.call(
c,
Filter(
function(x) length(x) >= N,
Map(
unique,
clst
)
)
)
# produce output matrix
output.mat <- input.mat * 0
output.mat[cbind(Re(Z), Im(Z))] <- 1
and you will obtain
> output.mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] 1 1 0 0 0 0 0 0 0 0 0 0 1
[2,] 1 1 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0 0 0 0 1
[4,] 0 0 0 1 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 0 0 0 0 0 0 0 0 1 0
[6,] 1 0 0 0 0 0 0 0 0 0 1 0 1
[7,] 1 1 0 0 0 0 0 0 0 0 0 1 0
[8,] 1 1 0 0 0 0 0 0 0 0 0 0 0
[9,] 1 0 0 0 0 0 0 0 0 0 0 1 1
[10,] 0 0 0 0 0 0 0 0 0 0 0 1 1
[11,] 0 0 1 0 1 0 0 0 0 0 0 0 0
[12,] 0 0 0 1 0 0 0 0 0 1 0 0 0
[13,] 0 0 1 0 1 0 0 0 1 0 0 0 0
[14,] 0 0 0 0 0 0 0 0 1 0 0 0 0
[15,] 1 1 1 1 1 0 0 0 1 1 0 0 0
[,14] [,15]
[1,] 0 0
[2,] 1 0
[3,] 0 1
[4,] 1 0
[5,] 0 0
[6,] 1 0
[7,] 0 0
[8,] 0 0
[9,] 1 0
[10,] 1 0
[11,] 0 1
[12,] 0 0
[13,] 0 0
[14,] 0 0
[15,] 0 0
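The adjacency test in fdist may be easier to follow on a tiny case. The code encodes each cell as row + column * 1i and treats two cells as contiguous when the larger of the row and column differences is below 2. The following sketch uses a made-up 3 x 3 matrix and a helper named cheb, which simply repeats the distance expression from fdist:

```r
toy <- matrix(c(0, 1, 0,
                0, 0, 1,
                1, 0, 0), nrow = 3, byrow = TRUE)

# Row-column positions of the 1s, encoded as complex numbers (row + column * 1i)
pts <- c(which(toy == 1, arr.ind = TRUE) %*% c(1, 1i))
pts                      # 3+1i 1+2i 2+3i

# Same distance as in fdist: max(|row difference|, |column difference|)
cheb <- function(x, S) {
  v <- x - S
  pmax(abs(Re(v)), abs(Im(v)))
}

cheb(pts[2], pts[3])     # 1: (1,2) and (2,3) touch diagonally
cheb(pts[1], pts[-1])    # 2 2: (3,1) is not adjacent to either cell
```

A distance of exactly 1 covers both side-by-side and diagonal neighbours, which is the contiguity rule asked for in the question.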
Here which(input.mat == 1, arr.ind = TRUE) gives the positions of the 1s, i.e., their row-column indices.
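For comparison, the same task can also be approached with a plain queue-based flood fill in base R. This is a different technique from the clustering loop above, not a rewrite of it, and the function name filter_features and the toy matrix are only illustrative:

```r
# Hypothetical helper: keep only 8-connected groups of 1s with at least `n` cells.
filter_features <- function(mat, n = 5) {
  nr <- nrow(mat)
  nc <- ncol(mat)
  label <- matrix(0L, nr, nc)   # 0 means "not visited yet"
  out <- mat * 0
  current <- 0L
  for (start in which(mat == 1)) {
    if (label[start] != 0L) next
    current <- current + 1L
    label[start] <- current
    queue <- start              # linear indices still to be expanded
    members <- integer(0)       # cells of the current group
    while (length(queue) > 0) {
      cell <- queue[1]
      queue <- queue[-1]
      members <- c(members, cell)
      r  <- (cell - 1) %% nr + 1
      cc <- (cell - 1) %/% nr + 1
      # visit the 8 surrounding cells (the centre is already labelled)
      for (dr in -1:1) for (dc in -1:1) {
        rr <- r + dr
        c2 <- cc + dc
        if (rr < 1 || rr > nr || c2 < 1 || c2 > nc) next
        if (mat[rr, c2] == 1 && label[rr, c2] == 0L) {
          label[rr, c2] <- current
          queue <- c(queue, (c2 - 1) * nr + rr)
        }
      }
    }
    if (length(members) >= n) out[members] <- 1
  }
  out
}

toy <- matrix(c(1, 1, 0, 0,
                1, 0, 0, 1,
                0, 0, 0, 1,
                0, 0, 0, 0), nrow = 4, byrow = TRUE)
res <- filter_features(toy, n = 3)
res
```

On the question's data the result of filter_features(input.mat, 5) can be compared with the expected matrix using all.equal(). Growing the queue with c() is simple but not fast, so this is meant for small grids.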