
Efficiently grouping a list of coordinate points by location in Python

Given a list of X,Y coordinate points on a 2D grid, what is the most efficient algorithm to create a list of groups of adjacent coordinate points?

For example, given a list of points making up two non-adjacent 3x3 squares on a 15x15 grid, the result of this algorithm would be two groups of points corresponding to the two squares.

I suppose you could do a flood fill algorithm, but this seems like overkill and not very efficient for a large 2D array of, say, 1024x1024.

Fundamentally, this is an image processing operation. If you use an image processing library like scikit-image (aka skimage), it will be easy. Dealing with really huge data will eventually get slow, but 1024x1024 is nothing.

In [1]: import numpy as np
In [2]: import skimage.morphology
In [3]: x = [0,1,2,0,1,2,0,1,2,-3,-2,-1,-3,-2,-1,-3,-2,-1]
In [4]: y = [0,0,0,1,1,1,2,2,2,-3,-3,-3,-2,-2,-2,-1,-1,-1]
In [5]: dense = np.zeros((9,9), dtype=bool)
In [6]: dense[y,x] = True

In [7]: print(dense)
[[ True  True  True False False False False False False]
 [ True  True  True False False False False False False]
 [ True  True  True False False False False False False]
 [False False False False False False False False False]
 [False False False False False False False False False]
 [False False False False False False False False False]
 [False False False False False False  True  True  True]
 [False False False False False False  True  True  True]
 [False False False False False False  True  True  True]]

In [8]: labeled = skimage.morphology.label(dense)
In [9]: print(labeled)
[[1 1 1 0 0 0 0 0 0]
 [1 1 1 0 0 0 0 0 0]
 [1 1 1 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 2 2 2]
 [0 0 0 0 0 0 2 2 2]
 [0 0 0 0 0 0 2 2 2]]

In [10]: coords_yx = { i: (labeled == i).nonzero() for i in range(1,labeled.max()+1) }
In [11]: coords_yx
Out[11]:
{1: (array([0, 0, 0, 1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2, 0, 1, 2])),
 2: (array([6, 6, 6, 7, 7, 7, 8, 8, 8]), array([6, 7, 8, 6, 7, 8, 6, 7, 8]))}
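If you want plain (x, y) tuples per group rather than parallel index arrays, note that `nonzero()` returns arrays in (row, column), i.e. (y, x), order. A small follow-up sketch converts each group; plain lists stand in for the NumPy arrays above so the snippet is self-contained:

```python
# label -> (row_indices, col_indices), as produced by nonzero() above
coords_yx = {
    1: ([0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]),
    2: ([6, 6, 6, 7, 7, 7, 8, 8, 8], [6, 7, 8, 6, 7, 8, 6, 7, 8]),
}

# Swap the (y, x) index arrays into per-group lists of (x, y) tuples.
groups = {label: list(zip(xs, ys)) for label, (ys, xs) in coords_yx.items()}
```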

You can hash all coordinate points (e.g. using a dictionary in Python) and then, for each coordinate point, hash the point's adjacent neighbors to find pairs of adjacent points and "merge" them. Also, for each point you can maintain a pointer to the connected component that the point belongs to (again using a dictionary), and for each connected component you can maintain a list of its member points.

Then, when you hash a neighbor of a point and find a match, you merge the two connected-component sets that the points belong to and update the group pointers for all points in the merged set. You can show that you only need to hash all the neighbors of all points once to find all connected components, and furthermore, if you always update the pointers of the smaller of the two sets when merging, the total running time is near-linear in the number of points (each point's pointer moves at most O(log n) times).
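A minimal sketch of this merge scheme, assuming 4-connectivity; the helper name `group_adjacent` is made up for illustration:

```python
def group_adjacent(points):
    """Group 4-adjacent (x, y) points into connected components.

    Hash-based merging as described above: each point keeps a pointer to
    its component, each component keeps its member list, and the smaller
    component is folded into the larger one on every merge.
    """
    comp_of = {}   # point -> component id (represented by a seed point)
    members = {}   # component id -> list of member points
    for p in points:
        comp_of[p] = p
        members[p] = [p]
    for (x, y) in points:
        for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if q in comp_of:
                a, b = comp_of[(x, y)], comp_of[q]
                if a == b:
                    continue  # already in the same component
                if len(members[a]) < len(members[b]):
                    a, b = b, a  # always fold the smaller set (b) into the larger (a)
                for r in members[b]:
                    comp_of[r] = a
                members[a].extend(members.pop(b))
    return list(members.values())
```

Calling `group_adjacent` on the two-squares example from the question returns two groups, one per square.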

It's unclear what you mean by "groups of adjacent" coordinate points. Your example of two non-adjacent 3x3 squares suggests you are looking for what is called connected-component labeling.

There are many implementations for extracting connected components. Below are a few for guidance.

  1. cclabel
  2. OpenCV
  3. bwconncomp

However, I've implemented this kind of blob detector, and they are not that hard to write if you are looking for a learning experience. If not, I would go with the most mature library, like OpenCV, and use its Python API if that's all you need.

Also, you mentioned "efficiency". Note that there are single-pass and two-pass versions of these algorithms. Single-pass, as the name suggests, is generally more efficient as it only requires a single pass through the data. This might be needed if your grids are very large.
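For sparse inputs you don't even need a dense grid at all: a set-based fill visits each point exactly once, so the work is proportional to the number of points rather than the grid area. A rough sketch, assuming 4-connectivity; the function name `label_points` is hypothetical:

```python
from collections import deque

def label_points(points):
    """Group (x, y) points into 4-connected components, visiting each point once."""
    remaining = set(points)
    groups = []
    while remaining:
        seed = remaining.pop()        # start a new component from any unvisited point
        queue = deque([seed])
        group = [seed]
        while queue:
            x, y = queue.popleft()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:   # unvisited neighbor joins this component
                    remaining.remove(nb)
                    group.append(nb)
                    queue.append(nb)
        groups.append(group)
    return groups
```

On the question's example of two separated squares, this yields two groups without ever allocating the full 15x15 grid.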
