How to group elements of a numpy array with the same value into separate numpy arrays

Question

As usual, I should start by saying that I am a novice in Python. However, I have quite a big project to code: a surface flow model based on cellular automata. I also want to include building roofs in my model. Imagine you have an ASCII file in which building cells are marked with 1 and everything else is 0; there are just those two states. I want to find all adjacent cells that belong to the same building and store them (or rather their y and x indices plus one more value, maybe elevation, so 3 columns) in an individual array per building. Keep in mind that buildings can have any shape, but diagonally connected cells do not belong to the same building. Only the northern, southern, western and eastern neighbours of a cell can belong to the same building.

I did my homework and googled it, but so far I couldn't find a satisfying answer.

Example: initial land-cover array:

([0,0,0,0,0,0,0]
 [0,0,1,0,0,0,0]
 [0,1,1,1,0,1,1]
 [0,1,0,1,0,0,1]
 [0,0,0,0,0,0,0])

Output (I need to know the coordinates of the cells in my initial array):

 building_1=([1,2],[2,1],[2,2],[2,3],[3,1],[3,3])
 building_2=([2,5],[2,6],[3,6])

Any help is greatly appreciated!

Answer 1

You can use the label function from scipy.ndimage to identify the distinct buildings.

Here's your example array, containing two buildings:

In [57]: a
Out[57]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 1, 1, 1, 0, 1, 1],
       [0, 1, 0, 1, 0, 0, 1],
       [0, 0, 0, 0, 0, 0, 0]])

Import label.

In [58]: from scipy.ndimage import label

Apply label to a. It returns two values: the array of labeled positions, and the number of distinct objects (buildings, in this case) found.

In [59]: lbl, nlbls = label(a)

In [60]: lbl
Out[60]: 
array([[0, 0, 0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0, 0, 0],
       [0, 1, 1, 1, 0, 2, 2],
       [0, 1, 0, 1, 0, 0, 2],
       [0, 0, 0, 0, 0, 0, 0]], dtype=int32)

In [61]: nlbls
Out[61]: 2
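
The question requires that cells touching only diagonally count as different buildings. As far as I know, label's default structuring element already applies this 4-neighbour rule to 2-D input. The sketch below makes that choice explicit with generate_binary_structure and contrasts it with 8-neighbour connectivity. The small array diag is a made-up test case, not part of the original data.

```python
import numpy as np
from scipy.ndimage import label, generate_binary_structure

# Made-up test case: two cells that touch only at a corner.
diag = np.array([[1, 0],
                 [0, 1]])

# Connectivity 1: only north/south/east/west neighbours are joined.
four = generate_binary_structure(2, 1)
_, n_four = label(diag, structure=four)

# Connectivity 2: diagonal neighbours are joined as well.
eight = generate_binary_structure(2, 2)
_, n_eight = label(diag, structure=eight)

print(n_four, n_eight)
```

Passing structure explicitly documents the intended neighbourhood, which may be worth doing even when it matches the default.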

To get the coordinates of a building, np.where can be used. For example,

In [64]: np.where(lbl == 2)
Out[64]: (array([2, 2, 3]), array([5, 6, 6]))

It returns a tuple of arrays; the kth array holds the coordinates along the kth dimension. You can use, for example, np.column_stack to combine these into an array:

 In [65]: np.column_stack(np.where(lbl == 2))
 Out[65]: 
 array([[2, 5],
        [2, 6],
        [3, 6]])

You might want a list of all the coordinate arrays. Here's one way to create such a list.

For convenience, first create a list of labels:

In [66]: labels = list(range(1, nlbls+1))

In [67]: labels
Out[67]: [1, 2]

Use a list comprehension to create the list of coordinate arrays.

In [68]: coords = [np.column_stack(np.where(lbl == k)) for k in labels]

In [69]: coords
Out[69]: 
[array([[1, 2],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 1],
       [3, 3]]),
 array([[2, 5],
       [2, 6],
       [3, 6]])]
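
The list comprehension scans the whole label array once per building. For grids with many buildings, one possible alternative is to collect all building cells once and split them by label. This is only a sketch; the names cells, cell_labels and coords_single_pass are mine, not from the answer.

```python
import numpy as np
from scipy.ndimage import label

a = np.array([[0, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0, 0],
              [0, 1, 1, 1, 0, 1, 1],
              [0, 1, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 0, 0, 0]])
lbl, nlbls = label(a)

mask = lbl > 0
cells = np.argwhere(mask)    # (row, col) of every building cell, row-major order
cell_labels = lbl[mask]      # label of each of those cells, in the same order

# A stable sort keeps the row-major order within each building.
order = np.argsort(cell_labels, kind="stable")
counts = np.bincount(cell_labels, minlength=nlbls + 1)[1:]

# Cut the sorted cell list at the boundaries between consecutive labels.
coords_single_pass = np.split(cells[order], np.cumsum(counts)[:-1])
```

For a handful of buildings the list comprehension is simpler and likely fast enough.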

Now your building data is in labels and coords. For example, the first building was labeled labels[0], and its coordinates are in coords[0]:

In [70]: labels[0]
Out[70]: 1

In [71]: coords[0]
Out[71]: 
array([[1, 2],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 1],
       [3, 3]])
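
The question also asks for a third column (for example elevation) next to y and x. A minimal sketch of how that could be attached, assuming a hypothetical elevation array with the same shape as a (the values here are made up purely for illustration):

```python
import numpy as np
from scipy.ndimage import label

a = np.array([[0, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0, 0],
              [0, 1, 1, 1, 0, 1, 1],
              [0, 1, 0, 1, 0, 0, 1],
              [0, 0, 0, 0, 0, 0, 0]])

# Hypothetical elevation grid with the same shape as `a`; values are made up.
elevation = np.arange(a.size, dtype=float).reshape(a.shape)

lbl, nlbls = label(a)
buildings = []
for k in range(1, nlbls + 1):
    rows, cols = np.where(lbl == k)
    # Index the elevation grid with the index arrays to get one value per cell.
    buildings.append(np.column_stack([rows, cols, elevation[rows, cols]]))
```

Each entry of buildings is then an array with one row per cell and the columns y, x, elevation.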

Answer 2

Thank you for the great answers! Here is a little correction. In my actual landcover array the background value is not 0 but -9999 (0 is too precious for GIS people); I forgot to mention that. Thanks to machine yearning's hint, I worked around this by replacing every -9999 with 0 via landcover = np.where(landcover > -9999, landcover, 0). After that I can use label. The actual aim was to find the lowest cell of each building and to assign it as the outlet. If somebody has a more efficient way, please let me know!

import numpy as np
from scipy.ndimage import label

The original data set has -9999 as the background value and 1 for building cells.

landcover = np.array([[-9999,-9999,-9999,-9999,-9999,-9999,1], 
                       [-9999,-9999,1,-9999,-9999,-9999,-9999],
                       [-9999,1,1,1,-9999,1,1], 
                       [-9999,1,-9999,1,-9999,-9999,1], 
                       [-9999,-9999,-9999,-9999,-9999,-9999,-9999]],dtype=int)

Here is a random digital elevation map.

DEM = np.array([[7,4,3,2,4,5,4], 
               [4,5,5,3,5,6,7],
               [2,6,4,7,4,4,4],
               [3,7,8,8,10,9,7],
               [2,5,7,7,9,10,8]],dtype=float)

I changed all -9999 entries to 0 in order to use label (thanks to machine yearning).

 landcover = np.where(landcover > -9999, landcover, 0)
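
label counts every non-zero value as part of a feature, which is why the -9999 background has to be handled first. A possible shortcut, sketched here under the assumption that building cells are exactly the cells equal to 1, is to pass a boolean mask to label and leave landcover itself untouched:

```python
import numpy as np
from scipy.ndimage import label

landcover = np.array([[-9999, -9999, -9999, -9999, -9999, -9999, 1],
                      [-9999, -9999, 1, -9999, -9999, -9999, -9999],
                      [-9999, 1, 1, 1, -9999, 1, 1],
                      [-9999, 1, -9999, 1, -9999, -9999, 1],
                      [-9999, -9999, -9999, -9999, -9999, -9999, -9999]], dtype=int)

# True where there is a building; the no-data value stays in `landcover`.
is_building = landcover == 1
lbl, nlbls = label(is_building)
```

If other positive class codes can occur, landcover != -9999 may be the more appropriate mask.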

Then I labeled the distinct buildings and counted them (thanks to Warren Weckesser; the rest is pretty much yours).

 lbl, nlbls = label(landcover)
 bldg_number = range(1, nlbls+1)
 bldg_coord = [np.column_stack(np.where(lbl == k)) for k in bldg_number]
 outlets=np.zeros([nlbls,3])

I iterate over the bldg_coord list to determine the lowest cell of each building, which will be assigned as its outlet.

 for i in range(0, nlbls):
     building=np.zeros([bldg_coord[i].shape[0],3])
     for j in range(0,bldg_coord[i].shape[0]):
         building[j][0]=bldg_coord[i][j][0]
         building[j][1]=bldg_coord[i][j][1]
         building[j][2]=DEM[bldg_coord[i][j][0],bldg_coord[i][j][1]]

Still inside the loop over i, I sort the building array in ascending order by the DEM value of each building cell in order to find the lowest-lying cell.

     building=building[building[:,2].argsort()]

The lowest building cell will be used as the roof outlet for rainwater.

     outlets[i][0]=building[0][0]
     outlets[i][1]=building[0][1]
     outlets[i][2]=bldg_coord[i].shape[0]

Here is the output. The first two columns are indices into the landcover array and the last is the number of adjacent building cells (the size of the building).

>>> outlets
array([[ 0.,  6.,  1.],
       [ 2.,  2.,  6.],
       [ 2.,  5.,  3.]])
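
Regarding the request for a more efficient way: one option that avoids the explicit loops is scipy.ndimage.minimum_position, which returns the position of the minimum of an array within each labeled region. The following is a sketch under that approach, not a drop-in replacement. When several cells of a building share the lowest elevation, the cell it picks may differ from the one found by the sort above.

```python
import numpy as np
from scipy.ndimage import label, minimum_position

landcover = np.array([[-9999, -9999, -9999, -9999, -9999, -9999, 1],
                      [-9999, -9999, 1, -9999, -9999, -9999, -9999],
                      [-9999, 1, 1, 1, -9999, 1, 1],
                      [-9999, 1, -9999, 1, -9999, -9999, 1],
                      [-9999, -9999, -9999, -9999, -9999, -9999, -9999]], dtype=int)
DEM = np.array([[7, 4, 3, 2, 4, 5, 4],
                [4, 5, 5, 3, 5, 6, 7],
                [2, 6, 4, 7, 4, 4, 4],
                [3, 7, 8, 8, 10, 9, 7],
                [2, 5, 7, 7, 9, 10, 8]], dtype=float)

lbl, nlbls = label(landcover == 1)
index = np.arange(1, nlbls + 1)

# One (row, col) tuple per building: the cell with the lowest DEM value.
lowest = minimum_position(DEM, labels=lbl, index=index)

# Number of cells per building; entry 0 of bincount is the background.
sizes = np.bincount(lbl.ravel(), minlength=nlbls + 1)[1:]

# Same layout as above: row, column, number of cells.
outlets = np.column_stack([np.array(lowest), sizes]).astype(float)
```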

Answer 3

It looks like this function does exactly what you're looking for (from the numpy documentation):

numpy.argwhere(a):

Find the indices of array elements that are non-zero, grouped by element.

>>> x = np.arange(6).reshape(2,3)
>>> x
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argwhere(x>1)
array([[0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

Alternatively, it seems like your use case requires using the returned coordinates to index arrays. The documentation notes:

The output of argwhere is not suitable for indexing arrays. For this purpose use where(a) instead.

So you might want to try numpy.where instead.
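
To illustrate the difference, here is a small sketch that reuses the x array from the example above. The tuple returned by np.where can index the array directly, while the rows returned by np.argwhere have to be transposed into a tuple first.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

idx_rows = np.argwhere(x > 1)    # shape (4, 2): one (row, col) pair per match
idx_tuple = np.where(x > 1)      # tuple of two arrays: all rows, all columns

# The tuple from np.where indexes the array directly.
values = x[idx_tuple]

# The argwhere result needs converting to a tuple of index arrays first.
values_from_rows = x[tuple(idx_rows.T)]
```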

3 answers

Answer 1: 5, accepted, 2014-04-04 11:13:42

Answer 2: 1, 2014-04-08 03:58:03

Answer 3: 0, 2014-04-04 07:34:50
