
How to count adjacent elements in a 3d numpy array efficiently

I have a 3d numpy array filled with integers from 1 to 7.
I want to count the number of unique elements in the neighbour cells of each cell. For example, in a 2d array:

a=[[1,1,1,7,4],
   [1,1,1,3,2],
   [1,1,1,2,2],
   [1,3,1,4,2],
   [1,1,1,4,2]]  

would yield a result of:

[[1,1,2,3,2],
 [1,1,2,3,3],
 [1,2,2,4,1],
 [2,1,3,3,2],
 [1,2,2,3,2]]  

I am currently going through every cell in the array and checking its neighbours one by one.

temp = np.zeros(6)
if (x>0):
    temp[0] = model[x-1,y,z]
if (x<x_len-1):
    temp[1] = model[x+1,y,z]
if (y>0):
    temp[2] = model[x,y-1,z]
if (y<y_len-1):
    temp[3] = model[x,y+1,z]
if (z>0):
    temp[4] = model[x,y,z-1]
if (z<z_len-1):
    temp[5] = model[x,y,z+1]
result[x,y,z] = np.count_nonzero(np.unique(temp))  
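
For reference, the fragment above sits inside three nested loops over x, y and z. A self-contained sketch of the whole thing could look like this; the function name `count_unique_neighbours_loop` is made up for illustration, and the code assumes (as the fragment does) that 0 never occurs in the data, so 0 can mark a missing neighbour.

```python
import numpy as np

def count_unique_neighbours_loop(model):
    """Brute-force count of distinct values among the 6 face neighbours."""
    x_len, y_len, z_len = model.shape
    result = np.zeros(model.shape, dtype=int)
    for x in range(x_len):
        for y in range(y_len):
            for z in range(z_len):
                temp = np.zeros(6)  # 0 marks "no neighbour on this side"
                if x > 0:
                    temp[0] = model[x-1, y, z]
                if x < x_len - 1:
                    temp[1] = model[x+1, y, z]
                if y > 0:
                    temp[2] = model[x, y-1, z]
                if y < y_len - 1:
                    temp[3] = model[x, y+1, z]
                if z > 0:
                    temp[4] = model[x, y, z-1]
                if z < z_len - 1:
                    temp[5] = model[x, y, z+1]
                # unique values, ignoring the 0 placeholders
                result[x, y, z] = np.count_nonzero(np.unique(temp))
    return result

model = np.random.randint(1, 8, (4, 5, 6))
result = count_unique_neighbours_loop(model)
```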

I found this to be quite slow and inefficient. Is there a more efficient/quicker way to do this?

Thanks.

Well, there might be a way:

  • create 6 offset arrays (left, right, up, down, front, back)
  • combine these arrays into a (R-2, C-2, D-2, 6) 4D array
  • sort the 4D array along the last dimension (the dimension with size 6)

Now you have a 4D array where you can pick a sorted vector of neighbours for each cell. After that you may count the different neighbours as follows:

  • apply diff along the 4th axis (the sorted one)
  • count the non-zero differences along the 4th axis

This will give you the number of different neighbours minus 1.

The first part is probably rather clear. If a cell has neighbours (1, 2, 4, 2, 2, 3), the neighbour vector is sorted into (1, 2, 2, 2, 3, 4). The difference vector is then (1, 0, 0, 1, 1), and the number of non-zero elements ( (np.diff(v) != 0).sum(axis=3) on the 4D array, where the 4th axis has index 3 ) is 3. So, there are 4 unique neighbours.
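
As a minimal sketch, the same arithmetic for that single neighbour vector can be checked directly (the variable names here are only illustrative):

```python
import numpy as np

v = np.array([1, 2, 4, 2, 2, 3])    # neighbours of one cell
v_sorted = np.sort(v)                # [1 2 2 2 3 4]
steps = np.diff(v_sorted)            # [1 0 0 1 1]

# every non-zero step marks the start of a new value; add 1 for the first value
n_unique = int((steps != 0).sum()) + 1
print(n_unique)  # 4
```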

Of course, this method does not take the edges into account. You can solve that by padding the initial array by 1 cell in each direction using numpy.pad with mode reflect. (That mode is actually the only one that is guaranteed not to introduce any new values into the neighbourhood; try it with a two-dimensional array to understand why.)
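
A small two-dimensional sketch of that experiment, comparing reflect with edge padding for the top-left corner cell. The helper `corner_neighbours` is hypothetical and only exists for this illustration.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)   # [[1 2 3], [4 5 6], [7 8 9]]

reflected = np.pad(a, 1, mode='reflect')
edged = np.pad(a, 1, mode='edge')

def corner_neighbours(padded):
    # the original cell (0, 0) sits at (1, 1) in the padded array
    return {int(padded[0, 1]), int(padded[2, 1]),
            int(padded[1, 0]), int(padded[1, 2])}

# reflect mirrors the real neighbours 2 and 4 onto the padded side
print(corner_neighbours(reflected))  # {2, 4}
# edge repeats the cell's own value 1, which is not one of its neighbours
print(corner_neighbours(edged))      # {1, 2, 4}
```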

For example:

import numpy as np

# create some fictional data
dat = np.random.randint(1, 8, (6, 7, 8))

# pad the data by 1
datp = np.pad(dat, 1, mode='reflect')

# create the neighbouring 4D array
neigh = np.concatenate((
    datp[2:,1:-1,1:-1,None], datp[:-2,1:-1,1:-1,None], 
    datp[1:-1,2:,1:-1,None], datp[1:-1,:-2,1:-1,None],
    datp[1:-1,1:-1,2:,None], datp[1:-1,1:-1,:-2,None]), axis=3)

# sort the 4D array
neigh.sort(axis=3)

# calculate the number of unique samples
usamples = (np.diff(neigh, axis=3) != 0).sum(axis=3) + 1
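
As a sanity check, the same steps can be applied to the 2D example from the question, where each cell has 4 neighbours instead of 6. This is only a sketch of the 2D adaptation; the result should match the array the question expects.

```python
import numpy as np

a = np.array([[1, 1, 1, 7, 4],
              [1, 1, 1, 3, 2],
              [1, 1, 1, 2, 2],
              [1, 3, 1, 4, 2],
              [1, 1, 1, 4, 2]])

# pad by 1 so that border cells also have four "neighbours"
ap = np.pad(a, 1, mode='reflect')

# stack the four shifted views along a new last axis: shape (5, 5, 4)
neigh = np.stack((ap[2:, 1:-1], ap[:-2, 1:-1],
                  ap[1:-1, 2:], ap[1:-1, :-2]), axis=-1)

# sort each neighbour vector and count the value changes
neigh.sort(axis=-1)
usamples = (np.diff(neigh, axis=-1) != 0).sum(axis=-1) + 1
print(usamples)
```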

The solution above is quite universal; it works with anything sortable. However, it consumes a lot of memory (6 copies of the array) and is not a high-performance solution. If we are satisfied with a solution that only works with this special case (values are very small integers), we can do some bit magic.

  • create an array where every item is represented as a bit mask; with the 1 << value encoding used in the code below, 1 = 00000010, 2 = 00000100, 3 = 00001000, etc.
  • OR the neighbouring arrays together
  • count the number of bits in the ORed result by using a look-up table
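
For a single cell the idea can be sketched as follows, using the `1 << value` encoding and the neighbour vector (1, 2, 4, 2, 2, 3) as an example input:

```python
import numpy as np

neighbours = np.array([1, 2, 4, 2, 2, 3], dtype=np.uint8)

# one bit per value: 1 -> 00000010, 2 -> 00000100, 3 -> 00001000, 4 -> 00010000
masks = np.uint8(1) << neighbours

# OR all masks together; duplicates simply set the same bit again
ored = np.bitwise_or.reduce(masks)

# the number of set bits is the number of distinct values
n_unique = bin(int(ored)).count("1")
print(format(int(ored), '08b'), n_unique)  # 00011110 4
```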


import numpy as np

# create a "number of ones" lookup table
no_ones = np.array([bin(i).count("1") for i in range(256)], dtype='uint8')

# create some fictional data
dat = np.random.randint(1, 8, (6, 7, 8))

# create a bit mask of the cells
datb = 1 << dat.astype('uint8')

# pad the data by 1
datb = np.pad(datb, 1, mode='reflect')

# or the padded data together
ored = (datb[ 2:, 1:-1, 1:-1] |
        datb[:-2, 1:-1, 1:-1] |
        datb[1:-1,  2:, 1:-1] |
        datb[1:-1, :-2, 1:-1] |
        datb[1:-1, 1:-1,  2:] |
        datb[1:-1, 1:-1, :-2])

# get the number of unique neighbours from the LUT
usamples = no_ones[ored]
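
To gain some confidence that the two versions agree, they can be wrapped into functions and compared on the same random data. The function names `unique_by_sorting` and `unique_by_bitmask` are illustrative only; this is a sketch, not a benchmark.

```python
import numpy as np

def unique_by_sorting(dat):
    # sort-and-diff version (6-neighbourhood, reflect padding)
    p = np.pad(dat, 1, mode='reflect')
    neigh = np.stack((p[2:, 1:-1, 1:-1], p[:-2, 1:-1, 1:-1],
                      p[1:-1, 2:, 1:-1], p[1:-1, :-2, 1:-1],
                      p[1:-1, 1:-1, 2:], p[1:-1, 1:-1, :-2]), axis=-1)
    neigh.sort(axis=-1)
    return (np.diff(neigh, axis=-1) != 0).sum(axis=-1) + 1

def unique_by_bitmask(dat):
    # bit-mask version; only valid for integer values 0..7
    no_ones = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)
    b = np.pad(np.uint8(1) << dat.astype(np.uint8), 1, mode='reflect')
    ored = (b[2:, 1:-1, 1:-1] | b[:-2, 1:-1, 1:-1] |
            b[1:-1, 2:, 1:-1] | b[1:-1, :-2, 1:-1] |
            b[1:-1, 1:-1, 2:] | b[1:-1, 1:-1, :-2])
    return no_ones[ored]

dat = np.random.randint(1, 8, (6, 7, 8))
same = np.array_equal(unique_by_sorting(dat), unique_by_bitmask(dat))
print(same)
```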

The performance impact is rather significant. The first version takes 2.57 s and the second version 283 ms on my machine with a 384 x 384 x 100 table (excluding creating the random data). This translates into 174 ns and 19 ns/cell, respectively.
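
The per-cell figures follow from dividing each run time by the number of cells; the slower sort-based version comes out at about 174 ns and the bit-mask version at about 19 ns:

```python
cells = 384 * 384 * 100            # 14,745,600 cells

ns_sort = 2.57 / cells * 1e9       # first (sort-based) version
ns_bits = 0.283 / cells * 1e9      # second (bit-mask) version

print(round(ns_sort), round(ns_bits))  # 174 19
```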

This solution is however limited to the case where there is a reasonable number of different (and known) values. If the number of different possible values grows above 64, the bit magic loses its charm. (Also, at around 20 different values the look-up part has to be split into more than one operation due to the memory consumption of the LUT. The LUT should fit into the CPU cache, otherwise it becomes slow.)
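
One way to split the look-up, sketched here under the assumption that the values are integers in 0..63: keep the masks as 64-bit integers and apply the same 256-entry table to each of their 8 bytes. The helper name `popcount_u64` is made up for this example.

```python
import numpy as np

# same 256-entry "number of ones" table as before
no_ones = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def popcount_u64(masks):
    """Count set bits per element by looking up each of the 8 bytes."""
    masks = np.ascontiguousarray(masks, dtype=np.uint64)
    as_bytes = masks.view(np.uint8).reshape(masks.shape + (8,))
    return no_ones[as_bytes].sum(axis=-1)

# fictional data with up to 64 different values, one bit each
dat = np.random.randint(0, 64, (6, 7, 8))
datb = np.pad(np.uint64(1) << dat.astype(np.uint64), 1, mode='reflect')

ored = (datb[2:, 1:-1, 1:-1] | datb[:-2, 1:-1, 1:-1] |
        datb[1:-1, 2:, 1:-1] | datb[1:-1, :-2, 1:-1] |
        datb[1:-1, 1:-1, 2:] | datb[1:-1, 1:-1, :-2])

usamples = popcount_u64(ored)
```

The table stays small enough for the cache, at the price of eight look-ups per cell instead of one.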

On the other hand, expanding the solution to use the full 26-neighbourhood is simple and quite fast.
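
A possible sketch of that extension: instead of writing out six slices, loop over all 27 offsets in the padded array and skip the centre. The function name `unique_neighbours_26` is hypothetical, and the values are again assumed to be integers in 0..7.

```python
import itertools
import numpy as np

no_ones = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def unique_neighbours_26(dat):
    """Number of distinct values among the 26 surrounding cells."""
    nx, ny, nz = dat.shape
    datb = np.pad(np.uint8(1) << dat.astype(np.uint8), 1, mode='reflect')
    ored = np.zeros(dat.shape, dtype=np.uint8)
    # offsets 0, 1, 2 in the padded array correspond to -1, 0, +1 in the data
    for dx, dy, dz in itertools.product(range(3), repeat=3):
        if (dx, dy, dz) == (1, 1, 1):
            continue  # skip the cell itself
        ored |= datb[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return no_ones[ored]

dat = np.random.randint(1, 8, (6, 7, 8))
usamples26 = unique_neighbours_26(dat)
```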

You could try the following. It is not necessarily optimal and will cause problems if your data are too large, but here goes:

import numpy as np
from sklearn.feature_extraction.image import extract_patches

a = np.array([[1,1,1,7,4],
              [1,1,1,3,2],
              [1,1,1,2,2],
              [1,3,1,4,2],
              [1,1,1,4,2]])

patches = extract_patches(a, patch_shape=(3, 3), extraction_step=(1, 1))

neighbor_template = np.array([[0, 1, 0],
                              [1, 0, 1],
                              [0, 1, 0]]).astype(bool)
centers = patches[:, :, 1, 1]
neighbors = patches[:, :, neighbor_template]

possible_values = np.arange(1, 8)
counts = (neighbors[..., np.newaxis] ==
          possible_values[np.newaxis, np.newaxis, np.newaxis]).sum(2)

nonzero_counts = counts > 0
unique_counter = nonzero_counts.sum(-1)

print(unique_counter)

yields

[[1 2 3]
 [2 2 4]
 [1 3 3]]

This is the middle of the array you are expecting as a result. In order to obtain the full array with borders, the borders would need to be treated separately. With numpy 1.8 you can use np.pad with mode reflect to pad with one pixel. This would also complete the border correctly.
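
A sketch of the padded variant for the 2D example. It swaps in numpy's own `sliding_window_view` (available from numpy 1.20) in place of `extract_patches`, in case the latter cannot be imported from your scikit-learn version; that substitution is an assumption of this example, not part of the approach above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([[1, 1, 1, 7, 4],
              [1, 1, 1, 3, 2],
              [1, 1, 1, 2, 2],
              [1, 3, 1, 4, 2],
              [1, 1, 1, 4, 2]])

# pad with one reflected pixel so every original cell gets a full 3x3 patch
padded = np.pad(a, 1, mode='reflect')
patches = sliding_window_view(padded, (3, 3))    # strided view, shape (5, 5, 3, 3)

neighbor_template = np.array([[0, 1, 0],
                              [1, 0, 1],
                              [0, 1, 0]], dtype=bool)
neighbors = patches[:, :, neighbor_template]      # shape (5, 5, 4)

# for each cell, mark which of the possible values occur among its neighbours
possible_values = np.arange(1, 8)
present = (neighbors[..., np.newaxis] == possible_values).any(axis=2)
unique_counter = present.sum(axis=-1)
print(unique_counter)
```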

Now let's move to 3D and make sure we don't use too much memory.

# first we generate a neighbors template
from scipy.ndimage import generate_binary_structure

neighbors = generate_binary_structure(3, 1)
neighbors[1, 1, 1] = False
neighbor_coords = np.array(np.where(neighbors)).T

data = np.random.randint(1, 8, (384, 384, 100))
data_neighbors = np.zeros((neighbors.sum(),) + tuple(np.array(data.shape) - 2), dtype=np.uint8)

# extract_patches only generates a strided view
data_view = extract_patches(data, patch_shape=(3, 3, 3), extraction_step=(1, 1, 1))

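for neigh_coord, data_neigh in zip(neighbor_coords, data_neighbors):
    # index with a tuple: three full slices select every patch,
    # the coordinates pick one neighbour inside each 3x3x3 patch
    sl = (slice(None),) * 3 + tuple(neigh_coord)
    data_neigh[:] = data_view[sl]

# compare all neighbours against every possible value at once;
# the comparison has shape (n_values, n_neighbours) + data_neighbors.shape[1:]
indicator = (data_neighbors[np.newaxis] ==
             possible_values[:, np.newaxis, np.newaxis, np.newaxis, np.newaxis]).sum(1) > 0

uniques = indicator.sum(0)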

As before, you find the number of unique entries in uniques. Using methods like generate_binary_structure from scipy and the sliding window from extract_patches makes this approach general: if you wanted a 26-neighborhood instead of a 6-neighborhood, then you would only have to change generate_binary_structure(3, 1) to generate_binary_structure(3, 3) (connectivity 2 adds only the edge-sharing cells, giving an 18-neighborhood). It also generalizes straightforwardly to extra dimensions, provided the amount of data generated fits in the memory of your machine.
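
A quick way to see which neighbourhood each connectivity value selects is to count the True entries of the structure once the centre has been cleared. The dictionary name `neighbour_counts` is just for this sketch.

```python
from scipy.ndimage import generate_binary_structure

neighbour_counts = {}
for connectivity in (1, 2, 3):
    structure = generate_binary_structure(3, connectivity)
    structure[1, 1, 1] = False          # the centre is not its own neighbour
    neighbour_counts[connectivity] = int(structure.sum())

# connectivity 1: faces only, 2: faces and edges, 3: faces, edges and corners
print(neighbour_counts)  # {1: 6, 2: 18, 3: 26}
```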
