
Let's make a reference implementation of N-dimensional pixel binning/bucketing for python's numpy

I frequently want to pixel bin/pixel bucket a numpy array, meaning, replace groups of N consecutive pixels with a single pixel which is the sum of the N replaced pixels. For example, start with the values:

x = np.array([1, 3, 7, 3, 2, 9])

with a bucket size of 2, this transforms into:

bucket(x, bucket_size=2) 
= [1+3, 7+3, 2+9]
= [4, 10, 11]

As far as I know, there's no numpy function that specifically does this (please correct me if I'm wrong!), so I frequently roll my own. For 1d numpy arrays, this isn't bad:

import numpy as np

def bucket(x, bucket_size):
    return x.reshape(x.size // bucket_size, bucket_size).sum(axis=1)

bucket_me = np.array([3, 4, 5, 5, 1, 3, 2, 3])
print(bucket(bucket_me, bucket_size=2)) #[ 7 10  4  5]

...however, I get confused easily for the multidimensional case, and I end up rolling my own buggy, half-assed solution to this "easy" problem over and over again. I'd love it if we could establish a nice N-dimensional reference implementation.

  • Preferably the function call would allow different bin sizes along different axes (perhaps something like bucket(x, bucket_size=(2, 2, 3)))

  • Preferably the solution would be reasonably efficient (reshape and sum are fairly quick in numpy)

  • Bonus points for handling edge effects when the array doesn't divide nicely into an integer number of buckets.

  • Bonus points for allowing the user to choose the initial bin edge offset (a toy sketch of both bonus behaviors follows this list).
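
To make those last two bullet points concrete, here's a toy sketch of the behavior I'm hoping for (the offset keyword is purely hypothetical, and my 1-D bucket() above doesn't handle either case yet):

x = np.array([1, 3, 7, 3, 2])

# bucket_size=2 doesn't divide the length-5 array evenly; dropping the
# trailing element is one reasonable edge policy:
bucket(x, bucket_size=2)            # desired: [1+3, 7+3] = [4, 10]

# a hypothetical `offset` argument would shift where the first bin starts:
bucket(x, bucket_size=2, offset=1)  # desired: [3+7, 3+2] = [10, 5]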

As suggested by Divakar, here's my desired behavior in a sample 2-D case:

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

bucket(x, bucket_size=(2, 2))
= [[1 + 2 + 2 + 3, 3 + 4 + 7 + 9],
   [8 + 9 + 0 + 0, 1 + 0 + 3 + 4]]
= [[8, 23],
   [17, 8]]

...hopefully I did my arithmetic correctly ;)

I think you can do most of the fiddly work with skimage's view_as_blocks. This function is implemented using as_strided, so it is very efficient (it just changes the stride information to reshape the array). Because it's written in Python/NumPy, you can always copy the code if you don't have skimage installed.
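
For intuition, here's a quick look at the view it returns (a small check of my own, assuming skimage is installed):

import numpy as np
from skimage.util import view_as_blocks

x = np.arange(16).reshape(4, 4)
# a (4, 4) array viewed as a 2x2 grid of 2x2 blocks -- no data is copied
print(view_as_blocks(x, (2, 2)).shape)   # (2, 2, 2, 2)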

After applying that function, you just need to sum the N trailing axes of the reshaped array (where N is the length of the bucket_size tuple). Here's a new bucket() function:

from skimage.util import view_as_blocks

def bucket(x, bucket_size):
    blocks = view_as_blocks(x, bucket_size)
    # sum over the trailing block axes (one per entry in bucket_size)
    tup = tuple(range(-len(bucket_size), 0))
    return blocks.sum(axis=tup)

Then for example:

>>> x = np.array([1, 3, 7, 3, 2, 9])
>>> bucket(x, bucket_size=(2,))
array([ 4, 10, 11])

>>> x = np.array([[1, 2, 3, 4],
                  [2, 3, 7, 9],
                  [8, 9, 1, 0],
                  [0, 0, 3, 4]])

>>> bucket(x, bucket_size=(2, 2))
array([[ 8, 23],
       [17,  8]])

>>> y = np.arange(6*6*6).reshape(6,6,6)
>>> bucket(y, bucket_size=(2, 2, 3))
array([[[ 264,  300],
        [ 408,  444],
        [ 552,  588]],

       [[1128, 1164],
        [1272, 1308],
        [1416, 1452]],

       [[1992, 2028],
        [2136, 2172],
        [2280, 2316]]])

To specify different bin sizes along each axis of an ndarray, you can iteratively use np.add.reduceat along each axis, like so -

def bucket(x, bin_size):
    ndims = x.ndim
    out = x.copy()
    for i in range(ndims):
        # starting index of each bucket along axis i
        idx = np.append(0, np.cumsum(bin_size[i][:-1]))
        out = np.add.reduceat(out, idx, axis=i)
    return out

Sample run - 样品运行 -

In [126]: x
Out[126]: 
array([[165, 107, 133,  82, 199],
       [ 35, 138,  91, 100, 207],
       [ 75,  99,  40, 240, 208],
       [166, 171,  78,   7, 141]])

In [127]: bucket(x, bin_size = [[2, 2],[3, 2]])
Out[127]: 
array([[669, 588],
       [629, 596]])

#  [2, 2] are the bin sizes along axis=0
#  [3, 2] are the bin sizes along axis=1

# array([[165, 107, 133, |  82, 199],
#        [ 35, 138,  91, | 100, 207],
# -----------------------------------
#        [ 75,  99,  40, | 240, 208],
#        [166, 171,  78, |   7, 141]])

In [128]: x[:2,:3].sum()
Out[128]: 669

In [129]: x[:2,3:].sum()
Out[129]: 588

In [130]: x[2:,:3].sum()
Out[130]: 629

In [131]: x[2:,3:].sum()
Out[131]: 596
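
A side note (a small check of my own, not from the answer above): np.add.reduceat sums from each index up to the next one, and from the last index to the end of the axis, so any trailing elements beyond the declared bins get folded into the last bucket rather than dropped:

x = np.arange(10)              # length 10
np.add.reduceat(x, [0, 3, 6])  # buckets of size 3, 3, and "whatever is left"
# -> array([ 3, 12, 30]); the last bucket sums x[6:10]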

Natively, using as_strided:

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

import numpy as np
from numpy.lib.stride_tricks import as_strided

def bucket(x, bucket_size):
    x = np.ascontiguousarray(x)
    oldshape = np.array(x.shape)
    # each original axis is split into a (n_buckets, bucket_size) pair
    newshape = np.concatenate((oldshape // bucket_size, bucket_size))
    oldstrides = np.array(x.strides)
    newstrides = np.concatenate((oldstrides * bucket_size, oldstrides))
    # sum over the trailing axes, which hold the contents of each bucket
    axis = tuple(range(x.ndim, 2 * x.ndim))
    return as_strided(x, newshape, newstrides).sum(axis)

If a bucket size does not divide evenly into the corresponding dimension of x, the remaining elements are lost.
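
A quick illustration of that edge behavior (my own example, using the bucket() defined just above):

x = np.array([1, 3, 7, 3, 2])
bucket(x, (2,))   # -> array([ 4, 10]); the trailing 2 is silently dropped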

Verification:

In [9]: bucket(x,(2,2))
Out[9]: 
array([[ 8, 23],
       [17,  8]])
