Let's make a reference implementation of N-dimensional pixel binning/bucketing for python's numpy

Question

I frequently want to pixel bin/pixel bucket a numpy array, meaning, replace groups of N consecutive pixels with a single pixel which is the sum of the N replaced pixels. For example, start with the values:

x = np.array([1, 3, 7, 3, 2, 9])

with a bucket size of 2, this transforms into:

bucket(x, bucket_size=2) 
= [1+3, 7+3, 2+9]
= [4, 10, 11]

As far as I know, there's no numpy function that specifically does this (please correct me if I'm wrong!), so I frequently roll my own. For 1d numpy arrays, this isn't bad:

import numpy as np

def bucket(x, bucket_size):
    return x.reshape(x.size // bucket_size, bucket_size).sum(axis=1)

bucket_me = np.array([3, 4, 5, 5, 1, 3, 2, 3])
print(bucket(bucket_me, bucket_size=2)) #[ 7 10  4  5]

...however, I get confused easily for the multidimensional case, and I end up rolling my own buggy, half-assed solution to this "easy" problem over and over again. I'd love it if we could establish a nice N-dimensional reference implementation.

Preferably the function call would allow different bin sizes along different axes (perhaps something like bucket(x, bucket_size=(2, 2, 3)) )
Preferably the solution would be reasonably efficient (reshape and sum are fairly quick in numpy)
Bonus points for handling edge effects when the array doesn't divide nicely into an integer number of buckets.
Bonus points for allowing the user to choose the initial bin edge offset.

As suggested by Divakar, here's my desired behavior in a sample 2-D case:

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

bucket(x, bucket_size=(2, 2))
= [[1 + 2 + 2 + 3, 3 + 4 + 7 + 9],
   [8 + 9 + 0 + 0, 1 + 0 + 3 + 4]]
= [[8, 23],
   [17, 8]]

...hopefully I did my arithmetic correctly ;)

Answer 1

I think you can do most of the fiddly work with skimage's view_as_blocks. This function is implemented using as_strided, so it is very efficient: it only changes the shape and stride information to produce a blocked view of the array, without copying any data. Because it's written in Python/NumPy, you can always copy the code if you don't have skimage installed.

After applying that function, you just need to sum the N trailing axes of the reshaped array (where N is the length of the bucket_size tuple). Here's a new bucket() function:

from skimage.util import view_as_blocks

def bucket(x, bucket_size):
    blocks = view_as_blocks(x, bucket_size)
    tup = tuple(range(-len(bucket_size), 0))
    return blocks.sum(axis=tup)

Then for example:

>>> x = np.array([1, 3, 7, 3, 2, 9])
>>> bucket(x, bucket_size=(2,))
array([ 4, 10, 11])

>>> x = np.array([[1, 2, 3, 4],
                  [2, 3, 7, 9],
                  [8, 9, 1, 0],
                  [0, 0, 3, 4]])

>>> bucket(x, bucket_size=(2, 2))
array([[ 8, 23],
       [17,  8]])

>>> y = np.arange(6*6*6).reshape(6,6,6)
>>> bucket(y, bucket_size=(2, 2, 3))
array([[[ 264,  300],
        [ 408,  444],
        [ 552,  588]],

       [[1128, 1164],
        [1272, 1308],
        [1416, 1452]],

       [[1992, 2028],
        [2136, 2172],
        [2280, 2316]]])
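If skimage isn't available, the same block-then-sum idea can be sketched with a plain reshape. This is a minimal sketch and not skimage's code: the name bucket_reshape is made up for illustration, and it assumes every axis length is an exact multiple of its bucket size (it raises otherwise).

```python
import numpy as np

def bucket_reshape(x, bucket_size):
    """Sum non-overlapping blocks of shape `bucket_size` using reshape only.

    Hypothetical helper; assumes each axis divides evenly into buckets.
    """
    x = np.asarray(x)
    if len(bucket_size) != x.ndim:
        raise ValueError("bucket_size needs one entry per axis of x")
    if any(n % b for n, b in zip(x.shape, bucket_size)):
        raise ValueError("each axis length must be a multiple of its bucket size")
    # Split every axis of length n into (n // b, b), so the shape becomes
    # (n0 // b0, b0, n1 // b1, b1, ...).
    split_shape = []
    for n, b in zip(x.shape, bucket_size):
        split_shape.extend((n // b, b))
    # The within-bucket axes are the odd-numbered ones: 1, 3, 5, ...
    within = tuple(range(1, 2 * x.ndim, 2))
    return x.reshape(split_shape).sum(axis=within)
```

Here the bucket axes are interleaved with the within-bucket axes instead of trailing them, which is why the sum runs over the odd-numbered axes rather than the last N.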

Answer 2

To specify different bin sizes along each axis of an ndarray, you can apply np.add.reduceat iteratively along each axis, like so -

def bucket(x, bin_size):
    ndims = x.ndim
    out = x.copy()
    for i in range(ndims):
        idx = np.append(0,np.cumsum(bin_size[i][:-1]))
        out = np.add.reduceat(out,idx,axis=i)
    return out

Sample run -

In [126]: x
Out[126]: 
array([[165, 107, 133,  82, 199],
       [ 35, 138,  91, 100, 207],
       [ 75,  99,  40, 240, 208],
       [166, 171,  78,   7, 141]])

In [127]: bucket(x, bin_size = [[2, 2],[3, 2]])
Out[127]: 
array([[669, 588],
       [629, 596]])

#  [2, 2] are the bin sizes along axis=0
#  [3, 2] are the bin sizes along axis=1

# array([[165, 107, 133, | 82, 199],
#        [ 35, 138,  91, | 100, 207],
# -------------------------------------
#        [ 75,  99, 40,  | 240, 208],
#        [166, 171, 78,  | 7, 141]])

In [128]: x[:2,:3].sum()
Out[128]: 669

In [129]: x[:2,3:].sum()
Out[129]: 588

In [130]: x[2:,:3].sum()
Out[130]: 629

In [131]: x[2:,3:].sum()
Out[131]: 596
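For the simpler case in the question, where each axis has one fixed bucket size, the start indices for reduceat can be generated with np.arange. The following is a rough sketch under that assumption; bucket_reduceat is a hypothetical name, not part of the answer above. A side effect of reduceat is that a short final bucket is still summed, so leftover elements at the edge are not dropped.

```python
import numpy as np

def bucket_reduceat(x, bucket_size):
    """Sum fixed-size buckets along each axis with np.add.reduceat.

    Hypothetical helper; `bucket_size` holds one positive int per axis.
    The last bucket on an axis may be shorter than the others.
    """
    x = np.asarray(x)
    out = x
    for axis, size in enumerate(bucket_size):
        # Bucket start positions along this axis: 0, size, 2 * size, ...
        starts = np.arange(0, x.shape[axis], size)
        out = np.add.reduceat(out, starts, axis=axis)
    return out

# A length-7 axis with buckets of 3 leaves one element in the last bucket.
print(bucket_reduceat(np.arange(7), (3,)))  # [ 3 12  6]
```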

Answer 3

Natively, using as_strided:

import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

def bucket(x, bucket_size):
    x = np.ascontiguousarray(x)
    oldshape = np.array(x.shape)
    # Leading axes index the buckets, trailing axes index within a bucket.
    newshape = np.concatenate((oldshape // bucket_size, bucket_size))
    oldstrides = np.array(x.strides)
    newstrides = np.concatenate((oldstrides * bucket_size, oldstrides))
    axis = tuple(range(x.ndim, 2 * x.ndim))
    # Sum over the within-bucket axes.
    return as_strided(x, newshape, newstrides).sum(axis)

If a bucket size does not divide evenly into the corresponding dimension of x, the remaining elements are lost.

Verification:

In [9]: bucket(x,(2,2))
Out[9]: 
array([[ 8, 23],
       [17,  8]])
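If losing the remainder is not acceptable, one option is to zero-pad the array up to a whole number of buckets before summing, which also leaves room for a bin edge offset. What follows is only a sketch: bucket_padded and its offset parameter are invented for illustration, the offset convention (a bin edge falls at index offset[i] along axis i) is an assumption, and it uses a reshape rather than as_strided.

```python
import numpy as np

def bucket_padded(x, bucket_size, offset=None):
    """Bucket-sum after zero-padding so that no elements are dropped.

    Hypothetical helper. `offset[i]` is the index along axis i where a
    bin edge should fall (assumed convention); the default is 0 everywhere.
    """
    x = np.asarray(x)
    if offset is None:
        offset = (0,) * x.ndim
    pad_width = []
    for n, b, o in zip(x.shape, bucket_size, offset):
        front = (b - o % b) % b    # leading zeros put an edge at index o
        back = -(front + n) % b    # trailing zeros complete the last bucket
        pad_width.append((front, back))
    padded = np.pad(x, pad_width, mode="constant", constant_values=0)
    # Split each padded axis of length n into (n // b, b) and sum the b axes.
    split_shape = []
    for n, b in zip(padded.shape, bucket_size):
        split_shape.extend((n // b, b))
    within = tuple(range(1, 2 * x.ndim, 2))
    return padded.reshape(split_shape).sum(axis=within)

v = np.arange(1, 8)                         # [1 2 3 4 5 6 7]
print(bucket_padded(v, (3,)))               # [ 6 15  7]
print(bucket_padded(v, (3,), offset=(1,)))  # [ 1  9 18]
```

Zero-padding keeps sums correct, but it would bias a mean, so an averaging variant would need to track how many real elements fall in each bucket.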

Let's make a reference implementation of N-dimensional pixel binning/bucketing for python's numpy

Question

3 answers

solution1
4 2016-03-28 19:36:21

solution2
1 2016-03-28 20:02:47

solution3
1 ACCEPTED 2016-03-28 20:11:14
