Let's make a reference implementation of N-dimensional pixel binning/bucketing for Python's NumPy
I frequently want to pixel bin (pixel bucket) a numpy array, meaning: replace groups of N consecutive pixels with a single pixel that is the sum of the N replaced pixels. For example, start with the values:
x = np.array([1, 3, 7, 3, 2, 9])
With a bucket size of 2, this transforms into:
bucket(x, bucket_size=2)
= [1+3, 7+3, 2+9]
= [4, 10, 11]
As far as I know, there's no numpy function that specifically does this (please correct me if I'm wrong!), so I frequently roll my own. For 1d numpy arrays, this isn't bad:
import numpy as np

def bucket(x, bucket_size):
    # Assumes x.size is an exact multiple of bucket_size; reshape raises otherwise.
    return x.reshape(x.size // bucket_size, bucket_size).sum(axis=1)

bucket_me = np.array([3, 4, 5, 5, 1, 3, 2, 3])
print(bucket(bucket_me, bucket_size=2))  # [ 7 10  4  5]
...however, I get confused easily in the multidimensional case, and I end up rolling my own buggy, half-assed solution to this "easy" problem over and over again. I'd love it if we could establish a nice N-dimensional reference implementation.
Preferably the function call would allow different bin sizes along different axes (perhaps something like bucket(x, bucket_size=(2, 2, 3))).
Preferably the solution would be reasonably efficient (reshape and sum are fairly quick in numpy).
Bonus points for handling edge effects when the array doesn't divide evenly into an integer number of buckets.
Bonus points for allowing the user to choose the initial bin edge offset.
As suggested by Divakar, here's my desired behavior in a sample 2-D case:
x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])
bucket(x, bucket_size=(2, 2))
= [[1 + 2 + 2 + 3, 3 + 4 + 7 + 9],
   [8 + 9 + 0 + 0, 1 + 0 + 3 + 4]]
= [[8, 23],
   [17, 8]]
...hopefully I did my arithmetic correctly ;)
I think you can do most of the fiddly work with skimage's view_as_blocks. This function is implemented using as_strided, so it is very efficient (it just changes the stride information to reshape the array). Because it's written in Python/NumPy, you can always copy the code if you don't have skimage installed.
After applying that function, you just need to sum the N trailing axes of the reshaped array (where N is the length of the bucket_size tuple). Here's a new bucket() function:
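import numpy as np
from skimage.util import view_as_blocks

def bucket(x, bucket_size):
    # view_as_blocks appends one trailing axis per entry of bucket_size
    blocks = view_as_blocks(x, bucket_size)
    # Sum over those trailing axes, e.g. (-2, -1) for a 2-D bucket_size
    tup = tuple(range(-len(bucket_size), 0))
    return blocks.sum(axis=tup)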
Then, for example:
>>> x = np.array([1, 3, 7, 3, 2, 9])
>>> bucket(x, bucket_size=(2,))
array([ 4, 10, 11])
>>> x = np.array([[1, 2, 3, 4],
...               [2, 3, 7, 9],
...               [8, 9, 1, 0],
...               [0, 0, 3, 4]])
>>> bucket(x, bucket_size=(2, 2))
array([[ 8, 23],
       [17,  8]])
>>> y = np.arange(6*6*6).reshape(6,6,6)
>>> bucket(y, bucket_size=(2, 2, 3))
array([[[ 264,  300],
        [ 408,  444],
        [ 552,  588]],

       [[1128, 1164],
        [1272, 1308],
        [1416, 1452]],

       [[1992, 2028],
        [2136, 2172],
        [2280, 2316]]])
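If skimage is not available, the same block sums can also be obtained with plain reshape and sum. The following is a minimal sketch, not the code from this answer: the name bucket_reshape is hypothetical, and it assumes every dimension of x is an exact multiple of its bucket size (it raises otherwise).

```python
import numpy as np

def bucket_reshape(x, bucket_size):
    """Sum non-overlapping blocks using only reshape and sum.

    Hypothetical helper: assumes each dimension of x is an exact
    multiple of the matching entry of bucket_size.
    """
    x = np.asarray(x)
    if len(bucket_size) != x.ndim:
        raise ValueError("bucket_size needs one entry per axis of x")
    new_shape = []
    for n, b in zip(x.shape, bucket_size):
        if n % b != 0:
            raise ValueError("each dimension must be divisible by its bucket size")
        # Split axis of length n into (number of buckets, bucket length)
        new_shape.extend((n // b, b))
    # Odd-numbered axes of the reshaped array run inside a bucket
    within_bucket_axes = tuple(range(1, 2 * x.ndim, 2))
    return x.reshape(new_shape).sum(axis=within_bucket_axes)

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])
print(bucket_reshape(x, (2, 2)))  # block sums: 8, 23 / 17, 8
```

Unlike the strided view, reshape may copy the data when x is not contiguous, so this trades a little memory for having no extra dependency.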
To specify different bin sizes along each axis of an ndarray, you can iteratively use np.add.reduceat along each of its axes, like so:
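import numpy as np

def bucket(x, bin_size):
    # bin_size[i] lists the sizes of the consecutive bins along axis i
    ndims = x.ndim
    out = x.copy()
    for i in range(ndims):
        # Start index of each bin along axis i: 0, then running totals of the sizes
        idx = np.append(0, np.cumsum(bin_size[i][:-1]))
        out = np.add.reduceat(out, idx, axis=i)
    return out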
Sample run:
In [126]: x
Out[126]:
array([[165, 107, 133, 82, 199],
[ 35, 138, 91, 100, 207],
[ 75, 99, 40, 240, 208],
[166, 171, 78, 7, 141]])
In [127]: bucket(x, bin_size = [[2, 2],[3, 2]])
Out[127]:
array([[669, 588],
[629, 596]])
# [2, 2] are the bin sizes along axis=0
# [3, 2] are the bin sizes along axis=1
# array([[165, 107, 133, | 82, 199],
# [ 35, 138, 91, | 100, 207],
# -------------------------------------
# [ 75, 99, 40, | 240, 208],
# [166, 171, 78, | 7, 141]])
In [128]: x[:2,:3].sum()
Out[128]: 669
In [129]: x[:2,3:].sum()
Out[129]: 588
In [130]: x[2:,:3].sum()
Out[130]: 629
In [131]: x[2:,3:].sum()
Out[131]: 596
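For the common case of one uniform bucket size per axis, the start indices that np.add.reduceat needs can be generated with np.arange instead of being listed by hand. This is a minimal sketch under that assumption: bucket_uniform is a hypothetical name, and a trailing partial bucket is kept as a shorter final bin rather than dropped.

```python
import numpy as np

def bucket_uniform(x, bucket_size):
    """Sum blocks of a fixed size per axis with np.add.reduceat.

    Hypothetical helper: bucket_size holds one positive integer per axis.
    If an axis length is not a multiple of its bucket size, the last bin
    along that axis simply covers fewer elements.
    """
    out = np.asarray(x)
    for axis, b in enumerate(bucket_size):
        # Start index of every bin along this axis: 0, b, 2b, ...
        starts = np.arange(0, out.shape[axis], b)
        out = np.add.reduceat(out, starts, axis=axis)
    return out

x = np.array([1, 3, 7, 3, 2, 9, 5])   # 7 elements, bucket size 2
print(bucket_uniform(x, (2,)))        # sums: 4, 10, 11, and the leftover 5
```

Whether a shorter final bin is the right edge behavior depends on the application; if the partial bin should be discarded, the input can be trimmed before the call.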
Natively, using as_strided:
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

def bucket(x, bucket_size):
    x = np.ascontiguousarray(x)
    oldshape = np.array(x.shape)
    # Leading axes count the buckets; trailing axes index inside a bucket
    newshape = np.concatenate((oldshape // bucket_size, bucket_size))
    oldstrides = np.array(x.strides)
    # Moving to the next bucket skips bucket_size elements along that axis
    newstrides = np.concatenate((oldstrides * bucket_size, oldstrides))
    axis = tuple(range(x.ndim, 2 * x.ndim))
    return as_strided(x, newshape, newstrides).sum(axis)
If a bucket size does not divide evenly into the corresponding dimension of x, the remaining elements are lost.
Verification:
In [9]: bucket(x,(2,2))
Out[9]:
array([[ 8, 23],
[17, 8]])
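If the leftover elements should be counted rather than lost, one option is to zero-pad the array up to the next multiple of the bucket size before bucketing. A minimal sketch, assuming zero is an acceptable fill value for a sum; pad_to_multiple is a hypothetical helper name, and np.pad is the only library call it relies on.

```python
import numpy as np

def pad_to_multiple(x, bucket_size):
    """Zero-pad the end of each axis to a multiple of its bucket size.

    Hypothetical helper: bucket_size holds one positive integer per axis.
    """
    x = np.asarray(x)
    # -n % b is the number of elements missing to reach the next multiple of b
    pad_width = [(0, -n % b) for n, b in zip(x.shape, bucket_size)]
    return np.pad(x, pad_width, mode="constant", constant_values=0)

x = np.arange(1, 8)                        # 1..7, not a multiple of 3
padded = pad_to_multiple(x, (3,))          # 1..7 followed by two zeros
print(padded.reshape(-1, 3).sum(axis=1))   # sums: 6, 15, 7
```

The padded array can then be passed to any of the bucket functions above. Padding copies the data, so the strided view no longer avoids a copy in that case.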