[英]Aggregate Numpy Array By Summing
I have a Numpy array of shape (4320,8640)
. 我有一个Numpy
(4320,8640)
。 I would like to have an array of shape (2160,4320)
. 我想有一个形状的阵列
(2160,4320)
。
You'll notice that each cell of the new array maps to a 2x2 set of cells in the old array. 您会注意到新阵列的每个单元格都映射到旧数组中的2x2单元格集。 I would like a cell's value in the new array to be the sum of the values in this block in the old array.
我希望新数组中的单元格值是旧数组中此块中值的总和。
I can achieve this as follows: 我可以这样做:
import numpy
#Generate an example array
arr = numpy.random.randint(10,size=(4320,8640))
#Perform the transformation
arrtrans = numpy.array([ [ arr[y][x]+arr[y+1][x]+arr[y][x+1]+arr[y+1][x+1] for x in range(0,8640,2)] for y in range(0,4320,2)])
But this is slow and more than a little ugly. 但这很慢,而且有点难看。
Is there a way to do this using Numpy (or an interoperable package)? 有没有办法使用Numpy(或可互操作的包)?
When the window fits exactly into the array, reshaping to more dimensions and collapsing the extra dimensions with np.sum
is sort of the canonical way of doing this with numpy: 当窗口完全适合数组时,重塑为更多维度并使用
np.sum
折叠额外维度是使用numpy执行此操作的规范方法:
>>> a = np.random.rand(4320,8640)
>>> a.shape
(4320, 8640)
>>> a_small = a.reshape(2160, 2, 4320, 2).sum(axis=(1, 3))
>>> a_small.shape
(2160, 4320)
>>> np.allclose(a_small[100, 203], a[200:202, 406:408].sum())
True
I'm not sure there exists the package you want, but this code will compute much faster. 我不确定是否存在您想要的软件包,但此代码的计算速度要快得多。
>>> arrtrans2 = arr[::2, ::2] + arr[::2, 1::2] + arr[1::2, ::2] + arr[1::2, 1::2]
>>> numpy.allclose(arrtrans, arrtrans2)
True
Where ::2
and 1::2
are translated by 0, 2, 4, ...
and 1, 3, 5, ...
respectively. 其中
::2
和1::2
分别由0, 2, 4, ...
和1, 3, 5, ...
。
You are operating on sliding windows of the original array. 您正在原始阵列的滑动窗口上操作。 There are numerous questions and answers on SO regarding.
有关SO的问题和答案很多。 sliding windows and numpy and python.
滑动窗户和numpy和python。 By manipulating the strides of an array, this process can be sped up considerably.
通过操纵数组的步幅,可以大大加快这个过程。 Here is a generic function that will return (x,y) windows of the array with or without overlap.
这是一个泛型函数,它将返回数组的(x,y)窗口,有或没有重叠。 Using this stride trick appears to be just a hair slower than @mskimm's solution.
使用这个步幅似乎只比@ mskimm的解决方案慢一点。 It's a nice thing to have in your toolkit.
在您的工具箱中使用它是一件好事。 This function is not mine - it was found at Efficient Overlapping Windows with Numpy
这个功能不是我的 - 它是在Numpy的Efficient Overlapping Windows中找到的
import numpy as np
from numpy.lib.stride_tricks import as_strided as ast
from itertools import product
def norm_shape(shape):
'''
Normalize numpy array shapes so they're always expressed as a tuple,
even for one-dimensional shapes.
Parameters
shape - an int, or a tuple of ints
Returns
a shape tuple
from http://www.johnvinyard.com/blog/?p=268
'''
try:
i = int(shape)
return (i,)
except TypeError:
# shape was not a number
pass
try:
t = tuple(shape)
return t
except TypeError:
# shape was not iterable
pass
raise TypeError('shape must be an int, or a tuple of ints')
def sliding_window(a,ws,ss = None,flatten = True):
'''
Return a sliding window over a in any number of dimensions
Parameters:
a - an n-dimensional numpy array
ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size
of each dimension of the window
ss - an int (a is 1D) or tuple (a is 2D or greater) representing the
amount to slide the window in each dimension. If not specified, it
defaults to ws.
flatten - if True, all slices are flattened, otherwise, there is an
extra dimension for each dimension of the input.
Returns
an array containing each n-dimensional window from a
from http://www.johnvinyard.com/blog/?p=268
'''
if None is ss:
# ss was not provided. the windows will not overlap in any direction.
ss = ws
ws = norm_shape(ws)
ss = norm_shape(ss)
# convert ws, ss, and a.shape to numpy arrays so that we can do math in every
# dimension at once.
ws = np.array(ws)
ss = np.array(ss)
shape = np.array(a.shape)
# ensure that ws, ss, and a.shape all have the same number of dimensions
ls = [len(shape),len(ws),len(ss)]
if 1 != len(set(ls)):
error_string = 'a.shape, ws and ss must all have the same length. They were{}'
raise ValueError(error_string.format(str(ls)))
# ensure that ws is smaller than a in every dimension
if np.any(ws > shape):
error_string = 'ws cannot be larger than a in any dimension. a.shape was {} and ws was {}'
raise ValueError(error_string.format(str(a.shape),str(ws)))
# how many slices will there be in each dimension?
newshape = norm_shape(((shape - ws) // ss) + 1)
# the shape of the strided array will be the number of slices in each dimension
# plus the shape of the window (tuple addition)
newshape += norm_shape(ws)
# the strides tuple will be the array's strides multiplied by step size, plus
# the array's strides (tuple addition)
newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
strided = ast(a,shape = newshape,strides = newstrides)
if not flatten:
return strided
# Collapse strided so that it has one more dimension than the window. I.e.,
# the new array is a flat list of slices.
meat = len(ws) if ws.shape else 0
firstdim = (np.product(newshape[:-meat]),) if ws.shape else ()
dim = firstdim + (newshape[-meat:])
# remove any dimensions with size 1
dim = filter(lambda i : i != 1,dim)
return strided.reshape(dim)
Usage: 用法:
# 2x2 windows with NO overlap
b = sliding_window(arr, (2,2), flatten = False)
c = b.sum((1,2))
Approximate 24% performance improvement using numpy.einsum
使用
numpy.einsum
大约提高24%的性能
c = np.einsum('ijkl -> ij', b)
One SO Q&A example How can I efficiently process a numpy array in blocks similar to Matlab's blkproc (blockproc) function , the selected answer would work for you. 一个SO Q&A示例如何在类似于Matlab的blkproc(blockproc)函数的块中有效地处理numpy数组 ,所选答案对您有用 。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.