
Aggregate Numpy Array By Summing

I have a Numpy array of shape (4320,8640). I would like to have an array of shape (2160,4320).

You'll notice that each cell of the new array maps to a 2x2 block of cells in the old array. I would like each cell's value in the new array to be the sum of the values in the corresponding block of the old array.

I can achieve this as follows:

import numpy

#Generate an example array
arr = numpy.random.randint(10,size=(4320,8640))

#Perform the transformation
arrtrans = numpy.array([ [ arr[y][x]+arr[y+1][x]+arr[y][x+1]+arr[y+1][x+1] for x in range(0,8640,2)] for y in range(0,4320,2)])

But this is slow and more than a little ugly.

Is there a way to do this using Numpy (or an interoperable package)?

When the window fits exactly into the array, reshaping to more dimensions and collapsing the extra dimensions with np.sum is the canonical way of doing this with numpy:

>>> import numpy as np
>>> a = np.random.rand(4320,8640)
>>> a.shape
(4320, 8640)
>>> a_small = a.reshape(2160, 2, 4320, 2).sum(axis=(1, 3))
>>> a_small.shape
(2160, 4320)
>>> np.allclose(a_small[100, 203], a[200:202, 406:408].sum())
True
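
The reshape-and-sum idea above can be wrapped in a small reusable function. This is only a sketch: the name block_sum and its parameters are hypothetical (they are not part of NumPy), and it assumes a 2-D array whose dimensions are exact multiples of the block size.

```python
import numpy as np

def block_sum(a, block_rows, block_cols):
    """Sum non-overlapping (block_rows, block_cols) blocks of a 2-D array.

    Hypothetical helper for illustration; not part of NumPy.
    """
    rows, cols = a.shape
    if rows % block_rows or cols % block_cols:
        raise ValueError("array shape must be an exact multiple of the block size")
    # Split each axis into (number of blocks, block size), then sum over the
    # two block-size axes (axes 1 and 3).
    return a.reshape(rows // block_rows, block_rows,
                     cols // block_cols, block_cols).sum(axis=(1, 3))

small = np.arange(16).reshape(4, 4)
print(block_sum(small, 2, 2))
# [[10 18]
#  [42 50]]
```

For the array in the question, block_sum(a, 2, 2) should give the same (2160, 4320) result as the one-liner above.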

I'm not sure whether the package you want exists, but this code will compute much faster.

>>> arrtrans2 = arr[::2, ::2] + arr[::2, 1::2] + arr[1::2, ::2] + arr[1::2, 1::2]
>>> numpy.allclose(arrtrans, arrtrans2)
True

Here ::2 and 1::2 select the indices 0, 2, 4, ... and 1, 3, 5, ... respectively.
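
To make the slicing concrete, here is a small sketch on a 4x4 array showing which corner of each 2x2 block every slice picks, and checking the result against the reshape approach. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

# Each slice picks one corner of every 2x2 block.
top_left = arr[::2, ::2]        # rows 0, 2 and columns 0, 2
top_right = arr[::2, 1::2]      # rows 0, 2 and columns 1, 3
bottom_left = arr[1::2, ::2]    # rows 1, 3 and columns 0, 2
bottom_right = arr[1::2, 1::2]  # rows 1, 3 and columns 1, 3

sliced = top_left + top_right + bottom_left + bottom_right
reshaped = arr.reshape(2, 2, 2, 2).sum(axis=(1, 3))
print(np.array_equal(sliced, reshaped))  # True
```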

You are operating on sliding windows of the original array. There are numerous questions and answers on SO regarding sliding windows, numpy, and python. By manipulating the strides of an array, this process can be sped up considerably. Here is a generic function that will return (x,y) windows of the array with or without overlap. Using this stride trick appears to be just a hair slower than @mskimm's solution. It's a nice thing to have in your toolkit. This function is not mine; it was found at "Efficient Overlapping Windows with Numpy".

import numpy as np
from numpy.lib.stride_tricks import as_strided as ast
from itertools import product

def norm_shape(shape):
    '''
    Normalize numpy array shapes so they're always expressed as a tuple, 
    even for one-dimensional shapes.

    Parameters
        shape - an int, or a tuple of ints

    Returns
        a shape tuple

    from http://www.johnvinyard.com/blog/?p=268
    '''
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')


def sliding_window(a,ws,ss = None,flatten = True):
    '''
    Return a sliding window over a in any number of dimensions

    Parameters:
        a  - an n-dimensional numpy array
        ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size 
             of each dimension of the window
        ss - an int (a is 1D) or tuple (a is 2D or greater) representing the 
             amount to slide the window in each dimension. If not specified, it
             defaults to ws.
        flatten - if True, all slices are flattened, otherwise, there is an 
                  extra dimension for each dimension of the input.

    Returns
        an array containing each n-dimensional window from a

    from http://www.johnvinyard.com/blog/?p=268
    '''

    if None is ss:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws)
    ss = norm_shape(ss)

    # convert ws, ss, and a.shape to numpy arrays so that we can do math in every 
    # dimension at once.
    ws = np.array(ws)
    ss = np.array(ss)
    shape = np.array(a.shape)


    # ensure that ws, ss, and a.shape all have the same number of dimensions
    ls = [len(shape),len(ws),len(ss)]
    if 1 != len(set(ls)):
        error_string = 'a.shape, ws and ss must all have the same length. They were {}'
        raise ValueError(error_string.format(str(ls)))

    # ensure that ws is smaller than a in every dimension
    if np.any(ws > shape):
        error_string = 'ws cannot be larger than a in any dimension. a.shape was {} and ws was {}'
        raise ValueError(error_string.format(str(a.shape),str(ws)))

    # how many slices will there be in each dimension?
    newshape = norm_shape(((shape - ws) // ss) + 1)
    # the shape of the strided array will be the number of slices in each dimension
    # plus the shape of the window (tuple addition)
    newshape += norm_shape(ws)
    # the strides tuple will be the array's strides multiplied by step size, plus
    # the array's strides (tuple addition)
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
    strided = ast(a,shape = newshape,strides = newstrides)
    if not flatten:
        return strided

    # Collapse strided so that it has one more dimension than the window.  I.e.,
    # the new array is a flat list of slices.
    meat = len(ws) if ws.shape else 0
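    # np.prod replaces np.product, which was removed in NumPy 2.0
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:])
    # remove any dimensions with size 1 (reshape needs a tuple, and filter()
    # returns an iterator in Python 3)
    dim = tuple(i for i in dim if i != 1)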
    return strided.reshape(dim)

Usage:

# 2x2 windows with NO overlap
b = sliding_window(arr, (2,2), flatten = False)
# b has shape (2160, 4320, 2, 2); sum over the two window axes
c = b.sum(axis=(2, 3))

Using numpy.einsum gives an approximate 24% performance improvement:

c = np.einsum('ijkl -> ij', b)
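
As an alternative to the hand-written stride helper, newer NumPy versions ship a built-in windowing function. The sketch below assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available. It is not the method used in the answer above, just a comparable approach on a small array.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(16).reshape(4, 4)

# All overlapping 2x2 windows: shape (3, 3, 2, 2), returned as a read-only view.
windows = sliding_window_view(arr, (2, 2))
# Keep every second window along both axes so the windows do not overlap.
blocks = windows[::2, ::2]  # shape (2, 2, 2, 2)

# Sum over the two window axes, either with sum() or with einsum.
by_sum = blocks.sum(axis=(2, 3))
by_einsum = np.einsum('ijkl->ij', blocks)
print(by_sum)
# [[10 18]
#  [42 50]]
```

Both by_sum and by_einsum hold the per-block totals. Which one is faster depends on the array size and NumPy version, so it is worth timing on your own data.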

One SO Q&A example is "How can I efficiently process a numpy array in blocks similar to Matlab's blkproc (blockproc) function". The selected answer there would work for you.
