
Summing over numpy array with modulo

Consider the following setup:

import numpy as np
import itertools as it
A = np.random.rand(3,3,3,16,3,3,3,16)  # sum elements of A to arrive at...
B = np.zeros((4,4))  # a 4x4 array (output)

I have a large array 'A' that I want to sum over, but in a very specific way. 'A' has a shape of (x,x,x,16,x,x,x,16) where the 'x' is some integer. The desired result is a 4x4 matrix 'B', which I can calculate via a for-loop like so:

%%timeit
for x1,y1,z1,s1 in it.product(range(3), range(3), range(3), range(16)):
    for x2,y2,z2,s2 in it.product(range(3), range(3), range(3), range(16)):
        B[s1%4, s2%4] += A[x1,y1,z1,s1,x2,y2,z2,s2]

>> 134 ms ± 1.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Here each element B[i, j] accumulates every entry of 'A' whose indices along the two length-16 axes (s1 and s2) satisfy s1 % 4 == i and s2 % 4 == j.

How can I achieve the same by broadcasting, or otherwise? With larger 'x' (dimensions in 'A') the for-loop quickly becomes impractical, since the number of iterations grows as 16^2 * x^6 (already 186,624 for x = 3).

EDIT:

C = np.zeros((4,4))
for i,j in it.product(range(4), range(4)):
    C[i,j] = A[:,:,:,i::4,:,:,:,j::4].sum()

This seems to work as well, but it still involves a for-loop. Is there a way to make this any faster?
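
A quick way to check that the strided slices pick out the same elements as the modulo loop is to compare both on a smaller array. The sketch below assumes x = 2 purely to keep the Python loop cheap; the names B_loop and C_slice are illustrative and not part of the original setup.

```python
import itertools as it
import numpy as np

x = 2  # assumption: a small 'x' so the reference loop stays cheap
rng = np.random.default_rng(0)
A = rng.random((x, x, x, 16, x, x, x, 16))

# Reference: explicit loop over every index, binning by s1 % 4 and s2 % 4.
B_loop = np.zeros((4, 4))
for idx in it.product(*(range(k) for k in A.shape)):
    s1, s2 = idx[3], idx[7]
    B_loop[s1 % 4, s2 % 4] += A[idx]

# Strided slices: i::4 selects exactly the indices s with s % 4 == i.
C_slice = np.zeros((4, 4))
for i, j in it.product(range(4), range(4)):
    C_slice[i, j] = A[:, :, :, i::4, :, :, :, j::4].sum()

print(np.allclose(B_loop, C_slice))
```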

Here are two solutions, a cleaner one and a faster one. Unfortunately, the cleanest version is not the fastest ...

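def clean(A):
    n = A.shape[0]  # size of each 'x' axis; A has shape (n,n,n,16,n,n,n,16)
    # split each length-16 axis into (4, 4); the trailing 4 indexes s % 4
    return A.reshape(4*n*n*n, 4, 4*n*n*n, 4).sum(axis=(0, 2))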
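
The reshape works because NumPy arrays are row-major by default: splitting a length-16 axis into (4, 4) sends index s to (s // 4, s % 4), so the trailing length-4 axis is exactly the modulo-4 class, and the leading quotient can be merged with the preceding n*n*n elements. A minimal 1-D sketch of that idea, with illustrative names:

```python
import numpy as np

s = np.arange(16)
pairs = s.reshape(4, 4)          # row q, column r holds s = 4*q + r
print(pairs % 4)                 # every column is constant: 0, 1, 2, 3

# Summing over the rows therefore groups values by s % 4.
v = np.arange(16, dtype=float) ** 2   # arbitrary test data
by_reshape = v.reshape(4, 4).sum(axis=0)
by_slices = np.array([v[r::4].sum() for r in range(4)])
print(np.array_equal(by_reshape, by_slices))
```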

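def fast(A):
    labels = np.tile(np.arange(16).reshape(4, 4), (4, 4)).ravel()  # bin 4*(s1%4) + s2%4 for each (s1, s2)
    weights = A.sum((0, 1, 2, 4, 5, 6)).ravel()  # collapse the 'x' axes to a 16x16 array
    return np.bincount(labels, weights, minlength=16).reshape(4, 4)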
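
The index array passed to np.bincount assigns to each (s1, s2) position of the 16x16 partial sum the flat bin number 4*(s1 % 4) + (s2 % 4). The sketch below rebuilds that label matrix from the modulo rule to show that np.tile produces the same thing; the variable names are illustrative.

```python
import numpy as np

# Label matrix: a 4x4 block of bin numbers 0..15, tiled 4x4 times.
labels_tiled = np.tile(np.arange(16).reshape(4, 4), (4, 4))

# The same labels written directly from the modulo rule.
s1, s2 = np.indices((16, 16))
labels_mod = 4 * (s1 % 4) + (s2 % 4)

print(labels_tiled.shape)
print(np.array_equal(labels_tiled, labels_mod))

# bincount with weights adds each weight into the bin named by its label.
w = np.ones((16, 16))
counts = np.bincount(labels_tiled.ravel(), w.ravel(), minlength=16).reshape(4, 4)
print(counts)  # with unit weights, each bin counts how many positions map to it
```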

At n == 6, fast is about three times faster than clean.
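
Before relying on either version, it may be worth checking both against the slice-based loop from the question and timing them on your own machine, since the ratio depends on n and on hardware. The following is a sketch under the assumption that A is C-contiguous with shape (n, n, n, 16, n, n, n, 16); n is read from A.shape rather than from a global, and the helper name reference is hypothetical.

```python
import itertools as it
import timeit
import numpy as np

def reference(A):
    # Slice-based version from the question, used here as ground truth.
    C = np.zeros((4, 4))
    for i, j in it.product(range(4), range(4)):
        C[i, j] = A[:, :, :, i::4, :, :, :, j::4].sum()
    return C

def clean(A):
    n = A.shape[0]  # assumption: all six 'x' axes share this size
    return A.reshape(4*n*n*n, 4, 4*n*n*n, 4).sum(axis=(0, 2))

def fast(A):
    labels = np.tile(np.arange(16).reshape(4, 4), (4, 4)).ravel()
    weights = A.sum((0, 1, 2, 4, 5, 6)).ravel()
    return np.bincount(labels, weights, minlength=16).reshape(4, 4)

n = 3
A = np.random.default_rng(0).random((n, n, n, 16, n, n, n, 16))

print(np.allclose(clean(A), reference(A)))
print(np.allclose(fast(A), reference(A)))

# Timings are machine-dependent, so no expected numbers are shown.
for f in (reference, clean, fast):
    t = timeit.timeit(lambda: f(A), number=20) / 20
    print(f"{f.__name__}: {t * 1e3:.3f} ms per call")
```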
