
numpy dot product in steps

I am trying to split up my dot product into steps. In my case, 2 steps.

>>> a=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]])
>>> a.dot(a.T)
array([[ 14,  32,  50,  68,  86, 104],
       [ 32,  77, 122, 167, 212, 257],
       [ 50, 122, 194, 266, 338, 410],
       [ 68, 167, 266, 365, 464, 563],
       [ 86, 212, 338, 464, 590, 716],
       [104, 257, 410, 563, 716, 869]])

I am able to get the first and fourth quadrants (the two diagonal blocks), but I am not sure how to obtain the second and third quadrants (the off-diagonal blocks).

>>> a[0:3].dot(a[0:3].T)
array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 50, 122, 194]])


>>> a[3:].dot(a[3:].T)
array([[365, 464, 563],
       [464, 590, 716],
       [563, 716, 869]])

What you are looking to do is a looped (blocked) GEMM, that is, a general matrix-matrix multiplication computed one chunk at a time. The following is some quick code to do so:

import numpy as np

def loop_gemm(a, b, c=None, chunksize=100):

    size_i = a.shape[0]
    size_zip = a.shape[1]

    size_j = b.shape[1]
    size_alt_zip = b.shape[0]

    # The shared (summed-over) dimension must match; raise instead of silently continuing
    if size_zip != size_alt_zip:
        raise ValueError("Loop GEMM zip index is not of the same size for both tensors")

    if c is None:
        c = np.zeros((size_i, size_j))

    istart = 0
    for i in range(int(np.ceil(size_i / float(chunksize)))):

        left_slice = slice(istart, istart+chunksize)
        left_view = a[left_slice]  

        jstart = 0
        for j in range(int(np.ceil(size_j / float(chunksize)))):

            right_slice = slice(jstart, jstart+chunksize)
            right_view = b[:, right_slice]

            c[left_slice, right_slice] = np.dot(left_view, right_view)
            jstart += chunksize

        istart += chunksize

    return c

We see that it works here:

a = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]])
loop_gemm(a, a.T, chunksize=2)
array([[  14.,   32.,   50.,   68.,   86.,  104.],
       [  32.,   77.,  122.,  167.,  212.,  257.],
       [  50.,  122.,  194.,  266.,  338.,  410.],
       [  68.,  167.,  266.,  365.,  464.,  563.],
       [  86.,  212.,  338.,  464.,  590.,  716.],
       [ 104.,  257.,  410.,  563.,  716.,  869.]])
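For the two-step case in the question, the missing quadrants are simply the cross products between the two halves of the rows. The following is a minimal sketch of that idea; the names `top`, `bottom`, and `q_*` are only for illustration, and `np.block` is used to reassemble the pieces so the result can be compared with the full product.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
              [10, 11, 12], [13, 14, 15], [16, 17, 18]])

# Split the rows into the two steps
top, bottom = a[:3], a[3:]

# Diagonal blocks (the ones already obtained in the question)
q_tl = top.dot(top.T)
q_br = bottom.dot(bottom.T)

# Off-diagonal blocks: rows of one half against rows of the other half
q_tr = top.dot(bottom.T)
q_bl = bottom.dot(top.T)  # equals q_tr.T here because a.dot(a.T) is symmetric

# Reassemble the four quadrants into the full 6x6 result
full = np.block([[q_tl, q_tr],
                 [q_bl, q_br]])
```

Since `a.dot(a.T)` is symmetric, only one of the two off-diagonal blocks actually needs to be computed; the other is its transpose.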

You can also loop over the zip index (the shared inner dimension that is summed over) and scale the individual parts to retain the full functionality of GEMM.
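As a rough sketch of what looping over the zip index could look like, the function below accumulates partial products over chunks of the inner dimension and applies the usual GEMM scaling, C = alpha * A.B + beta * C. The function name `loop_gemm_k` and the `alpha`/`beta` parameters are assumptions made for this illustration, not part of the code above, and a user-supplied `c` is assumed to be a floating-point array that may be modified in place.

```python
import numpy as np

def loop_gemm_k(a, b, c=None, alpha=1.0, beta=0.0, chunksize=100):
    """Hypothetical helper: C = alpha * A.B + beta * C, chunked over the inner index."""
    size_zip = a.shape[1]
    if size_zip != b.shape[0]:
        raise ValueError("Inner dimensions of a and b do not match")

    if c is None:
        c = np.zeros((a.shape[0], b.shape[1]))
    else:
        # Assumption: c is a float array; it is scaled in place by beta
        c *= beta

    # Each chunk of the inner index contributes a full-size partial product
    for kstart in range(0, size_zip, chunksize):
        k_slice = slice(kstart, kstart + chunksize)
        c += alpha * np.dot(a[:, k_slice], b[k_slice, :])

    return c

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
              [10, 11, 12], [13, 14, 15], [16, 17, 18]])
result = loop_gemm_k(a, a.T, chunksize=2)
```

Unlike the row/column version, every chunk here writes to the whole output, so the partial results are summed rather than placed into separate blocks.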

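However, unless you have a specific reason to do this (for example, the operands or the result do not fit in memory), it is typically not a good way to go about it, because a single np.dot call is faster: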

# Dimensions must be integers; float literals such as 1E3 are rejected by current NumPy
a = np.random.rand(1000, 10000)
b = np.random.rand(10000, 1000)

%timeit np.dot(a,b)
10 loops, best of 3: 137 ms per loop

%timeit loop_gemm(a,b)
1 loops, best of 3: 311 ms per loop

