I am trying to split up my dot product into steps. In my case, 2 steps.
>>> a=np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]])
>>> a.dot(a.T)
array([[ 14, 32, 50, 68, 86, 104],
[ 32, 77, 122, 167, 212, 257],
[ 50, 122, 194, 266, 338, 410],
[ 68, 167, 266, 365, 464, 563],
[ 86, 212, 338, 464, 590, 716],
[104, 257, 410, 563, 716, 869]])
I am able to get the first and fourth quadrants, but I am not sure how to obtain the second and third quadrants:
>>> a[0:3].dot(a[0:3].T)
array([[ 14, 32, 50],
[ 32, 77, 122],
[ 50, 122, 194]])
>>> a[3:].dot(a[3:].T)
array([[365, 464, 563],
[464, 590, 716],
[563, 716, 869]])
What you are looking to do is a looped GEMM (general matrix-matrix multiplication), which computes the product one block at a time. The following is some quick code to do so:
import numpy as np

def loop_gemm(a, b, c=None, chunksize=100):
    size_i = a.shape[0]
    size_zip = a.shape[1]
    size_j = b.shape[1]
    size_alt_zip = b.shape[0]

    # The shared ("zip") index must have the same length in both operands.
    if size_zip != size_alt_zip:
        raise ValueError("Loop GEMM zip index is not of the same size for both tensors")

    # Allocate the output if the caller did not supply one.
    if c is None:
        c = np.zeros((size_i, size_j))

    istart = 0
    for i in range(int(np.ceil(size_i / float(chunksize)))):
        # Block of rows taken from a.
        left_slice = slice(istart, istart+chunksize)
        left_view = a[left_slice]

        jstart = 0
        for j in range(int(np.ceil(size_j / float(chunksize)))):
            # Block of columns taken from b.
            right_slice = slice(jstart, jstart+chunksize)
            right_view = b[:, right_slice]

            # Each (row block, column block) pair fills one tile of c.
            c[left_slice, right_slice] = np.dot(left_view, right_view)
            jstart += chunksize

        istart += chunksize

    return c
We see that it works here:
a = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]])
loop_gemm(a, a.T, chunksize=2)
array([[ 14., 32., 50., 68., 86., 104.],
[ 32., 77., 122., 167., 212., 257.],
[ 50., 122., 194., 266., 338., 410.],
[ 68., 167., 266., 365., 464., 563.],
[ 86., 212., 338., 464., 590., 716.],
[ 104., 257., 410., 563., 716., 869.]])
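For the two-step case in the question, the missing quadrants are the off-diagonal blocks: the rows of one half multiplied against the rows of the other half. The following is a minimal sketch of that idea; the names top, bottom, upper_right, lower_left and full are only illustrative, and np.block is used just to show that the four pieces reassemble into the full product.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9],
              [10, 11, 12], [13, 14, 15], [16, 17, 18]])

# Split the rows of a into the two halves used in the question.
top, bottom = a[:3], a[3:]

# Off-diagonal blocks: rows of one half against rows of the other half.
upper_right = top.dot(bottom.T)   # rows 0-2, columns 3-5 of a.dot(a.T)
lower_left = bottom.dot(top.T)    # rows 3-5, columns 0-2 of a.dot(a.T)

# Reassemble the full product from the four blocks.
full = np.block([[top.dot(top.T), upper_right],
                 [lower_left, bottom.dot(bottom.T)]])
```

Because a.dot(a.T) is symmetric, lower_left is simply the transpose of upper_right, so only one of the two actually needs to be computed in this particular case.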
You can also loop over the zip index and scale different parts to retain the full functionality of GEMM.
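As a rough sketch of what that could look like, the version below chunks the shared (zip) index instead and accumulates partial products into the output, with alpha and beta scaling in the style of BLAS GEMM (C = alpha*A*B + beta*C). The function name loop_gemm_zip and the alpha and beta parameters are assumptions made for illustration, not part of the code above, and the sketch assumes c is a floating-point array if one is passed in.

```python
import numpy as np

def loop_gemm_zip(a, b, c=None, alpha=1.0, beta=0.0, chunksize=100):
    """Hypothetical sketch: c = alpha * a.dot(b) + beta * c, chunked over the zip index."""
    size_zip = a.shape[1]
    if size_zip != b.shape[0]:
        raise ValueError("Loop GEMM zip index is not of the same size for both tensors")

    if c is None:
        c = np.zeros((a.shape[0], b.shape[1]))
    elif beta == 0:
        # With beta == 0 the old contents of c are discarded entirely.
        c[...] = 0
    else:
        # Scale the existing output in place (assumes c has a float dtype).
        c *= beta

    # Each chunk of the zip index contributes a partial product to all of c.
    for kstart in range(0, size_zip, chunksize):
        k_slice = slice(kstart, kstart + chunksize)
        c += alpha * np.dot(a[:, k_slice], b[k_slice, :])

    return c
```

Unlike the row and column chunking, every chunk here touches the whole output, so the partial results have to be added rather than assigned.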
However, unless you have a specific reason to split the product up, this is typically not a good way to go about it, because a single np.dot call is faster:
a = np.random.rand(1000, 10000)
b = np.random.rand(10000, 1000)
%timeit np.dot(a,b)
10 loops, best of 3: 137 ms per loop
%timeit loop_gemm(a,b)
1 loops, best of 3: 311 ms per loop