I have a sum of sums that I want to speed up. In one case it is:
S_{x,y,k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly}
In the other case it is:
S_{x,y} ( S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} )^2
Note: S_{indices} denotes the sum over those indices.
I have figured out how to do the first case using numpy's einsum, and it gives an impressive speedup of roughly x160.
Also, I have thought of trying to expand the square, but won't that be a killer, since I would need to sum over x,y,k,l,k,l instead of just x,y,k,l?
Here is an implementation that demonstrates the difference, along with the einsum solution I have.
import numpy as np
import time

Nx = 3
Ny = 4
Nk = 5
Nl = 6
Nu = 7
Nv = 8
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)
P = np.random.rand(Nx, Ny)
B = np.random.rand(Nk, Nl)
I1 = np.zeros([Nu, Nv])
I2 = np.zeros([Nu, Nv])
t = time.time()
for iu in range(Nu):
    for iv in range(Nv):
        for ix in range(Nx):
            for iy in range(Ny):
                S = 0.
                for ik in range(Nk):
                    for il in range(Nl):
                        S += Fu[iu,ik]*Fv[iv,il]*Fx[ix,ik]*Fy[iy,il]*P[ix,iy]*B[ik,il]
                I1[iu, iv] += S
                I2[iu, iv] += S**2.
print time.time() - t; t = time.time()
# 0.0787379741669
I1_ = np.einsum('uk, vl, xk, yl, xy, kl->uv', Fu, Fv, Fx, Fy, P, B)
print time.time() - t
# 0.00049090385437
print np.allclose(I1_, I1)
# True
# Solution by expanding the square (not ideal)
t = time.time()
I2_ = np.einsum('uk,vl,xk,yl,um,vn,xm,yn,kl,mn,xy->uv', Fu,Fv,Fx,Fy,Fu,Fv,Fx,Fy,B,B,P**2)
print time.time() - t
# 0.0226809978485 <- faster than for loop but still much slower than I1_ einsum
print np.allclose(I2_, I2)
# True
As shown, I've figured out how to compute I1_ with einsum.
EDIT:
I added how to compute I2_ by expanding the square, but the speedup is a bit disappointing and to be expected: ~x3.47 compared to ~x160.
EDIT2:
The speedups don't seem to be consistent: earlier I got x40 and x1.2, but now I get different numbers. Either way, the difference and the question remain.
EDIT3: I tried to simplify the sum I'm actually after but messed up; still, the sum above allowed for the excellent answer provided by @user5402.
I've edited the code above to demonstrate the sum which is below:
I1 = S_{x,y,k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} P_{xy} B_{kl}
I2 = S_{x,y} ( S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} P_{xy} B_{kl} )^2
(Update: Jump to the end to see the result expressed as a couple of matrix multiplications.)
I think you can greatly simplify the computation by using the identity:
S_{k,l} a_k b_l = ( S_k a_k ) * ( S_l b_l ) -- when a is free of l and b is free of k
For instance,
S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly}
= S_{k,l} Fu_{ku} Fx_{kx} Fv_{lv} Fy_{ly} -- rearrange the factors
          \_____ A _____/ \_____ B _____/
= ( S_k Fu_{ku} Fx_{kx} ) * ( S_l Fv_{lv} Fy_{ly} ) -- from the identity
= A_{ux} * B_{vy}
where A_{ux} only depends on u and x, and B_{vy} only depends on v and y.
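A quick numerical check of this factorization (a sketch with small assumed sizes; the arrays follow the question's code, where Fu[u, k] holds Fu_{ku}, Fx[x, k] holds Fx_{kx}, and so on):

```python
import numpy as np

# Small random factors with assumed sizes
Nx, Ny, Nk, Nl, Nu, Nv = 3, 4, 5, 6, 7, 8
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)

# Direct sum over k and l, keeping u, v, x, y free
direct = np.einsum('uk,vl,xk,yl->uvxy', Fu, Fv, Fx, Fy)

# Factored form: A_{ux} = S_k Fu_{ku} Fx_{kx}, B_{vy} = S_l Fv_{lv} Fy_{ly}
A = Fu.dot(Fx.T)                 # shape (Nu, Nx)
B = Fv.dot(Fy.T)                 # shape (Nv, Ny)
factored = np.einsum('ux,vy->uvxy', A, B)

print(np.allclose(direct, factored))  # True
```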
For the square sum, we have:
S_k [ S_l Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} ]^2
= S_k Fu_{ku}^2 Fx_{kx}^2 * [ S_l Fv_{lv} Fy_{ly} ]^2 -- Fu, Fx are free of l, so they come out squared
= S_k Fu_{ku}^2 Fx_{kx}^2 * B_{vy}^2 -- B is from the above calc.
= B_{vy}^2 * S_k Fu_{ku}^2 Fx_{kx}^2 -- B_{vy} is free of k
= B_{vy}^2 * A'_{ux} -- where A'_{ux} = S_k Fu_{ku}^2 Fx_{kx}^2
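A numerical sanity check of this reduction; note that the k-dependent factors Fu, Fx come out of the square squared. The array names and sizes are assumptions, with code arrays indexed as in the question (Fu[u, k] = Fu_{ku}):

```python
import numpy as np

Nk, Nl, Nu, Nv, Nx, Ny = 5, 6, 7, 8, 3, 4
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)

# Left side: S_k [ S_l Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} ]^2  (u, v, x, y free)
inner = np.einsum('uk,vl,xk,yl->uvxyk', Fu, Fv, Fx, Fy)  # sums over l, keeps k
left = (inner ** 2).sum(axis=-1)                         # square, then sum over k

# Right side: B_{vy}^2 * S_k Fu_{ku}^2 Fx_{kx}^2
B = Fv.dot(Fy.T)                     # B_{vy} = S_l Fv_{lv} Fy_{ly}
A_sq = (Fu ** 2).dot((Fx ** 2).T)    # S_k Fu_{ku}^2 Fx_{kx}^2
right = np.einsum('ux,vy->uvxy', A_sq, B ** 2)

print(np.allclose(left, right))  # True
```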
Similar reductions occur when continuing the sum over x and y:
S_{xy} A_{ux} * B_{vy}
= S_x A_{ux} * S_y B_{vy} -- from the identity
= C_u * D_v
And then finally, summing over u and v:
S_{uv} C_u D_v = (S_u C_u) * (S_v D_v) -- from the identity
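These two reduction steps can be checked numerically with small random stand-ins for A_{ux} and B_{vy} (the shapes here are arbitrary assumptions):

```python
import numpy as np

# Hypothetical A_{ux} and B_{vy}
A = np.random.rand(7, 3)
B = np.random.rand(8, 4)

# S_{xy} A_{ux} B_{vy} as an explicit (u, v) array
S_xy = np.einsum('ux,vy->uv', A, B)

# C_u = S_x A_{ux},  D_v = S_y B_{vy}  (row sums)
C = A.sum(axis=1)
D = B.sum(axis=1)
print(np.allclose(S_xy, np.outer(C, D)))          # True

# Final step: S_{uv} C_u D_v = (S_u C_u) * (S_v D_v)
print(np.isclose(S_xy.sum(), C.sum() * D.sum()))  # True
```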
Hope this helps.
Update: I just realized that perhaps for the square sum you wanted to compute [ S_k S_l ... ]^2, in which case you can proceed like this:
[ S_k S_l Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} ]^2
= [ A_{ux} * B_{vy} ]^2
= A_{ux}^2 * B_{vy}^2
So when we sum over the other variables, we get:
S_{uvxy} A_{ux}^2 B_{vy}^2
= S_{uv} ( S_{xy} A_{ux}^2 B_{vy}^2 )
= S_{uv} ( S_x A_{ux}^2 ) * ( S_y B_{vy}^2 ) -- from the identity
= S_{uv} C_u * D_v
= (S_u C_u) * (S_v D_v) -- from the identity
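A small numerical check of this squared-sum reduction, building A and B from random factor matrices (sizes are assumptions; code arrays are indexed as in the question):

```python
import numpy as np

Fu = np.random.rand(7, 5)   # Fu[u, k]
Fv = np.random.rand(8, 6)   # Fv[v, l]
Fx = np.random.rand(3, 5)   # Fx[x, k]
Fy = np.random.rand(4, 6)   # Fy[y, l]

A = Fu.dot(Fx.T)            # A_{ux}
B = Fv.dot(Fy.T)            # B_{vy}

# Brute force: square the fully summed inner term, then sum over u, v, x, y
inner = np.einsum('uk,vl,xk,yl->uvxy', Fu, Fv, Fx, Fy)
brute = (inner ** 2).sum()

# Factored: (S_{ux} A^2) * (S_{vy} B^2)
print(np.isclose(brute, (A ** 2).sum() * (B ** 2).sum()))  # True
```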
Update 2: This does boil down to just a few matrix multiplications.
The definitions of A and B:
A_{ux} = S_k Fu_{ku} Fx_{kx}
B_{vy} = S_l Fv_{lv} Fy_{ly}
may also be written in matrix form as:
A = (transpose Fu) . Fx -- . = matrix multiplication
B = (transpose Fv) . Fy
and from the definitions of C and D:
C_u = S_x A_{ux}
D_v = S_y B_{vy}
we see that the vector C is just the row sums of A and the vector D is just the row sums of B. Since the answer for the entire summation (not squared) is:
total = (S_u C_u) * (S_v D_v)
we see that the total is just the sum of all of the matrix elements of A times the sum of all of the matrix elements of B.
Here is the numpy code:
from numpy import *
# ... set up Fx, Fv, Fu, Fy as above...
A = Fx.dot(Fu.transpose())  # A (up to a transpose; only its element sum is used)
B = Fv.dot(Fy.transpose())  # B (up to a transpose; likewise)
sum1 = sum(A) * sum(B)
A2 = square(A)
B2 = square(B)
sum2 = sum(A2) * sum(B2)
print "sum of terms:", sum1
print "sum of squares of terms:", sum2
I'll start a new answer since the problem has changed.
Try this:
E = np.einsum('uk, vl, xk, yl, xy, kl->uvxy', Fu, Fv, Fx, Fy, P, B)
E1 = np.einsum('uvxy->uv', E)
E2 = np.einsum('uvxy->uv', np.square(E))
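As a self-contained sketch (re-declaring the question's arrays with the question's sizes), E1 and E2 can be checked against the question's einsum results:

```python
import numpy as np

Nx, Ny, Nk, Nl, Nu, Nv = 3, 4, 5, 6, 7, 8
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)
P = np.random.rand(Nx, Ny)
B = np.random.rand(Nk, Nl)

# One einsum keeping all four outer indices, then reduce
E = np.einsum('uk,vl,xk,yl,xy,kl->uvxy', Fu, Fv, Fx, Fy, P, B)
E1 = np.einsum('uvxy->uv', E)
E2 = np.einsum('uvxy->uv', np.square(E))

# References: the direct einsum for I1 and the expanded-square einsum for I2
I1_ = np.einsum('uk,vl,xk,yl,xy,kl->uv', Fu, Fv, Fx, Fy, P, B)
I2_ = np.einsum('uk,vl,xk,yl,um,vn,xm,yn,kl,mn,xy->uv',
                Fu, Fv, Fx, Fy, Fu, Fv, Fx, Fy, B, B, P**2)

print(np.allclose(E1, I1_))  # True
print(np.allclose(E2, I2_))  # True
```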
I've found it runs just as fast as the I1_ computation.
Here is my test code: http://pastebin.com/ufwy7cLy