
How to do a sum of sums of the square of a sum of sums?

I have a sum of sums that I want to speed up. In one case it is:

S_{x,y,k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly}

In the other case it is:

S_{x,y} ( S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} )^2

Note: S_{indices} denotes the sum over those indices.

I have figured out how to do the first case using numpy's einsum, and it results in an amazing speedup of roughly x160.

Also, I have thought of expanding the square, but won't that be a killer, since I would then need to sum over x,y,k,l,m,n instead of x,y,k,l?

Here is an implementation that demonstrates the difference, along with the einsum solution I have.

import time
import numpy as np

Nx = 3
Ny = 4
Nk = 5
Nl = 6
Nu = 7
Nv = 8
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)
P = np.random.rand(Nx, Ny)
B = np.random.rand(Nk, Nl)
I1 = np.zeros([Nu, Nv])
I2 = np.zeros([Nu, Nv])
t = time.time()
for iu in range(Nu):
    for iv in range(Nv):
        for ix in range(Nx):
            for iy in range(Ny):
                S = 0.
                for ik in range(Nk):
                    for il in range(Nl):
                        S += Fu[iu,ik]*Fv[iv,il]*Fx[ix,ik]*Fy[iy,il]*P[ix,iy]*B[ik,il]
                I1[iu, iv] += S
                I2[iu, iv] += S**2.
print(time.time() - t); t = time.time()
# 0.0787379741669
I1_ = np.einsum('uk, vl, xk, yl, xy, kl->uv', Fu, Fv, Fx, Fy, P, B)
print(time.time() - t)
# 0.00049090385437
print(np.allclose(I1_, I1))
# True
# Solution by expanding the square (not ideal)
t = time.time()
I2_ = np.einsum('uk,vl,xk,yl,um,vn,xm,yn,kl,mn,xy->uv',
                Fu, Fv, Fx, Fy, Fu, Fv, Fx, Fy, B, B, P**2)
print(time.time() - t)
# 0.0226809978485 <- faster than the for loop but still much slower than the I1_ einsum
print(np.allclose(I2_, I2))
# True

As shown, I've figured out how to do I1_ with einsum.

EDIT:

I added how to compute I2_ by expanding the square, but the speedup is a bit disappointing, and perhaps to be expected: roughly x3.47, compared to ~x160 for I1_.
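(An aside on the expanded square: if your numpy is recent enough to support it, einsum's optimize=True flag lets it search for a cheaper pairwise contraction order, which may recover part of that gap. A minimal sketch, continuing the script above; the flag is an assumption on my part, available in numpy >= 1.12:)

# Same expanded-square contraction, but let einsum search for a good
# pairwise contraction path (optimize=True needs numpy >= 1.12).
I2_opt = np.einsum('uk,vl,xk,yl,um,vn,xm,yn,kl,mn,xy->uv',
                   Fu, Fv, Fx, Fy, Fu, Fv, Fx, Fy, B, B, P**2,
                   optimize=True)
print(np.allclose(I2_opt, I2))  # True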

EDIT2:

The speedups don't seem to be consistent: I had previously gotten x40 and x1.2, but now I get different numbers. Either way, the difference and the question remain.

EDIT3: In trying to simplify the sum I'm actually after, I messed up; the simpler sum I originally posted is the one that @user5402's excellent answer addresses.

I've edited the code above to demonstrate the sum I'm actually after, which is:

I1 = S_{x,y,k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} P_{xy} B_{kl}

I2 = S_{x,y} ( S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} P_{xy} B_{kl} )^2

(Update: Jump to the end to see the result expressed as a couple of matrix multiplications.)

I think you can greatly simplify the computation by using the identity:

S_{k,l} a_k b_l = ( S_k a_k ) * ( S_l b_l )

For instance,

S_{k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly}
  = S_{k,l} Fu_{ku} Fx_{kx} Fv_{lv} Fy_{ly}            -- rearrange the factors
            \___ A ____/    \___ B ____/
  = ( S_k Fu_{ku} Fx_{kx} ) * ( S_l Fv_{lv} Fy_{ly} )  -- from the identity
  =   A_{ux}                * B_{vy}

where A_{ux} only depends on u and x, and B_{vy} only depends on v and y.
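To make this concrete, here is a minimal numpy sketch of the factorization (it assumes the array layout from the question's code, where the math's Fu_{ku} is stored as Fu[u, k], and likewise for the others):

import numpy as np

Nx, Ny, Nk, Nl, Nu, Nv = 3, 4, 5, 6, 7, 8
Fx = np.random.rand(Nx, Nk)
Fy = np.random.rand(Ny, Nl)
Fu = np.random.rand(Nu, Nk)
Fv = np.random.rand(Nv, Nl)

A = Fu @ Fx.T    # A[u, x] = S_k Fu[u, k] * Fx[x, k]
B = Fv @ Fy.T    # B[v, y] = S_l Fv[v, l] * Fy[y, l]

# full[u, v, x, y] = S_{k,l} Fu[u, k] * Fv[v, l] * Fx[x, k] * Fy[y, l]
full = np.einsum('uk,vl,xk,yl->uvxy', Fu, Fv, Fx, Fy)
print(np.allclose(full, A[:, None, :, None] * B[None, :, None, :]))  # True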

For the square sum, we have:

S_k [ S_l Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} ]^2
  = S_k Fu_{ku}^2 Fx_{kx}^2 * [ S_l Fv_{lv} Fy_{ly} ]^2   -- Fu, Fx are free of l, so they come out squared
  = S_k Fu_{ku}^2 Fx_{kx}^2 * B_{vy}^2                    -- B is from the above calc.
  = B_{vy}^2 * S_k Fu_{ku}^2 Fx_{kx}^2                    -- B_{vy} is free of k
  = B_{vy}^2 * A'_{ux}                                    -- A' is like A but with squared factors
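A quick numeric check of that reduction (a sketch, reusing the arrays from the sketch above):

# lhs[u, v, x, y] = S_k ( S_l Fu[u, k] * Fv[v, l] * Fx[x, k] * Fy[y, l] )^2
inner = np.einsum('uk,vl,xk,yl->uvxyk', Fu, Fv, Fx, Fy)   # sum over l, keep k
lhs = np.sum(inner**2, axis=-1)                           # then S_k of the square

A_sq = (Fu**2) @ (Fx**2).T    # A'[u, x] = S_k Fu[u, k]^2 * Fx[x, k]^2
rhs = A_sq[:, None, :, None] * (B**2)[None, :, None, :]
print(np.allclose(lhs, rhs))  # True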

Similar reductions occur when continuing the sum over x and y:

S_{xy} A_{ux} * B_{vy}
  = S_x A_{ux} * S_y B_{vy}                        -- from the identity
  =  C_u       *    D_v

And then finally summing over u and v :

S_{uv} C_u D_v = (S_u C_u) * (S_v D_v)             -- from the identity

Hope this helps.

Update: I just realized that perhaps for the square sum you wanted to compute [ S_k S_l ... ]^2 in which case you can proceed like this:

[ S_k  S_l Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly} ]^2
  =  [ A_{ux}                * B_{vy} ]^2
  =  A_{ux}^2 * B_{vy}^2

So when we sum over the other variables we get:

S_{uvxy} A_{ux}^2 B_{vy}^2
  = S_{uv} ( S_{xy}  A_{ux}^2 B_{vy}^2 )
  = S_{uv} ( S_x A_{ux}^2 ) * ( S_y B_{vy}^2 )     -- from the identity
  = S_{uv}     C_u          *      D_v             -- where now C_u = S_x A_{ux}^2, D_v = S_y B_{vy}^2
  = (S_u C_u) * (S_v D_v)                          -- from the identity

Update 2: This does boil down to just a few matrix multiplications.

The definitions of A and B:

A_{ux} = S_k Fu_{ku} Fx_{kx}
B_{vy} = S_l Fv_{lv} Fy_{ly}

may also be written in matrix form as:

A = (transpose Fu) . Fx             -- . = matrix multiplication
B = (transpose Fv) . Fy

and from the definitions of C and D:

C_u = S_x A_{ux}
D_v = S_y B_{vy}

we see that the vector C is just the row sums of A and the vector D is just the row sums of B. Since the answer for the entire summation (not squared) is:

total = (S_u C_u) * (S_v D_v)

we see that the total is just the sum of all of the matrix elements of A times the sum of all of the matrix elements of B.

Here is the numpy code:

import numpy as np
# ... set up Fx, Fv, Fu, Fy as above...

A = Fx.dot(Fu.transpose())   # A[x, u] = S_k Fx[x, k] * Fu[u, k]
B = Fv.dot(Fy.transpose())   # B[v, y] = S_l Fv[v, l] * Fy[y, l]
sum1 = np.sum(A) * np.sum(B)

A2 = np.square(A)
B2 = np.square(B)
sum2 = np.sum(A2) * np.sum(B2)

print("sum of terms:", sum1)
print("sum of squares of terms:", sum2)

I'll start a new answer since the problem has changed.

Try this:

E = np.einsum('uk, vl, xk, yl, xy, kl->uvxy', Fu, Fv, Fx, Fy, P, B)
E1 = np.einsum('uvxy->uv', E)
E2 = np.einsum('uvxy->uv', np.square(E))

I've found it runs in about the same time as the I1_ computation.

Here is my test code: http://pastebin.com/ufwy7cLy
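One caveat worth keeping in mind: E materializes a full (Nu, Nv, Nx, Ny) intermediate, so memory grows with the product of all four sizes. A minimal sketch of the consistency check against the question's loop results:

# E1 and E2 should match the loop results I1 and I2 from the question.
print(np.allclose(E1, I1), np.allclose(E2, I2))  # True True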
