
How to matrix-multiply a 2D numpy array with a 3D array to give a 3D array?

I am solving a photometric stereo problem, in which I have n light sources, each with 3 channels (red, green, blue). Thus the light array has shape n×3: lights.shape = (n, 3). I have an image corresponding to each lighting condition; image dimensions are h×w (height × width), so images.shape = (n, h, w)

I want to matrix-multiply each pixel of the image by a matrix of shape 3×n and get another array of shape 3×h×w; these will be the normal vectors of each pixel in the image.

shapes of:

  • images: (n_ims, h, w)
  • lights: (n_ims, 3)
S = lights
S_pinv = np.linalg.inv(S.T@S)@S.T  # pseudo-inverse of S (what np.linalg.pinv(S) computes); S_pinv.shape: (3, n_ims)
b = S_pinv @ images  # I want (3xn @ nxhxw = 3xhxw)

But I am getting this error:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 100 is different from 3)

The problem is that numpy views multidimensional arrays as stacks of matrices, and the last two dimensions are always assumed to be the linear-space dimensions. This means that the matrix product will not work by contracting over the first dimension of your 3d array.
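
A minimal illustration of this behaviour (names and sizes here are invented for the sketch):

import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5, 6)

# numpy treats B as a stack of four (5, 6) matrices, so A @ B attempts
# (3, 4) @ (5, 6) for each matrix in the stack and fails:
try:
    A @ B
except ValueError as err:
    print(err)  # core dimension mismatch, just like the error in the question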

Instead, the simplest thing you can do is reshape your 3d array into a 2d one, do the matrix multiplication, and reshape back into a 3d array. This will also make use of optimised BLAS code, which is one of the great advantages of numpy.

import numpy as np 

S_pinv = np.random.rand(3, 4)
images = np.random.rand(4, 5, 6)

# error: 
# (S_pinv @ images).shape 
res_shape = S_pinv.shape[:1] + images.shape[1:]  # (3, 5, 6) 
res = (S_pinv @ images.reshape(images.shape[0], -1)).reshape(res_shape)
print(res.shape)  # (3, 5, 6)

So instead of (3, n) x (n, h, w) we do (3, n) x (n, h*w) -> (3, h*w), which we reshape back to (3, h, w). Reshaping is free, because it doesn't involve any actual manipulation of the data in memory (only a reinterpretation of the single block of memory that underlies the array), and as I said, proper matrix products are highly optimized in numpy.
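
You can check directly that the reshape does not copy anything (a minimal check on a contiguous array):

import numpy as np

flat = np.arange(24)              # one contiguous block of memory
a = flat.reshape(4, 6)
b = a.reshape(2, 12)
print(np.shares_memory(flat, b))  # True: neither reshape copied the data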


Since you asked for a more intuitive way, here's an alternative making use of numpy.einsum. It will probably be slower, but it's very transparent once you get a little used to its notation:

res_einsum = np.einsum('tn,nhw -> thw', S_pinv, images)
print(np.array_equal(res, res_einsum))  # True

This notation names each dimension of the input arrays: for S_pinv the first and second dimensions are named t and n, respectively, and similarly n, h and w for images. The output is set to have dimensions thw, which means that any dimensions not present in the output shape will be summed over after multiplying the input arrays. This is exactly what you need.
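
To make the notation concrete, here is the same contraction written as explicit Python loops (an illustrative sketch only, reusing S_pinv, images and res_einsum from the snippets above; np.einsum itself is of course much faster):

t_dim, n_dim = S_pinv.shape
_, h_dim, w_dim = images.shape
res_loops = np.zeros((t_dim, h_dim, w_dim))
for t in range(t_dim):
    # n appears in both inputs but not in the output, so it is summed over
    for n in range(n_dim):
        res_loops[t] += S_pinv[t, n] * images[n]
print(np.allclose(res_loops, res_einsum))  # True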


As you noted in a comment, you could also transpose your arrays so that np.dot finds the right dimensions in the right place. But this will also be slow, because it may lead to copies in memory, or at least to suboptimal looping over your arrays.

I made a quick timing comparison using the following definitions:

def reshaped(S_pinv, images): 
    res_shape = S_pinv.shape[:1] + images.shape[1:] 
    return (S_pinv @ images.reshape(images.shape[0], -1)).reshape(res_shape)

def einsummed(S_pinv, images): 
    return np.einsum('tn,nhw -> thw', S_pinv, images) 

def transposed(S_pinv, images): 
    return (S_pinv @ images.transpose(2, 0, 1)).transpose(1, 2, 0)          

And here's the timing test using IPython's built-in %timeit magic, and some more realistic array sizes:

>>> S_pinv = np.random.rand(3, 100) 
... images = np.random.rand(100, 200, 300) 
... args = S_pinv, images 
... %timeit reshaped(*args) 
... %timeit einsummed(*args) 
... %timeit transposed(*args)                                          
5.92 ms ± 460 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15.9 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
44.5 ms ± 329 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The answer is np.swapaxes:

import numpy as np

q = np.random.random([2, 5, 5])   # stack axis first
q.shape

w = np.random.random([3, 2])
w.shape

w @ q   # numpy tries (3, 2) @ (5, 5) for each of the 2 stacked matrices

and we get a ValueError. But:

import numpy as np

q = np.random.random([5, 2, 5])   # contracted axis in the middle
q.shape

w = np.random.random([3, 2])
w.shape

res = (w @ q).swapaxes(0, 1)
res.shape  # (3, 5, 5)
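
Applied to the shapes in the question (a sketch using the question's names, where images has shape (n, h, w) and S_pinv has shape (3, n)), you first have to move the contracted axis n into the middle:

import numpy as np

n, h, w = 4, 5, 6
S_pinv = np.random.rand(3, n)
images = np.random.rand(n, h, w)

# (n, h, w) -> (h, n, w), so matmul contracts (3, n) @ (n, w) per h-slice
b = (S_pinv @ images.swapaxes(0, 1)).swapaxes(0, 1)
print(b.shape)  # (3, 5, 6)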

One easy way would be np.inner; inner reduces along the last axis and preserves all others, so up to a transpose it is a perfect match:

n, h, w = 10, 384, 512
images = np.random.randint(1, 10, (n, h, w))
S_pinv = np.random.randint(1, 10, (n, 3))  # note: (n, 3) here, the transpose of the question's S_pinv

# images.T is (w, h, n) and S_pinv.T is (3, n); np.inner contracts the
# last axis (n) of both -> (w, h, 3), and the final .T gives (3, h, w)
res_inr = np.inner(images.T, S_pinv.T).T
res_inr.shape
# (3, 384, 512)

Similarly, using transposes, matmul actually does the right thing:

# images.T (w, h, n) @ S_pinv (n, 3) -> (w, h, 3); the final .T gives (3, h, w)
res_mml = (images.T @ S_pinv).T
assert (res_mml == res_inr).all()

These two seem to be roughly equally fast, similar to @AndrasDeak's einsum method.

In particular, they are not as fast as the reshaped matmul (unsurprising, since a single straight matmul must be one of the most optimized operations there is). They trade speed for convenience.

This is basically what np.einsum is for.

Instead of:

b = S_pinv @ images

Use

b = np.einsum('ij, jkl -> ikl', S_pinv, images)

in this case i = 3, j = n_ims, k = h and l = w (recall that S_pinv, as computed in the question, has shape (3, n_ims)).

Since this is a single contraction, you can also do it with np.tensordot()

b = np.tensordot(S_pinv, images, axes = 1)

or,

b = np.tensordot(S_pinv, images, axes = ([1], [0]))
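
As a quick sanity check that these give the expected result (small random arrays, invented for this check):

import numpy as np

n_ims, h, w = 4, 5, 6
images = np.random.rand(n_ims, h, w)
S = np.random.rand(n_ims, 3)
S_pinv = np.linalg.inv(S.T @ S) @ S.T        # shape (3, n_ims)

b1 = np.einsum('ij, jkl -> ikl', S_pinv, images)
b2 = np.tensordot(S_pinv, images, axes=1)    # contracts the last axis of S_pinv with the first of images
b3 = np.tensordot(S_pinv, images, axes=([1], [0]))
print(b1.shape, np.allclose(b1, b2), np.allclose(b1, b3))  # (3, 5, 6) True True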
