How to matrix-multiply a 2D numpy array with a 3D array to give a 3D array?

I am solving a photometric stereo problem, in which I have "n" light sources, each with 3 channels (red, green, blue). The light array is therefore of shape n x 3: lights.shape = (n, 3). I also have the images corresponding to each lighting condition; image dimensions are h x w (height x width), so images.shape = (n, h, w).

I want to matrix-multiply each pixel of the images by a matrix of shape 3 x n, and get another array of shape (3, h, w); these will be the normal vectors of each pixel in the image.

Shapes:

  • images: (n_ims, h, w)
  • lights: (n_ims, 3)
S = lights
S_pinv = np.linalg.inv(S.T@S)@S.T  # pinv is the pseudo-inverse; S_pinv.shape: (3, n_ims)
b = S_pinv @ images  # I want (3xn @ nxhxw = 3xhxw)

But I am getting this error:

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 100 is different from 3)

The problem is that numpy views multidimensional arrays as stacks of matrices, and the last two dimensions are always assumed to be the linear-algebra dimensions. This means that the matrix product will not work by collapsing the first dimension of your 3d array.
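To see this stacking behaviour concretely, here is a minimal sketch (with made-up shapes) of how matmul treats 3d arrays:

import numpy as np

A = np.random.rand(4, 3, 5)  # a stack of four 3x5 matrices
B = np.random.rand(4, 5, 2)  # a stack of four 5x2 matrices
print((A @ B).shape)  # (4, 3, 2): matmul pairs up the stacks and contracts the last two axes

# With a 2D left operand, matmul broadcasts it against the stack dimension,
# so (3, n) @ (n, h, w) tries to contract n with h instead of with n.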

Instead, the simplest thing you can do is to reshape your 3d array into a 2d one, do the matrix multiplication, and reshape back into a 3d array. This will also make use of optimised BLAS code, which is one of the great advantages of numpy.

import numpy as np 

S_pinv = np.random.rand(3, 4)
images = np.random.rand(4, 5, 6)

# error: 
# (S_pinv @ images).shape 
res_shape = S_pinv.shape[:1] + images.shape[1:]  # (3, 5, 6) 
res = (S_pinv @ images.reshape(images.shape[0], -1)).reshape(res_shape)
print(res.shape)  # (3, 5, 6)

So instead of (3, n) x (n, h, w) we do (3, n) x (n, h*w) -> (3, h*w), which we reshape back to (3, h, w). Reshaping is free, because it doesn't involve any actual manipulation of the data in memory (only a reinterpretation of the single block of memory that underlies the array), and as I said, proper matrix products are highly optimized in numpy.
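A quick way to convince yourself that the reshape makes no copy (for the contiguous arrays used here) is np.shares_memory:

flat = images.reshape(images.shape[0], -1)
print(np.shares_memory(images, flat))  # True: the reshape is just a new view of the same data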


Since you asked for a more intuitive way, here's an alternative making use of numpy.einsum. It will probably be slower, but it's very transparent once you get a little used to its notation:

res_einsum = np.einsum('tn,nhw -> thw', S_pinv, images)
print(np.array_equal(res, res_einsum))  # True

This notation names each of the dimensions of the input arrays: for S_pinv the first and second dimensions are named t and n, respectively, and similarly n, h and w for images. The output is set to have dimensions thw, which means that any remaining dimensions not present in the output shape will be summed over after multiplying the input arrays. This is exactly what you need.
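If it helps, the same contraction can be spelled out with plain broadcasting: multiply elementwise over the shared n axis, then sum that axis away. A sketch reusing the arrays from above:

res_manual = (S_pinv[:, :, None, None] * images[None, :, :, :]).sum(axis=1)
print(np.allclose(res_manual, res_einsum))  # True (up to floating-point rounding)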


As you noted in a comment, you could also transpose your arrays so that np.dot finds the right dimensions in the right place. But this will also be slow, because it may lead to copies in memory, or at least to suboptimal looping over your arrays.
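The reason is that transposing only permutes the strides, so the result is generally not contiguous in memory, as this quick check shows:

print(images.transpose(2, 0, 1).flags['C_CONTIGUOUS'])  # False: matmul may need a contiguous copy first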

I made a quick timing comparison using the following definitions:

def reshaped(S_pinv, images): 
    res_shape = S_pinv.shape[:1] + images.shape[1:] 
    return (S_pinv @ images.reshape(images.shape[0], -1)).reshape(res_shape)

def einsummed(S_pinv, images): 
    return np.einsum('tn,nhw -> thw', S_pinv, images) 

def transposed(S_pinv, images): 
    return (S_pinv @ images.transpose(2, 0, 1)).transpose(1, 2, 0)          

And here's the timing test using IPython's built-in %timeit magic, with some more realistic array sizes:

>>> S_pinv = np.random.rand(3, 100) 
... images = np.random.rand(100, 200, 300) 
... args = S_pinv, images 
... %timeit reshaped(*args) 
... %timeit einsummed(*args) 
... %timeit transposed(*args)                                          
5.92 ms ± 460 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15.9 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
44.5 ms ± 329 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
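As a sanity check, all three implementations agree up to floating-point rounding:

print(np.allclose(reshaped(*args), einsummed(*args)))   # True
print(np.allclose(reshaped(*args), transposed(*args)))  # True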

The answer is np.swapaxes:

import numpy as np

q = np.random.random([2, 5, 5])
q.shape  # (2, 5, 5)

w = np.random.random([3, 2])
w.shape  # (3, 2)

w @ q  # the last axis of w (size 2) does not match the second-to-last axis of q (size 5)

and we get a ValueError. But:

import numpy as np

q = np.random.random([5, 2, 5])
q.shape  # (5, 2, 5): the axis of size 2 is now second-to-last

w = np.random.random([3, 2])
w.shape  # (3, 2)

res = (w @ q).swapaxes(0, 1)
res.shape  # (3, 5, 5)
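Note that this trick relies on the contracted axis sitting in the middle of q. For the question's images of shape (n, h, w), the n axis comes first, so you would swap it into the middle before multiplying; a sketch under the question's shapes (sizes made up):

import numpy as np

n, h, w = 4, 5, 6
images = np.random.random([n, h, w])
S_pinv = np.random.random([3, n])

# (n, h, w) -> (h, n, w), multiply into (h, 3, w), then swap back to (3, h, w)
res = (S_pinv @ images.swapaxes(0, 1)).swapaxes(0, 1)
print(res.shape)  # (3, 5, 6)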

One easy way would be np.inner; inner reduces along the last axis and preserves all others; therefore, up to a transpose, it is a perfect match:

n,h,w = 10,384,512
images = np.random.randint(1,10,(n,h,w))
S_pinv = np.random.randint(1,10,(n,3))

res_inr = np.inner(images.T,S_pinv.T).T
res_inr.shape
# (3, 384, 512)

Similarly, using transposes, matmul actually does the right thing:

res_mml = (images.T@S_pinv).T
assert (res_mml==res_inr).all()

These two seem to be roughly equally fast, similar to @AndrasDeak's einsum method.

In particular, they are not as fast as the reshaped matmul (unsurprising, since a single straight matmul must be one of the most optimized operations there is). They are trading speed for convenience.

This is basically what np.einsum is for.

Instead of:

b = S_pinv @ images

use

b = np.einsum('ji, ikl -> jkl', S_pinv, images)

In this case i = n_ims, j = 3, k = h and l = w.

Since this is a single contraction, you can also do it with np.tensordot():

b = np.tensordot(S_pinv, images, axes=1)

or,

b = np.tensordot(S_pinv, images, axes=([1], [0]))
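As a quick check under the question's shapes (S_pinv of shape (3, n), images of shape (n, h, w); sizes here are made up), both tensordot forms agree with the einsum version:

import numpy as np

n, h, w = 4, 5, 6
S_pinv = np.random.rand(3, n)
images = np.random.rand(n, h, w)

b1 = np.tensordot(S_pinv, images, axes=1)           # contract the last axis of S_pinv with the first of images
b2 = np.tensordot(S_pinv, images, axes=([1], [0]))  # the same contraction, spelled out explicitly
b3 = np.einsum('ji, ikl -> jkl', S_pinv, images)
print(b1.shape)                                  # (3, 5, 6)
print(np.allclose(b1, b2), np.allclose(b1, b3))  # True True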
