
Why does NumPy matrix multiply broadcast work in one direction but not in the transposed direction?

Consider the following matrix product between two arrays:

import numpy as np
A = np.random.rand(2,10,10)                                             
B = np.random.rand(2,2)                                                 
C = A.T @ B

...goes fine. I think of this as a 1-by-2 times 2-by-2 vector-matrix product, broadcast over the 10-by-10 2nd and 3rd dimensions of A. Inspecting the result C confirms this intuition: np.allclose(C[i,j], A.T[i,j] @ B) holds for all i, j.
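A direct check over the grid (reusing the arrays defined above) confirms that each C[i, j] is the vector-matrix product A.T[i, j] @ B:

for i in range(10):
    for j in range(10):
        assert np.allclose(C[i, j], A.T[i, j] @ B)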

Now, mathematically, I should also be able to compute C.T as B.T @ A, but:

B.T @ A                                                                
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-ffdbb14ca160> in <module>
----> 1 B.T @ A

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 10 is different from 2)

So broadcast-wise, a 10-by-10-by-2 tensor and a 2-by-2 matrix are compatible with respect to matrix product, but a 2-by-2 matrix and a 2-by-10-by-10 tensor are not?

Bonus info: I want to be able to compute the "quadratic product" A.T @ B @ A, and it really annoys me to have to write for-loops to manually "broadcast" over one of the dimensions. It feels like it should be possible to do this more elegantly. I am pretty experienced with Python and NumPy, but I rarely go beyond two-dimensional arrays.
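For concreteness, a sketch of the kind of loop I mean, reading the "quadratic product" as the scalar A.T[i,j] @ B @ A.T[i,j] at each of the 10-by-10 grid points:

Q = np.empty((10, 10))
for i in range(10):
    for j in range(10):
        # quadratic form v @ B @ v with v = A.T[i, j], a length-2 vector
        Q[i, j] = A.T[i, j] @ B @ A.T[i, j]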

What am I missing here? Is there something about the way transpose operates on tensors in NumPy that I do not understand?

In [194]: A = np.random.rand(2,10,10)                                           
     ...:    
     ...: B = np.random.rand(2,2)                                               
In [196]: A.T.shape                                                             
Out[196]: (10, 10, 2)

In [197]: C = A.T @ B                                                           
In [198]: C.shape                                                               
Out[198]: (10, 10, 2)

The einsum equivalent is:

In [199]: np.allclose(np.einsum('ijk,kl->ijl',A.T,B),C)                         
Out[199]: True

or, incorporating the transpose into the indexing:

In [200]: np.allclose(np.einsum('kji,kl->ijl',A,B),C)                           
Out[200]: True

Note that k is the summed dimension. j and l are the other dot dimensions. i is a kind of 'batch' dimension.
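That batch reading can be checked directly: matmul treats A.T as a stack of ten (10, 2) matrices, each multiplied by the same B:

for i in range(10):
    # C[i] is an ordinary (10, 2) @ (2, 2) product for the i-th batch slice
    assert np.allclose(C[i], A.T[i] @ B)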

Or, as you describe it element-wise: np.einsum('k,kl->l', A.T[i,j], B).

To get C.T, the einsum result indices should be lji, i.e. lk,jki->lji:

In [201]: np.allclose(np.einsum('lk,jki->lji', B.T, A.transpose(1,0,2)), C.T)      
Out[201]: True

In [226]: np.allclose(np.einsum('ij,jkl->ikl', B.T, A), C.T)                       
Out[226]: True

Matching [201] with @ requires a further transpose:

In [225]: np.allclose((B.T@(A.transpose(1,0,2))).transpose(1,0,2), C.T)          
Out[225]: True

With einsum we can place the axes in any order, but with matmul the order is fixed: (batch, i, k) @ (batch, k, l) -> (batch, i, l) (where the batch dimensions can be broadcast).
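A generic illustration of that fixed order and of batch broadcasting (hypothetical shapes, unrelated to the arrays above): the trailing two axes are the matrix axes, and all leading axes broadcast like ordinary element-wise operations:

X = np.random.rand(4, 1, 3, 5)   # batch axes (4, 1), matrix axes (3, 5)
Y = np.random.rand(6, 5, 2)      # batch axis (6,),   matrix axes (5, 2)
print((X @ Y).shape)             # (4, 6, 3, 2): batch axes broadcast, core product (3,5)@(5,2)->(3,2)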

Your example might be easier to follow if A had shape (2,10,9) and B shape (2,3), with C coming out as (9,10,3):

In [229]: A = np.random.rand(2,10,9); B = np.random.rand(2,3)                   
In [230]: C = A.T @ B                                                           
In [231]: C.shape                                                               
Out[231]: (9, 10, 3)
In [232]: C.T.shape                                                             
Out[232]: (3, 10, 9)

In [234]: ((B.T) @ (A.transpose(1,0,2))).shape                                    
Out[234]: (10, 3, 9)
In [235]: ((B.T) @ (A.transpose(1,0,2))).transpose(1,0,2).shape                   
Out[235]: (3, 10, 9)
In [236]: np.allclose(((B.T) @ (A.transpose(1,0,2))).transpose(1,0,2), C.T)        
Out[236]: True
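Back to the "quadratic product" from the question: with the original shapes, einsum can also do it in one call. A sketch, assuming the desired result is the scalar A.T[i,j] @ B @ A.T[i,j] at each of the 10-by-10 grid points:

A = np.random.rand(2, 10, 10); B = np.random.rand(2, 2)
# A.T[i, j] is the vector A[:, j, i], so the per-point quadratic form is
# the sum over k and l of A[k, j, i] * B[k, l] * A[l, j, i]
Q = np.einsum('kji,kl,lji->ij', A, B, A)
assert all(np.allclose(Q[i, j], A.T[i, j] @ B @ A.T[i, j])
           for i in range(10) for j in range(10))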
