Why does numpy.dot behave in this way?
I'm trying to understand why numpy's dot function behaves as it does:
M = np.ones((9, 9))
V1 = np.ones((9,))
V2 = np.ones((9, 5))
V3 = np.ones((2, 9, 5))
V4 = np.ones((3, 2, 9, 5))
Now np.dot(M, V1) and np.dot(M, V2) behave as expected. But for V3 and V4 the result surprises me:
>>> np.dot(M, V3).shape
(9, 2, 5)
>>> np.dot(M, V4).shape
(9, 3, 2, 5)
I expected (2, 9, 5) and (3, 2, 9, 5) respectively. On the other hand, np.matmul does what I expect: the matrix multiply is broadcast over the first N - 2 dimensions of the second argument, and the result has the same shape:
>>> np.matmul(M, V3).shape
(2, 9, 5)
>>> np.matmul(M, V4).shape
(3, 2, 9, 5)
So my question is this: what is the rationale for np.dot behaving as it does? Does it serve some particular purpose, or is it the result of applying some general rule?
From the docs for np.dot:

For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b:

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
For np.dot(M, V3):

(9, 9), (2, 9, 5) --> (9, 2, 5)

For np.dot(M, V4):

(9, 9), (3, 2, 9, 5) --> (9, 3, 2, 5)

In the original answer the summed-over dimensions were struck through (the formatting is lost here: the second 9 of M and the 9 axis of each V). Those dimensions are contracted away and are therefore not present in the result.
In contrast, np.matmul treats N-dimensional arrays as 'stacks' of 2D matrices:

The behavior depends on the arguments in the following way.

- If both arguments are 2-D they are multiplied like conventional matrices.
- If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.

The same reductions are performed in both cases, but the order of the axes is different. np.matmul essentially does the equivalent of:
for ii in range(V3.shape[0]):
    out1[ii, :, :] = np.dot(M[:, :], V3[ii, :, :])

and

for ii in range(V4.shape[0]):
    for jj in range(V4.shape[1]):
        out2[ii, jj, :, :] = np.dot(M[:, :], V4[ii, jj, :, :])
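A runnable sketch of the V3 case confirms the claim that both functions perform the same reductions and differ only in axis order (the transpose pattern is my addition, derived from the index formula above):

```python
import numpy as np

# np.matmul(M, V3) matches the explicit loop over the stack axis,
# and np.dot gives the same numbers with the axes reordered.
M = np.arange(81, dtype=float).reshape(9, 9)
V3 = np.arange(90, dtype=float).reshape(2, 9, 5)

out1 = np.empty((2, 9, 5))
for ii in range(V3.shape[0]):
    out1[ii, :, :] = np.dot(M, V3[ii, :, :])

assert np.allclose(out1, np.matmul(M, V3))
# np.dot puts the non-contracted axis of M first: shape (9, 2, 5)
assert np.allclose(np.dot(M, V3), np.matmul(M, V3).transpose(1, 0, 2))
```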
From the documentation of numpy.matmul:

matmul differs from dot in two important ways.

- Multiplication by scalars is not allowed.
- Stacks of matrices are broadcast together as if the matrices were elements.
In conclusion, this is the standard matrix-matrix multiplication you would expect.

On the other hand, numpy.dot is only equivalent to matrix-matrix multiplication for two-dimensional arrays. For larger dimensions, ...

it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
[source: documentation of numpy.dot]

This resembles the inner (dot) product. In the case of vectors, numpy.dot returns the dot product. Arrays are considered collections of vectors, and the dot products of those vectors are returned.
For the why:

dot and matmul are both generalizations of 2D x 2D matrix multiplication. But there are a lot of possible choices, according to mathematical properties, broadcasting rules, ...

The choices made for dot and matmul are very different:
For dot, some dimensions (green in the original answer's figure, not reproduced here) are dedicated to the first array, others (blue) to the second.

matmul needs an alignment of the stacks according to broadcasting rules.
Numpy was born in an image-analysis context, and dot can easily manage some tasks in an out = dot(image(s), transformation(s)) way (see the dot docs in an early version of the numpy book, p. 92).

As an illustration:
from pylab import *
image = imread('stackoverflow.png')
identity = eye(3)
NB = ones((3, 3)) / 3
swap_rg = identity[[1, 0, 2]]
randoms = [rand(3, 3) for _ in range(6)]
transformations = [identity, NB, swap_rg] + randoms
out = dot(image, transformations)
for k in range(9):
    subplot(3, 3, k + 1)
    imshow(out[..., k, :])
The modern matmul can do the same thing as the old dot, but the stack of matrices must be taken into account (matmul(image, transformations[:, None]) here). No doubt it is better in other contexts.
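A minimal sketch of that equivalence, using random data in place of the image (the shapes are assumed for illustration, and the transformations are stacked into one array so that the [:, None] indexing works):

```python
import numpy as np

# np.dot(image, T) and np.matmul(image, T[:, None]) compute the same
# products; only the position of the stack axis differs.
rng = np.random.default_rng(1)
image = rng.random((4, 6, 3))    # height x width x RGB
T = rng.random((9, 3, 3))        # a stack of 9 colour transformations

out_dot = np.dot(image, T)               # shape (4, 6, 9, 3)
out_mm = np.matmul(image, T[:, None])    # shape (9, 4, 6, 3)
assert out_dot.shape == (4, 6, 9, 3)
assert out_mm.shape == (9, 4, 6, 3)
assert np.allclose(out_dot, np.moveaxis(out_mm, 0, 2))
```

With matmul, T[:, None] has shape (9, 1, 3, 3), so its stack axes broadcast against the leading axes of the image; with dot, the transformation stack axis simply lands after the image's surviving axes.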
The equivalent einsum expressions are:
In [92]: np.einsum('ij,kjm->kim',M,V3).shape
Out[92]: (2, 9, 5)
In [93]: np.einsum('ij,lkjm->lkim',M,V4).shape
Out[93]: (3, 2, 9, 5)
Expressed this way, the dot equivalent, 'ij,lkjm->ilkm', looks just as natural as the 'matmul' equivalent, 'ij,lkjm->lkim'.
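These subscript strings can be verified against both functions directly, reusing the arrays from the question:

```python
import numpy as np

# The einsum spellings of matmul and dot differ only in the output order.
M = np.ones((9, 9))
V3 = np.ones((2, 9, 5))
V4 = np.ones((3, 2, 9, 5))

assert np.allclose(np.einsum('ij,kjm->kim', M, V3), np.matmul(M, V3))
assert np.allclose(np.einsum('ij,kjm->ikm', M, V3), np.dot(M, V3))
assert np.allclose(np.einsum('ij,lkjm->lkim', M, V4), np.matmul(M, V4))
assert np.allclose(np.einsum('ij,lkjm->ilkm', M, V4), np.dot(M, V4))
```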