Why does numpy.dot behave in this way?
I'm trying to understand why numpy's dot function behaves as it does:
M = np.ones((9, 9))
V1 = np.ones((9,))
V2 = np.ones((9, 5))
V3 = np.ones((2, 9, 5))
V4 = np.ones((3, 2, 9, 5))
Now np.dot(M, V1) and np.dot(M, V2) behave as expected. But for V3 and V4 the result surprises me:
>>> np.dot(M, V3).shape
(9, 2, 5)
>>> np.dot(M, V4).shape
(9, 3, 2, 5)
I expected (2, 9, 5) and (3, 2, 9, 5) respectively. On the other hand, np.matmul does what I expect: the matrix multiply is broadcast over the first N - 2 dimensions of the second argument, and the result has the same shape:
>>> np.matmul(M, V3).shape
(2, 9, 5)
>>> np.matmul(M, V4).shape
(3, 2, 9, 5)
So my question is this: what is the rationale for np.dot behaving as it does? Does it serve some particular purpose, or is it the result of applying some general rule?
From the docs for np.dot:

For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b:

dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
For np.dot(M, V3):

(9, 9), (2, 9, 5) --> (9, 2, 5)

For np.dot(M, V4):

(9, 9), (3, 2, 9, 5) --> (9, 3, 2, 5)

In the original answer the summed-over dimensions were struck through (the formatting is lost here: the second 9 of M and the 9 axis of each V). Those dimensions are contracted away and are therefore not present in the result.
In contrast, np.matmul treats N-dimensional arrays as 'stacks' of 2D matrices:

The behavior depends on the arguments in the following way.

- If both arguments are 2-D they are multiplied like conventional matrices.
- If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.

The same reductions are performed in both cases, but the order of the axes is different. np.matmul essentially does the equivalent of:
for ii in range(V3.shape[0]):
    out1[ii, :, :] = np.dot(M[:, :], V3[ii, :, :])

and

for ii in range(V4.shape[0]):
    for jj in range(V4.shape[1]):
        out2[ii, jj, :, :] = np.dot(M[:, :], V4[ii, jj, :, :])
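A runnable sketch of the V3 case confirms the claim that both functions perform the same reductions and differ only in axis order (the transpose pattern is my addition, derived from the index formula above):

```python
import numpy as np

# np.matmul(M, V3) matches the explicit loop over the stack axis,
# and np.dot gives the same numbers with the axes reordered.
M = np.arange(81, dtype=float).reshape(9, 9)
V3 = np.arange(90, dtype=float).reshape(2, 9, 5)

out1 = np.empty((2, 9, 5))
for ii in range(V3.shape[0]):
    out1[ii, :, :] = np.dot(M, V3[ii, :, :])

assert np.allclose(out1, np.matmul(M, V3))
# np.dot puts the non-contracted axis of M first: shape (9, 2, 5)
assert np.allclose(np.dot(M, V3), np.matmul(M, V3).transpose(1, 0, 2))
```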
From the documentation of numpy.matmul:

matmul differs from dot in two important ways.

- Multiplication by scalars is not allowed.
- Stacks of matrices are broadcast together as if the matrices were elements.
In conclusion, this is the standard matrix-matrix multiplication you would expect.

On the other hand, numpy.dot is only equivalent to matrix-matrix multiplication for two-dimensional arrays. For larger dimensions, ...

it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
[source: documentation of numpy.dot]

This resembles the inner (dot) product. In the case of vectors, numpy.dot returns the dot product. Arrays are considered collections of vectors, and the dot products of those vectors are returned.
For the why:

dot and matmul are both generalizations of 2D x 2D matrix multiplication. But there are a lot of possible choices, according to mathematical properties, broadcasting rules, ...

The choices made for dot and matmul are very different:
For dot, some dimensions (green in the original answer's figure, not reproduced here) are dedicated to the first array, others (blue) to the second.

matmul needs an alignment of the stacks according to broadcasting rules.
Numpy was born in an image-analysis context, and dot can easily manage some tasks in an out = dot(image(s), transformation(s)) way (see the dot docs in an early version of the numpy book, p. 92).

As an illustration:
from pylab import *
image = imread('stackoverflow.png')
identity = eye(3)
NB = ones((3, 3)) / 3
swap_rg = identity[[1, 0, 2]]
randoms = [rand(3, 3) for _ in range(6)]
transformations = [identity, NB, swap_rg] + randoms
out = dot(image, transformations)
for k in range(9):
    subplot(3, 3, k + 1)
    imshow(out[..., k, :])
The modern matmul can do the same thing as the old dot, but the stack of matrices must be taken into account (matmul(image, transformations[:, None]) here). No doubt it is better in other contexts.
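A minimal sketch of that equivalence, using random data in place of the image (the shapes are assumed for illustration, and the transformations are stacked into one array so that the [:, None] indexing works):

```python
import numpy as np

# np.dot(image, T) and np.matmul(image, T[:, None]) compute the same
# products; only the position of the stack axis differs.
rng = np.random.default_rng(1)
image = rng.random((4, 6, 3))    # height x width x RGB
T = rng.random((9, 3, 3))        # a stack of 9 colour transformations

out_dot = np.dot(image, T)               # shape (4, 6, 9, 3)
out_mm = np.matmul(image, T[:, None])    # shape (9, 4, 6, 3)
assert out_dot.shape == (4, 6, 9, 3)
assert out_mm.shape == (9, 4, 6, 3)
assert np.allclose(out_dot, np.moveaxis(out_mm, 0, 2))
```

With matmul, T[:, None] has shape (9, 1, 3, 3), so its stack axes broadcast against the leading axes of the image; with dot, the transformation stack axis simply lands after the image's surviving axes.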
The equivalent einsum expressions are:
In [92]: np.einsum('ij,kjm->kim',M,V3).shape
Out[92]: (2, 9, 5)
In [93]: np.einsum('ij,lkjm->lkim',M,V4).shape
Out[93]: (3, 2, 9, 5)
Expressed this way, the dot equivalent, 'ij,lkjm->ilkm', looks just as natural as the 'matmul' equivalent, 'ij,lkjm->lkim'.
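These subscript strings can be verified against both functions directly, reusing the arrays from the question:

```python
import numpy as np

# The einsum spellings of matmul and dot differ only in the output order.
M = np.ones((9, 9))
V3 = np.ones((2, 9, 5))
V4 = np.ones((3, 2, 9, 5))

assert np.allclose(np.einsum('ij,kjm->kim', M, V3), np.matmul(M, V3))
assert np.allclose(np.einsum('ij,kjm->ikm', M, V3), np.dot(M, V3))
assert np.allclose(np.einsum('ij,lkjm->lkim', M, V4), np.matmul(M, V4))
assert np.allclose(np.einsum('ij,lkjm->ilkm', M, V4), np.dot(M, V4))
```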