How does the numpy.tensordot function work step by step?

I am new to numpy, so I am having trouble visualizing how the numpy.tensordot() function works. According to the documentation of tensordot, the axes are passed in the arguments, where axes=0 or 1 represents a normal matrix multiplication whereas axes=2 represents contraction.

Can somebody please explain how the multiplication would proceed with the given examples?

Example-1: a=[1,1] b=[2,2] for axes=0,1. Why does it throw an error for axes=2?
Example-2: a=[[1,1],[1,1]] b=[[2,2],[2,2]] for axes=0,1,2

Edit: The initial focus of this answer was on the case where axes is a tuple, specifying one or more axes for each argument. This use allows us to perform variations on the conventional dot, especially for arrays larger than 2d (see also my answer in the linked question, https://stackoverflow.com/a/41870980/901925 ). Axes as a scalar is a special case that gets translated into the tuple version. So at its core it is still a dot product.

axes as tuple

In [235]: a=[1,1]; b=[2,2]

a and b are lists; tensordot turns them into arrays.

In [236]: np.tensordot(a,b,(0,0))
Out[236]: array(4)

Since they are both 1d arrays, we specify the axis values as 0.

If we try to specify 1:

In [237]: np.tensordot(a,b,(0,1))
---------------------------------------------------------------------------
   1282     else:
   1283         for k in range(na):
-> 1284             if as_[axes_a[k]] != bs[axes_b[k]]:
   1285                 equal = False
   1286                 break

IndexError: tuple index out of range

It is checking whether the size of axis 0 of a matches the size of axis 1 of b. But since b is 1d, it can't check that.

In [239]: np.array(a).shape[0]
Out[239]: 2
In [240]: np.array(b).shape[1]
IndexError: tuple index out of range

Your second example uses 2d arrays:

In [242]: a=np.array([[1,1],[1,1]]); b=np.array([[2,2],[2,2]])

Specifying the last axis of a and the first of b (second to the last) produces the conventional matrix (dot) product:

In [243]: np.tensordot(a,b,(1,0))
Out[243]: 
array([[4, 4],
       [4, 4]])
In [244]: a.dot(b)
Out[244]: 
array([[4, 4],
       [4, 4]])

Better diagnostic values:

In [250]: a=np.array([[1,2],[3,4]]); b=np.array([[2,3],[2,1]])
In [251]: np.tensordot(a,b,(1,0))
Out[251]: 
array([[ 6,  5],
       [14, 13]])
In [252]: np.dot(a,b)
Out[252]: 
array([[ 6,  5],
       [14, 13]])

In [253]: np.tensordot(a,b,(0,1))
Out[253]: 
array([[11,  5],
       [16,  8]])
In [254]: np.dot(b,a)      # same numbers, different layout
Out[254]: 
array([[11, 16],
       [ 5,  8]])
In [255]: np.dot(b,a).T
Out[255]: 
array([[11,  5],
       [16,  8]])

Another pairing:

In [256]: np.tensordot(a,b,(0,0))
In [257]: np.dot(a.T,b)
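
The two calls above are listed without their output. A minimal sketch that reproduces them, assuming the same a and b as in In [250]; the einsum subscripts are my own spelling of this pairing and are not part of the original session:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 3], [2, 1]])

# Contract axis 0 of a with axis 0 of b:
# out[j, k] = sum over i of a[i, j] * b[i, k]
pair_00 = np.tensordot(a, b, (0, 0))

# The same pairing written two other ways
via_dot = np.dot(a.T, b)
via_einsum = np.einsum('ij,ik->jk', a, b)

print(pair_00)
# [[ 8  6]
#  [12 10]]
```

All three expressions sum over the first index of both arrays, so they should agree element for element.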

(0,1,2) for axes is plain wrong. When given as a sequence, the axes parameter should be 2 numbers, or 2 tuples, corresponding to the 2 arguments.

The basic processing in tensordot is to transpose and reshape the inputs so it can then pass the results to np.dot for a conventional (last of a, second to the last of b) matrix product.

axes as scalar

If my reading of the tensordot code is right, the axes parameter is converted into two lists with:

def foo(axes):
    try:
        iter(axes)
    except Exception:
        axes_a = list(range(-axes, 0))
        axes_b = list(range(0, axes))
    else:
        axes_a, axes_b = axes
    try:
        na = len(axes_a)
        axes_a = list(axes_a)
    except TypeError:
        axes_a = [axes_a]
        na = 1
    try:
        nb = len(axes_b)
        axes_b = list(axes_b)
    except TypeError:
        axes_b = [axes_b]
        nb = 1

    return axes_a, axes_b

For scalar values 0, 1, 2 the results are:

In [281]: foo(0)
Out[281]: ([], [])
In [282]: foo(1)
Out[282]: ([-1], [0])
In [283]: foo(2)
Out[283]: ([-2, -1], [0, 1])

axes=1 is the same as specifying in a tuple:

In [284]: foo((-1,0))
Out[284]: ([-1], [0])

And for 2:

In [285]: foo(((-2,-1),(0,1)))
Out[285]: ([-2, -1], [0, 1])

With my latest example, axes=2 is the same as specifying a dot over all axes of the 2 arrays:

In [287]: np.tensordot(a,b,axes=2)
Out[287]: array(18)
In [288]: np.tensordot(a,b,axes=((0,1),(0,1)))
Out[288]: array(18)

This is the same as doing dot on the flattened, 1d, views of the arrays:

In [289]: np.dot(a.ravel(), b.ravel())
Out[289]: 18

I already demonstrated the conventional dot product for these arrays, the axes=1 case.

axes=0 is the same as axes=((),()), no summation axes for the 2 arrays:

In [292]: foo(((),()))
Out[292]: ([], [])

np.tensordot(a,b,((),())) is the same as np.tensordot(a,b,axes=0)

It's the -2 in the foo(2) translation that's giving you problems when the input arrays are 1d. axes=1 is the 'contraction' for 1d arrays. In other words, don't take the word descriptions in the documentation too literally. They just attempt to describe the action of the code; they aren't a formal specification.
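
A short sketch of that failure, using the 1d lists from the question. Which exception class gets raised is an implementation detail, so the example catches both IndexError and ValueError rather than assuming one:

```python
import numpy as np

a = [1, 1]
b = [2, 2]

# axes=1 translates to ([-1], [0]); both indices exist on a 1d array
ok = np.tensordot(a, b, axes=1)
print(ok)  # 4

# axes=2 translates to ([-2, -1], [0, 1]); a 1d shape tuple has
# no index -2 (for a) and no index 1 (for b)
try:
    np.tensordot(a, b, axes=2)
    failed = False
except (IndexError, ValueError) as err:
    failed = True
    print(type(err).__name__, err)
```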

einsum equivalents

I think the axes specifications for einsum are clearer and more powerful. Here are the equivalents for 0, 1, 2:

In [295]: np.einsum('ij,kl',a,b)
Out[295]: 
array([[[[ 2,  3],
         [ 2,  1]],

        [[ 4,  6],
         [ 4,  2]]],


       [[[ 6,  9],
         [ 6,  3]],

        [[ 8, 12],
         [ 8,  4]]]])
In [296]: np.einsum('ij,jk',a,b)
Out[296]: 
array([[ 6,  5],
       [14, 13]])
In [297]: np.einsum('ij,ij',a,b)
Out[297]: 18

The axes=0 case is equivalent to:

np.dot(a[:,:,None],b[:,None,:])

It adds a new last axis to a and a new 2nd-to-last axis to b, and does a conventional dot product summing over those. But we usually do this sort of 'outer' multiplication with broadcasting:

a[:,:,None,None]*b[None,None,:,:]
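
As a quick check, the different spellings of the axes=0 case can be compared directly. This is only a sketch, reusing the a and b from In [250]; the variable names are mine:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 3], [2, 1]])

outer_td = np.tensordot(a, b, axes=0)                 # scalar axes form
outer_dot = np.dot(a[:, :, None], b[:, None, :])      # dot over a size-1 axis
outer_bc = a[:, :, None, None] * b[None, None, :, :]  # broadcasting
outer_es = np.einsum('ij,kl', a, b)                   # einsum, no shared index

print(outer_td.shape)  # (2, 2, 2, 2)
```

Each result holds a[i, j] * b[k, l] at position [i, j, k, l], so the four arrays should be identical.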

While the use of 0,1,2 for axes is interesting, it really doesn't add new calculation power. The tuple form of axes is more powerful and useful.

code summary (big steps)

1 - translate axes into axes_a and axes_b, as excerpted in the foo function above

2 - make a and b into arrays, and get their shape and ndim

3 - check for matching sizes on the axes that will be summed (contracted)

4 - construct newshape_a and newaxes_a; same for b (complex step)

5 - at = a.transpose(newaxes_a).reshape(newshape_a); same for b

6 - res = dot(at, bt)

7 - reshape res to the desired return shape

5 and 6 are the calculation core. 4 is conceptually the most complex step. For all axes values the calculation is the same, a dot product, but the setup varies.
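
The steps above can be sketched in a few lines. This is a simplified illustration, not numpy's actual source: tensordot_sketch and the free_a/free_b names are hypothetical, and the function assumes axes_a and axes_b are already lists of ints (step 1 done):

```python
import numpy as np

def tensordot_sketch(a, b, axes_a, axes_b):
    """Hypothetical, simplified version of steps 2-7 (not numpy's code)."""
    # step 2: make arrays, normalize negative axes
    a, b = np.asarray(a), np.asarray(b)
    axes_a = [ax % a.ndim for ax in axes_a]
    axes_b = [ax % b.ndim for ax in axes_b]

    # step 3: the paired axes must have matching sizes
    for ax_a, ax_b in zip(axes_a, axes_b):
        if a.shape[ax_a] != b.shape[ax_b]:
            raise ValueError("shape-mismatch for sum")

    # step 4: axes that are kept, and the 2d shapes to reshape into
    free_a = [k for k in range(a.ndim) if k not in axes_a]
    free_b = [k for k in range(b.ndim) if k not in axes_b]
    n_sum = int(np.prod([a.shape[ax] for ax in axes_a]))
    newshape_a = (int(np.prod([a.shape[k] for k in free_a])), n_sum)
    newshape_b = (n_sum, int(np.prod([b.shape[k] for k in free_b])))

    # step 5: summed axes go last for a, first for b
    at = a.transpose(free_a + axes_a).reshape(newshape_a)
    bt = b.transpose(axes_b + free_b).reshape(newshape_b)

    # step 6: one ordinary 2d matrix product
    res = np.dot(at, bt)

    # step 7: restore the kept axes of a, then those of b
    out_shape = [a.shape[k] for k in free_a] + [b.shape[k] for k in free_b]
    return res.reshape(out_shape)

x = np.arange(24).reshape(2, 3, 4)
y = np.arange(24).reshape(3, 4, 2)
r2 = tensordot_sketch(x, y, [-2, -1], [0, 1])  # same pairing as axes=2
r02 = tensordot_sketch(x, y, [0], [2])         # same pairing as axes=(0, 2)
print(r2.shape, r02.shape)  # (2, 2) (3, 4, 3, 4)
```

The point of the sketch is that every axes value ends in the same 2d np.dot; only the transpose and reshape around it change.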

beyond 0,1,2

While the documentation only mentions 0,1,2 for scalar axes, the code isn't restricted to those values:

In [331]: foo(3)
Out[331]: ([-3, -2, -1], [0, 1, 2])

If the inputs are 3d, axes=3 should work:

In [330]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=3)
Out[330]: array(8.)

or more generally:

In [325]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=0).shape
Out[325]: (2, 2, 2, 2, 2, 2)
In [326]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=1).shape
Out[326]: (2, 2, 2, 2)
In [327]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=2).shape
Out[327]: (2, 2)
In [328]: np.tensordot(np.ones((2,2,2)), np.ones((2,2,2)), axes=3).shape
Out[328]: ()
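
These shapes follow a simple pattern: for a scalar axes=n, the result keeps all but the last n axes of the first argument, followed by all but the first n axes of the second. A small sketch checking this; the arrays x and y and their shapes are my own choice, picked so the kept axes can be told apart:

```python
import numpy as np

x = np.ones((5, 3, 3))
y = np.ones((3, 3, 4))

shapes = {}
for n in range(3):
    # drop the last n axes of x and the first n axes of y
    expected = x.shape[:x.ndim - n] + y.shape[n:]
    shapes[n] = np.tensordot(x, y, axes=n).shape
    assert shapes[n] == expected

print(shapes)
# {0: (5, 3, 3, 3, 3, 4), 1: (5, 3, 3, 4), 2: (5, 4)}
```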

and if the inputs are 0d, axes=0 works (axes=1 does not):

In [335]: np.tensordot(2,3, axes=0)
Out[335]: array(6)

Can you explain this?

In [363]: np.tensordot(np.ones((4,2,3)),np.ones((2,3,4)),axes=2).shape
Out[363]: (4, 4)
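
One way to read this result, as a sketch: with axes=2 the last two axes of the first array (sizes 2 and 3) are paired with the first two axes of the second (also 2 and 3), leaving the two size-4 axes. The tuple, einsum and reshape spellings below are my own restatements of that pairing:

```python
import numpy as np

x = np.ones((4, 2, 3))
y = np.ones((2, 3, 4))

r_scalar = np.tensordot(x, y, axes=2)
r_tuple = np.tensordot(x, y, axes=((-2, -1), (0, 1)))
r_einsum = np.einsum('ijk,jkl->il', x, y)

# Flattening the paired axes turns it into a (4, 6) @ (6, 4) matrix product
r_matmul = x.reshape(4, 6) @ y.reshape(6, 4)

print(r_scalar.shape)   # (4, 4)
print(r_scalar[0, 0])   # 6.0, a sum over 2*3 products of ones
```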

I've played around with other scalar axes values for 3d arrays. While it is possible to come up with pairs of shapes that work, the more explicit tuple axes values are easier to work with. The 0,1,2 options are shortcuts that only work for special cases. The tuple approach is much easier to use, though I still prefer the einsum notation.

Example 1-0: np.tensordot([1, 1], [2, 2], axes=0)

In this case, a and b both have a single axis and have shape (2,).

The axes=0 argument can be translated to ((the last 0 axes of a), (the first 0 axes of b)), or in this case ((), ()). These are the axes that will be contracted.

All the other axes will not be contracted. Since a and b each have a 0-th axis and no others, these are the axes ((0,), (0,)).

The tensordot operation is then as follows (roughly):

[
    [x*y for y in b]  # all the non-contraction axes in b
    for x in a        # all the non-contraction axes in a
]

Note that since there are 2 total axes available between a and b, and since we're contracting 0 of them, the result has 2 axes. The shape is (2,2) since those are the sizes of the respective non-contracted axes in a and b (in order).
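
A runnable version of this example, as a sketch; the comparison with np.outer is my addition, since for 1d inputs the axes=0 case is the outer product:

```python
import numpy as np

a = [1, 1]
b = [2, 2]

# The nested comprehension from above, with no axes contracted
nested = [[x * y for y in b] for x in a]

result = np.tensordot(a, b, axes=0)
print(result)
# [[2 2]
#  [2 2]]
print(result.shape)  # (2, 2)
```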

Example 1-1: np.tensordot([1, 1], [2, 2], axes=1)

The axes=1 argument can be translated to ((the last 1 axes of a), (the first 1 axes of b)), or in this case ((0,), (0,)). These are the axes that will be contracted.

All other axes will not be contracted. Since we are already contracting every axis, the remaining axes are ((), ()).

The tensordot operation is then as follows:

sum(  # summing over contraction axis
    [x*y for x,y in zip(a, b)]  # contracted axes must line up
)

Note that since we're contracting all axes, the result is a scalar (or a 0-dimensional tensor). In numpy, you get an array with shape (), representing 0 axes, rather than an actual Python scalar.
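
The same example in runnable form, as a sketch; .item() is used here only to show how to pull a plain Python number out of the 0-dimensional result:

```python
import numpy as np

a = [1, 1]
b = [2, 2]

# The zip-and-sum from above: the contracted axes are walked in lockstep
manual = sum(x * y for x, y in zip(a, b))

result = np.tensordot(a, b, axes=1)
print(manual)         # 4
print(result.shape)   # ()
print(result.item())  # 4
```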

Example 1-2: np.tensordot([1, 1], [2, 2], axes=2)

This doesn't work because neither a nor b has two separate axes to contract over.

Example 2-1: np.tensordot([[1,1],[1,1]], [[2,2],[2,2]], axes=1)

I'm skipping a couple of your examples, since I don't think they are complicated enough to add more clarity than the first few.

In this case, a and b both have two axes available (which makes the problem a bit more interesting), and they both have shape (2,2).

The axes=1 argument still represents the last 1 axes of a and the first 1 axes of b, leaving us with ((1,), (0,)). These are the axes that will be contracted over.

The remaining axes are not contracted and contribute to the shape of the final solution. These are ((0,), (1,)).

We can then construct the tensordot operation. For the sake of argument, pretend a and b are numpy arrays so that we can use array properties and make the problem cleaner (e.g. b=np.array([[2,2],[2,2]])).

[
    [
        sum(  # summing the contracted indices
            [x*y for x,y in zip(v,w)]  # axis 1 of a and axis 0 of b must line up for the summation
        )
        for w in b.T  # iterating over axis 1 of b (i.e. the columns)
    ]
    for v in a  # iterating over axis 0 of a (i.e. the rows)
]

The result has shape (a.shape[0], b.shape[1]) since these are the non-contracted axes.
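
A sketch that runs the nested loop against tensordot. It swaps in arrays with distinct values (my choice, not the question's all-ones and all-twos inputs) so that a wrong axis pairing would be visible in the output:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 3], [2, 1]])

looped = [
    [
        sum(x * y for x, y in zip(v, w))  # contract axis 1 of a with axis 0 of b
        for w in b.T                      # columns of b
    ]
    for v in a                            # rows of a
]

result = np.tensordot(a, b, axes=1)
print(result)
# [[ 6  5]
#  [14 13]]
```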
