How do numpy's in-place operations (e.g. `+=`) work?

The basic question is: what happens under the hood when doing a[i] += b?

Given the following:

import numpy as np
a = np.arange(4)
i = a > 0
i
= array([False,  True,  True,  True], dtype=bool)

I understand that:

  • a[i] = x is the same as a.__setitem__(i, x), which assigns directly to the items indicated by i
  • a += x is the same as a.__iadd__(x), which does the addition in place

But what happens when I do:

a[i] += x

Specifically:

  1. Is this the same as a[i] = a[i] + x (which is not an in-place operation)?
  2. Does it make a difference in this case if i is:
    • an int index, or
    • an ndarray, or
    • a slice object

Background

The reason I started delving into this is that I encountered non-intuitive behavior when working with duplicate indices:

a = np.zeros(4)
x = np.arange(4)
indices = np.zeros(4, dtype=int)  # duplicate indices (the np.int alias was removed in NumPy 1.24)
a[indices] += x
a
= array([ 3.,  0.,  0.,  0.])

More interesting stuff about duplicate indices in this question.

The first thing you need to realise is that a += x doesn't map exactly to a.__iadd__(x); instead it maps to a = a.__iadd__(x). Notice that the documentation specifically says that in-place operators return their result, and this doesn't have to be self (although in practice, it usually is). This means a[i] += x trivially maps to:

a.__setitem__(i, a.__getitem__(i).__iadd__(x))

So, the addition technically happens in-place, but only on a temporary object. There is still potentially one less temporary object created than if it called __add__, though.
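The mapping above can be checked by spelling the three steps out by hand. This is only an illustrative expansion, assuming the boolean mask from the question and an arbitrary scalar x = 10; the names b, j and tmp are introduced here and are not part of the original post.

```python
import numpy as np

x = 10

# Augmented item assignment, as it would normally be written.
a = np.arange(4)
i = a > 0
a[i] += x

# The same operation expanded into the three special-method calls.
b = np.arange(4)
j = b > 0
tmp = b.__getitem__(j)   # a boolean mask returns a new array holding the selected items
tmp = tmp.__iadd__(x)    # in-place add, but only on the temporary
b.__setitem__(j, tmp)    # write the result back into b

print(a)  # [ 0 11 12 13]
print(b)  # [ 0 11 12 13]
```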

Actually that has nothing to do with numpy. There is no "set/getitem in-place" in Python; these things are equivalent to a[indices] = a[indices] + x. Knowing that, it becomes pretty obvious what is going on. (EDIT: As lvc writes, the right-hand side is actually computed in place, so it is a[indices] = (a[indices] += x) if that were legal syntax, but that has largely the same effect.)
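Applied to the duplicate-index example from the question, the equivalence makes the result easier to follow: the right-hand side is computed once from the old value of a[0], and then all four results are written to the same slot, so only one of them survives. A small sketch (b and tmp are names introduced here for illustration):

```python
import numpy as np

x = np.arange(4.0)
indices = np.zeros(4, dtype=int)   # every index points at item 0

# The augmented form from the question.
a = np.zeros(4)
a[indices] += x

# The equivalent two-step form.
b = np.zeros(4)
tmp = b[indices] + x   # [0., 1., 2., 3.], each computed from the old b[0]
b[indices] = tmp       # four writes to b[0]; they do not accumulate

print(a)  # [3. 0. 0. 0.]
print(b)  # [3. 0. 0. 0.]
```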

Of course a += x actually is in-place, by mapping a to the out argument of np.add.
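A short sketch of that correspondence, assuming plain ndarrays (the names alias and result are introduced here): the augmented form and an explicit np.add call with out= both leave the result in the original buffer.

```python
import numpy as np

# Whole-array augmented assignment: the existing buffer is updated.
a = np.arange(4.0)
alias = a            # second name for the same array object
a += 1
print(a is alias)    # True
print(alias)         # [1. 2. 3. 4.]

# Roughly what it corresponds to: np.add writing into its own first operand.
b = np.arange(4.0)
result = np.add(b, 1, out=b)
print(result is b)   # True
print(b)             # [1. 2. 3. 4.]
```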

It has been discussed before, and numpy cannot do anything about it as such. There is, though, an idea to have an np.add.at(array, index_expression, x) to at least allow such operations.
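In NumPy releases that provide the ufunc.at method, np.add.at performs this kind of unbuffered in-place addition, so repeated indices accumulate. A minimal sketch using the data from the question:

```python
import numpy as np

a = np.zeros(4)
x = np.arange(4.0)
indices = np.zeros(4, dtype=int)   # all four updates target item 0

# Unbuffered in-place add: every occurrence of an index contributes.
np.add.at(a, indices, x)

print(a)  # [6. 0. 0. 0.]
```

Here 0 + 1 + 2 + 3 = 6 ends up in a[0], whereas a[indices] += x leaves only 3 there.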

As lvc explains, there is no in-place item add method, so under the hood it uses __getitem__, then __iadd__, then __setitem__. Here's a way to empirically observe that behavior:

import numpy

class A(numpy.ndarray):
    def __getitem__(self, *args, **kwargs):
        print("getitem")
        return numpy.ndarray.__getitem__(self, *args, **kwargs)
    def __setitem__(self, *args, **kwargs):
        print("setitem")
        return numpy.ndarray.__setitem__(self, *args, **kwargs)
    def __iadd__(self, *args, **kwargs):
        print("iadd")
        return numpy.ndarray.__iadd__(self, *args, **kwargs)

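# Note: A([1,2,3]) passes [1,2,3] as the *shape* to ndarray.__new__, so this is an
# uninitialized 1x2x3 array and a[0] is a (2, 3) sub-array that is also of type A.
a = A([1,2,3])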
print("about to increment a[0]")
a[0] += 1

It prints

about to increment a[0]
getitem
iadd
setitem
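Whether the temporary returned by __getitem__ shares memory with the original array depends on the kind of index. The check below is a sketch that uses np.shares_memory, which is not part of the original posts; the variable names are introduced here for illustration.

```python
import numpy as np

a = np.arange(4)

# Slice: basic indexing returns a view, so __iadd__ already modifies a's data
# and the following __setitem__ writes the same values back.
slice_is_view = np.shares_memory(a, a[1:3])

# Integer array or boolean mask: advanced indexing returns a copy, so the
# final __setitem__ is what actually changes a.
int_array_is_view = np.shares_memory(a, a[np.array([1, 2])])
mask_is_view = np.shares_memory(a, a[a > 0])

# Single int index on a 1-D array: the result is a NumPy scalar, not an array.
item = a[0]

print(slice_is_view)      # True
print(int_array_is_view)  # False
print(mask_is_view)       # False
print(isinstance(item, np.generic))  # True
```

In all three cases the visible result of a[i] += x is the same when there are no duplicate indices; the difference is only in where the addition physically happens.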

I don't know what's going on under the hood, but in-place operations on items in NumPy arrays and in Python lists will return the same reference, which IMO can lead to confusing results when passed into a function.

Start with Python

>>> a = [1, 2, 3]
>>> b = a
>>> a is b
True
>>> id(a[2])
12345
>>> id(b[2])
12345

... where 12345 is a unique id for the location of the value at a[2] in memory, which is the same as b[2].

So a and b refer to the same list in memory. Now try in-place addition on an item in the list.

>>> a[2] += 4
>>> a
[1, 2, 7]
>>> b
[1, 2, 7]
>>> a is b
True
>>> id(a[2])
67890
>>> id(b[2])
67890

So in-place addition on the item only changed the value at index 2; a and b still reference the same list, although the 3rd item in the list was reassigned to a new value, 7. The reassignment explains why, if a and b were integers (or floats) instead of lists, e.g. a = 4 and b = a, then a += 1 would cause a to be reassigned, and b and a would then be different references. However, if list addition is called, e.g. a += [5] with a and b referencing the same list, a is not reassigned to a new object; both names see the appended item.
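The contrast between immutable numbers and mutable lists can be seen side by side. This is a small sketch in plain Python; the names p and q are introduced here for illustration.

```python
# Integers are immutable and define no __iadd__, so += falls back to
# a = a + 1 and rebinds the name a to a new object.
a = 4
b = a
a += 1
print(a, b)     # 5 4

# Lists define __iadd__, which extends the list in place and returns the
# same object, so both names still refer to one list.
p = [1, 2, 3]
q = p
p += [5]
print(p is q)   # True
print(q)        # [1, 2, 3, 5]
```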

Now for NumPy

>>> import numpy as np
>>> a = np.array([1, 2, 3], float)
>>> b = a
>>> a is b
True

Again these are the same reference, and in-place operators seem to have the same effect as for a list in Python:

>>> a += 4
>>> a
array([ 5.,  6.,  7.])
>>> b
array([ 5.,  6.,  7.])

In-place addition on an ndarray modifies the existing array that both names reference. This is not the same as a + 4, which goes through numpy.add and creates a new array that a is then rebound to:

>>> a = a + 4
>>> a
array([  9.,  10.,  11.])
>>> b
array([ 5.,  6.,  7.])

In-place operations on borrowed references

I think the danger here is if the reference is passed to a different scope.

>>> def f(x):
...     x += 4
...     return x

The reference passed as the argument x enters the scope of f, which does not make a copy; f changes the value at that reference and passes it back.

>>> f(a)
array([ 13.,  14.,  15.])
>>> f(a)
array([ 17.,  18.,  19.])
>>> f(a)
array([ 21.,  22.,  23.])
>>> f(a)
array([ 25.,  26.,  27.])

The same would be true for a Python list as well:

>>> def f(x, y):
...     x += [y]

>>> a = [1, 2, 3]
>>> b = a
>>> f(a, 5)
>>> a
[1, 2, 3, 5]
>>> b
[1, 2, 3, 5]

IMO this can be confusing and sometimes difficult to debug, so I try to only use in-place operators on references that belong to the current scope, and I try to be careful with borrowed references.
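One way to follow that advice is to avoid the augmented operator on arguments and build a new array instead. The two functions below are hypothetical names used only for this sketch, not part of the original answer.

```python
import numpy as np

def add_four_in_place(x):
    # Mutates the caller's array: x refers to the same object the caller holds.
    x += 4
    return x

def add_four_copy(x):
    # x + 4 allocates a new array, so the caller's data is left untouched.
    return x + 4

a = np.array([1.0, 2.0, 3.0])

r1 = add_four_copy(a)
print(a)        # [1. 2. 3.]  (unchanged)
print(r1)       # [5. 6. 7.]

r2 = add_four_in_place(a)
print(a)        # [5. 6. 7.]  (modified through the borrowed reference)
print(r2 is a)  # True
```

If the in-place form is preferred inside the function for memory reasons, calling x = x.copy() first would be another option, at the cost of one explicit copy.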
