How do numpy's in-place operations (e.g. `+=`) work?
The basic question is: what happens under the hood when doing `a[i] += b`?
Given the following:
import numpy as np
a = np.arange(4)
i = a > 0
i
= array([False, True, True, True], dtype=bool)
I understand that:
`a[i] = x` is the same as `a.__setitem__(i, x)`, which assigns directly to the items indicated by `i`
`a += x` is the same as `a.__iadd__(x)`, which does the addition in place
But what happens when I do:
a[i] += x
Specifically:
Is this the same as `a[i] = a[i] + x`? (which is not an in-place operation)
Does it make a difference if `i` is an `int` index, an `ndarray`, or a `slice` object?
Background
The reason I started delving into this is that I encountered a non-intuitive behavior when working with duplicate indices:
a = np.zeros(4)
x = np.arange(4)
indices = np.zeros(4, dtype=int)  # duplicate indices
a[indices] += x
a
= array([ 3., 0., 0., 0.])
More interesting stuff about duplicate indices in this question.
The first thing you need to realise is that `a += x` doesn't map exactly to `a.__iadd__(x)`; instead it maps to `a = a.__iadd__(x)`. Notice that the documentation specifically says that in-place operators return their result, and this doesn't have to be `self` (although in practice, it usually is). This means `a[i] += x` trivially maps to:
a.__setitem__(i, a.__getitem__(i).__iadd__(x))
So, the addition technically happens in-place, but only on a temporary object. There is still potentially one less temporary object created than if it called `__add__`, though.
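On whether the kind of index matters, here is a minimal sketch. It assumes current NumPy semantics, where basic slicing returns a view while boolean-mask and integer-array indexing return copies; all variable names are illustrative only.

```python
import numpy as np

a = np.arange(4)

# Basic slicing returns a view, so the temporary shares memory with `a`.
view = a[1:3]
slice_shares = np.shares_memory(a, view)

# Boolean-mask and integer-array (fancy) indexing return copies.
mask_shares = np.shares_memory(a, a[a > 0])
fancy_shares = np.shares_memory(a, a[np.array([1, 2])])

# With unique indices, all three forms end up with the same values.
s = np.arange(4)
s[1:3] += 10
f = np.arange(4)
f[np.array([1, 2])] += 10
m = np.arange(4)
m[np.array([False, True, True, False])] += 10

print(slice_shares, mask_shares, fancy_shares)
print(s, f, m)  # each array holds 0, 11, 12, 3
```

So for a slice the `__iadd__` step already modifies `a` itself and the final `__setitem__` writes the same data back, whereas for a mask or index array the add happens on a copy and only the write-back touches `a`.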
Actually that has nothing to do with numpy. There is no "set/getitem in-place" in Python; these things are equivalent to `a[indices] = a[indices] + x`. Knowing that, it becomes pretty obvious what is going on. (EDIT: As lvc writes, the right-hand side is actually computed in place, so it is `a[indices] = (a[indices] += x)` if that were legal syntax; that has largely the same effect, though.)
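Spelled out step by step, the duplicate-index case looks roughly like this. It is a sketch of the equivalent operations, not NumPy's internal code, and `tmp` is a name introduced here for illustration. The increment is a constant 1 so that the outcome does not depend on the order in which repeated indices are written.

```python
import numpy as np

a = np.zeros(4)
indices = np.array([1, 1, 3, 1])   # index 1 appears three times

# a[indices] += 1 behaves like these three steps:
tmp = a[indices]    # fancy indexing copies: four zeros
tmp += 1            # in-place add on the temporary: four ones
a[indices] = tmp    # write back; a[1] is overwritten three times with 1

# The one-line form gives the same result.
b = np.zeros(4)
b[indices] += 1

print(a)  # a[1] and a[3] are 1.0; a[1] was not incremented three times
```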
Of course `a += x` actually is in-place, by mapping `a` to the `np.add` `out` argument.
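A small sketch of what that means in practice, assuming a float array so that no dtype casting gets in the way; `alias` is just a second name introduced for illustration.

```python
import numpy as np

a = np.arange(4.0)
alias = a                   # second name for the same array

a += 1                      # in place: `alias` sees the change
np.add(a, 1, out=a)         # same effect, written as an explicit ufunc call
same_object = a is alias    # True: `a` was never rebound

a = a + 1                   # builds a new array and rebinds `a`
rebound = a is alias        # False: `alias` still holds the old values

print(same_object, rebound)
print(alias, a)
```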
It has been discussed before and numpy cannot do anything about it as such. Though there is an idea to have a `np.add.at(array, index_expression, x)` to at least allow such operations.
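`np.add.at` is available in current NumPy releases (as the `at` method of ufuncs). A short sketch of how it treats the duplicate-index example from the question, accumulating instead of overwriting; the names `buffered` and `unbuffered` are illustrative.

```python
import numpy as np

x = np.arange(4.0)
indices = np.zeros(4, dtype=int)    # every entry points at element 0

buffered = np.zeros(4)
buffered[indices] += x              # element 0 ends up as a single entry of x

unbuffered = np.zeros(4)
np.add.at(unbuffered, indices, x)   # element 0 accumulates 0 + 1 + 2 + 3

print(unbuffered)                   # first element is 6.0, the rest stay 0.0
```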
As lvc explains, there is no in-place item add method, so under the hood it uses `__getitem__`, then `__iadd__`, then `__setitem__`. Here's a way to empirically observe that behavior:
import numpy

class A(numpy.ndarray):
    # Each override announces itself, then defers to ndarray.
    def __getitem__(self, *args, **kwargs):
        print("getitem")
        return numpy.ndarray.__getitem__(self, *args, **kwargs)
    def __setitem__(self, *args, **kwargs):
        print("setitem")
        return numpy.ndarray.__setitem__(self, *args, **kwargs)
    def __iadd__(self, *args, **kwargs):
        print("iadd")
        return numpy.ndarray.__iadd__(self, *args, **kwargs)

# A([1,2,3]) allocates an uninitialised array of shape (1, 2, 3),
# so a[0] is a sub-array of type A rather than a scalar.
a = A([1,2,3])
print("about to increment a[0]")
a[0] += 1
It prints
about to increment a[0]
getitem
iadd
setitem
I don't know what's going on under the hood, but in-place operations on items in NumPy arrays and in Python lists will return the same reference, which IMO can lead to confusing results when passed into a function.
>>> a = [1, 2, 3]
>>> b = a
>>> a is b
True
>>> id(a[2])
12345
>>> id(b[2])
12345
... where `12345` is a unique `id` for the location of the value at `a[2]` in memory, which is the same as `b[2]`.
So `a` and `b` refer to the same list in memory. Now try in-place addition on an item in the list.
>>> a[2] += 4
>>> a
[1, 2, 7]
>>> b
[1, 2, 7]
>>> a is b
True
>>> id(a[2])
67890
>>> id(b[2])
67890
So in-place addition of the item in the list only changed the value of the item at index `2`, but `a` and `b` still reference the same list, although the 3rd item in the list was reassigned to a new value, `7`. The reassignment explains why if `a = 4` and `b = a` were integers (or floats) instead of lists, then `a += 1` would cause `a` to be reassigned, and then `b` and `a` would be different references. However, if list addition is called, e.g. `a += [5]` for `a` and `b` referencing the same list, it does not reassign `a`; they will both be appended.
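The three cases in that paragraph can be put side by side in plain Python. This is only an illustrative sketch; the names `n`, `m`, `nums` and `alias` are made up for the example.

```python
# Immutable int: += builds a new object and rebinds the name.
n = 4
m = n
n += 1
# n is now 5, m is still 4

# Mutable list: += calls list.__iadd__, which extends the list in place.
nums = [1, 2, 3]
alias = nums
nums += [5]
still_same = nums is alias      # True: both names see [1, 2, 3, 5]

# Plain + builds a new list, so the alias is left behind.
nums = nums + [6]
now_same = nums is alias        # False: alias is still [1, 2, 3, 5]

print(n, m, still_same, now_same)
print(nums, alias)
```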
>>> import numpy as np
>>> a = np.array([1, 2, 3], float)
>>> b = a
>>> a is b
True
Again these are the same reference, and in-place operators seem to have the same effect as for a list in Python:
>>> a += 4
>>> a
array([ 5., 6., 7.])
>>> b
array([ 5., 6., 7.])
In-place addition on an `ndarray` modifies the array that both references point to. This is not the same as calling `numpy.add`, which creates a new array under a new reference.
>>> a = a + 4
>>> a
array([ 9., 10., 11.])
>>> b
array([ 5., 6., 7.])
I think the danger here is if the reference is passed to a different scope.
>>> def f(x):
...     x += 4
...     return x
The argument reference to `x` is passed into the scope of `f`, which does not make a copy and in fact changes the value at that reference and passes it back.
>>> f(a)
array([ 13., 14., 15.])
>>> f(a)
array([ 17., 18., 19.])
>>> f(a)
array([ 21., 22., 23.])
>>> f(a)
array([ 25., 26., 27.])
The same would be true for a Python list as well:
>>> def f(x, y):
...     x += [y]
>>> a = [1, 2, 3]
>>> b = a
>>> f(a, 5)
>>> a
[1, 2, 3, 5]
>>> b
[1, 2, 3, 5]
IMO this can be confusing and sometimes difficult to debug, so I try to only use in-place operators on references that belong to the current scope, and I try to be careful with borrowed references.
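One way to follow that rule, sketched with two hypothetical helper functions: the first mutates its argument like `f` above, the second takes a private copy before using an in-place operator.

```python
import numpy as np

def shifted_in_place(x):
    """Mutates the caller's array, like `f` above."""
    x += 4
    return x

def shifted_copy(x):
    """Leaves the caller's array alone by working on a private copy."""
    result = x.copy()   # this array belongs to the current scope
    result += 4         # so an in-place operator is safe here
    return result

a = np.array([1.0, 2.0, 3.0])

out = shifted_copy(a)
untouched = a.tolist()          # a still holds its original values

same = shifted_in_place(a)      # now a itself has changed
print(untouched, out, same is a)
```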