Strange behavior of "is" on np.ndarray elements

The built-in "is" operator shows strange behavior for the elements of an np.ndarray.

Although the ids of the lhs and the rhs are the same, the "is" operator returns False (this behavior is specific to np.ndarray).

import numpy as np

a = np.array([1.,])
b = a.view()
print(id(a[0]) == id(b[0]))  # True
print(a[0] is b[0])  # False

This strange behavior occurs even without creating a view.

a = np.array([1.,])
print(a[0] is a[0])  # False

Does anyone know the mechanism behind this strange behavior (and possibly the evidence or specification for it)?

Postscript: please consider the following two examples.

  1. With a list, this phenomenon is not observed.
a = [0., 1., 2.,]
b = []
b.append(a[0])
print(a[0] is b[0])  # True
  2. a[0] and b[0] refer to the exact same object.
a = np.array([1.,])
b = a.view()
b[0] = 0.
print(a[0])  # 0.0
print(id(a[0]) == id(b[0]))  # True

Note: This question may be a duplicate, but I'm still a bit confused.

a = np.array([1.,])
b = a.view()
x = a[0]
y = b[0]
print(id(a[0]))  # 139746064667728
print(id(b[0]))  # 139746064667728
print(id(a[0]) == id(b[0])) # True
print(id(a[0]) == id(x)) # False
print(id(x) == id(y))  # False
  1. Is a[0] a temporary object?
  2. Is the id of a temporary object reused?
  3. Doesn't this contradict the specification? (https://docs.python.org/3.7/reference/expressions.html#is)
6.10.3. Identity comparisons
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. Object identity is determined using the id() function. x is not y yields the inverse truth value.
  4. If ids are reused for temporary objects, why are the ids different in this case?
>>> id(100000000000000000 + 1) == id(100000000000000001)
True
>>> id(100000000000000000 + 1) == id(100000000000000000)
False

This is simply due to the difference in how is and == work: the is operator does not compare values, it only checks whether the two operands refer to the same object.

For example, if you do:

print(a is a)

The output will be True.

When Python evaluates a[0] is a[0], each operand is a newly created object at a different memory location. The same behaviour can be observed with a simple test using the id function.

print(id(a[0]),a[0] is a[0],id(a[0]))

The output will be:

140296834593128 False 140296834593248

As for the additional question of why lists don't behave the way numpy arrays do, the answer lies in how the two are constructed. NumPy arrays were designed to be more efficient in processing and in storage than a normal Python list.

So every time you read an element from a numpy array, a new object is created and assigned a different id, as you can observe from the following code:

a = np.array([0., 1., 2.,])
b = []
b.append(a[0])
print(id(a[0]),a[0] is b[0],id(b[0]))

Here are the outputs of multiple re-runs of the same code in jupyter-lab:

140296834595096 False 140296834594496
140296834595120 False 140296834594496
140296834595120 False 140296834594496
140296834595216 False 140296834594496
140296834595288 False 140296834594496

Notice something strange? The id of the numpy array element differs between re-runs, whereas the id of the object stored in the list remains the same. This explains the strange behaviour of numpy arrays in your question.

If you want to read more on this behaviour, I suggest the numpy docs.

a[0] is of type <class 'numpy.float64'>. When you do the comparison, it creates two instances of the class, so the is check fails. However, if you do the following you will get what you wanted, because now both operands reference the same object.

x = a[0]
print(x is x)  # True

This is covered by "id() vs `is` operator. Is it safe to compare `id`s? Does the same `id` mean the same object?". In this particular case:

  1. a[0] and b[0] are created anew each time:

     In [7]: a[0] is a[0]
     Out[7]: False
  2. In id(a[0]) == id(b[0]), each object is discarded immediately after its id is taken, and b[0] just happened to take the id of the recently discarded a[0]. Even if this happens every time in your version of CPython for this particular expression (due to a specific evaluation order and heap organization), it is an implementation detail and you can't rely on it.
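
A minimal sketch of how to make the comparison meaningful: bind both scalars to names so that they are alive at the same time. The names x, y and the three result variables are only illustrative. While both objects exist their ids cannot coincide, so the identity check and the id comparison agree.

```python
import numpy as np

a = np.array([1.0])
b = a.view()

# Keep both scalars alive by binding them to names.
x = a[0]
y = b[0]

same_object = x is y         # two separate numpy.float64 objects
ids_differ = id(x) != id(y)  # two live objects never share an id
values_equal = x == y        # the values are still equal

print(same_object, ids_differ, values_equal)  # False True True
```

Comparing with == (or comparing the arrays themselves) is usually what is wanted here; is only answers whether two names point at the very same Python object.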

Numpy stores array data as a raw data buffer. When you access the data with an expression like a[0], it reads from the buffer and constructs a Python object for the value. Thus, evaluating a[0] twice constructs 2 Python objects. The is operator checks for identity, so 2 different objects compare as False.
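
To make the "raw data buffer" idea concrete, here is a small sketch that looks at the bytes behind an array and at the object produced by indexing. It assumes the default float64 dtype in native byte order; the names raw and first are only illustrative.

```python
import struct
import numpy as np

a = np.array([1.0, 2.0])

# tobytes() returns a copy of the raw bytes stored in the array's buffer.
raw = a.tobytes()
print(len(raw))                  # 16: two 8-byte float64 values
print(struct.unpack("2d", raw))  # (1.0, 2.0)

# Indexing builds a new scalar object from the first 8 bytes of the buffer.
first = a[0]
print(type(first))               # <class 'numpy.float64'>
```

The buffer holds no Python objects at all, only packed doubles, which is why there is no pre-existing element object for a[0] to return.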

The following walkthrough should make the process much clearer:

NOTE: the id numbers are sequential only for the sake of the example; in practice you'd get random-looking numbers. The multiple id 3s in the example are also not necessarily always the same number. It's just possible that they are, because id 3 is repeatedly freed and thus reusable.

a = np.array([1.,])
b = a.view()
x = a[0]    # python reads a[0], creates new object id 1.
y = b[0]    # python reads b[0] which reads a[0], creates new object id 2. (1 is used by object x)

print(id(a[0]))  # python reads a[0], creates new object id 3.
                 # After this call, the object id 3 a[0] is no longer used.
                 # Its lifetime has ended and id 3 is freed.

print(id(b[0]))  # python reads b[0] which reads a[0], creates new object id 3. 
                 # id 3 has been freed and is reusable.
                 # After this call, the object id 3 b[0] is no longer used.
                 # Its lifetime has ended and id 3 is freed (again).

print(id(a[0]) == id(b[0])) # This runs in 2 steps.
                            # First id(a[0]) is run. Just like above, this creates an object with id 3.
                            # Then a[0] is disposed of since no references to it remain. id 3 is freed again.
                            # Then id(b[0]) is run. Again, it creates an object with id 3 (since id 3 is free).
                            # So, id(a[0]) == 3, id(b[0]) == 3. They are equal.

print(id(a[0]) == id(x)) # As above, id(a[0]) can create an object with id 3, while x keeps its reference to the id 1 object. 3 != 1.

print(id(x) == id(y))  # x references id 1 object, y references id 2 object. 1 != 2

Regarding

>>> id(100000000000000000 + 1) == id(100000000000000001)
True
>>> id(100000000000000000 + 1) == id(100000000000000000)
False

id allocation and garbage collection are implementation details. What is guaranteed is that, at a single point in time, references to 2 different objects are different and references to 2 identical objects are the same. The problem is that some expressions may not be atomic (i.e. they do not run at a single point in time).

Python may decide to reuse or not to reuse freed id numbers as it wishes, depending on the implementation. Here it reused an id in one case and not in the other. (It's likely that in id(100000000000000000 + 1) == id(100000000000000001) Python realises that, since the number is the same, it can reuse it efficiently, because 100000000000000001 would be at the same location in memory.)
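
One way to probe the guess above is to look at the constants the compiler stores for the expression. This is only a sketch and it assumes CPython, which folds constant expressions at compile time; other implementations may behave differently. The names code and folded are illustrative.

```python
# Assumption: CPython, where constant arithmetic is folded at compile time.
code = compile("100000000000000000 + 1", "<example>", "eval")

folded = 100000000000000001

# If folding happened, the precomputed sum appears among the code object's constants.
print(folded in code.co_consts)

# Whatever the compiler did, the value itself is the same.
print(eval(code) == folded)  # True
```

If the sum is already a constant, no temporary integer is created and freed at run time, so the equal ids in that example need not come from id reuse at all.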

A big part of the confusion here is the nature of a[0] in the case of an array.

For a list, b[0] is an actual element of b. We can illustrate this by making a list of mutable items (other lists):

In [22]: b = [[0],[1],[2],[3]]
In [23]: b1 = b[0]
In [24]: b1
Out[24]: [0]
In [25]: b[0].append(10)
In [26]: b
Out[26]: [[0, 10], [1], [2], [3]]
In [27]: b1
Out[27]: [0, 10]
In [28]: b1.append(20)
In [29]: b
Out[29]: [[0, 10, 20], [1], [2], [3]]

Mutating b[0] and mutating b1 act on the same object.

For an array:

In [35]: a = np.array([0,1,2,3])
In [36]: c = a.view()
In [37]: a1 = a[0]
In [38]: a += 1
In [39]: a
Out[39]: array([1, 2, 3, 4])
In [40]: c
Out[40]: array([1, 2, 3, 4])
In [41]: a1
Out[41]: 0

An in-place change to a does not change a1, even though it did change c.

__array_interface__ shows us where the data buffer for an array is stored; think of it, in a loose sense, as the memory address of that buffer.

In [42]: a.__array_interface__['data']
Out[42]: (31233216, False)
In [43]: c.__array_interface__['data']
Out[43]: (31233216, False)
In [44]: a1.__array_interface__['data']
Out[44]: (28513712, False)

The view has the same data buffer. But a1 does not. a[0:1] is a single-element view of a, and does share the data buffer.

In [45]: a[0:1].__array_interface__['data']
Out[45]: (31233216, False)
In [46]: a[1:2].__array_interface__['data']  # 8 bytes over
Out[46]: (31233224, False)

So id(a[0]) tells us next to nothing about a. Comparing ids only tells us something about how memory slots are recycled, or not, when constructing Python objects.
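
If the real question is whether two arrays refer to the same data, NumPy offers direct checks that do not involve id at all. A short sketch using np.shares_memory and the base attribute, with the same a and c as above:

```python
import numpy as np

a = np.array([0, 1, 2, 3])
c = a.view()

print(np.shares_memory(a, c))         # True: the view uses a's buffer
print(np.shares_memory(a, a[0:1]))    # True: a slice is also a view
print(np.shares_memory(a, a.copy()))  # False: a copy has its own buffer
print(c.base is a)                    # True: the view records the array it came from
```

These checks describe the data buffer itself, which is the relationship that matters when one array's changes should be visible through another.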
