Numpy array subtraction: inconsistent values for large arrays

Here's a problem I came across today: I am trying to subtract the first row of a matrix from the entire (large) matrix. As a test, I made all rows equal. Here's a MWE:

import numpy as np
first = np.random.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)
copy_a = np.copy(reference)
copy_a -= copy_a[0]
print(np.all(copy_a == 0))  # prints False

Oh wow, False! So I tried another thing:

copy_b = np.copy(reference)
copy_b -= reference[0]
print(np.all(copy_b == 0))  # prints True

Examining the new copy_a array, I found that copy_a[0:819] are all zeros, copy_a[820:] still hold the original values, and copy_a[819] was only partly processed.

In [115]: copy_a[819]
Out[115]: 
array([ 0.        ,  0.        ,  0.57704706, -0.22270692, -1.83793342,
        0.58976187, -0.71014837,  1.80517635, -0.98758385, -0.65062774])

It looks like midway through the operation, numpy went back and looked at copy_a[0], found it was all zeros, and hence subtracted zeros from the rest of the array. I find this weird. Is this a bug, or is it an expected numpy result?

The in-place operator -= modifies the array as it goes, meaning that you are pulling the rug out from under your own feet. The effect you see might have to do with internal buffering of results (i.e. the first "commit" happens after roughly 819 rows).

The solution is to copy the subtrahend into a separate array:

copy_a -= copy_a[0].copy()
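
As a minimal sketch of that fix, reusing the names from the question (the fixed seed and the name first_row are assumptions added for illustration): copying the first row before subtracting means the right-hand operand no longer aliases the array being overwritten, so the result should not depend on how numpy buffers the operation internally.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, chosen only for reproducibility
first = rng.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)  # 10000 identical rows

copy_a = np.copy(reference)
first_row = copy_a[0].copy()  # independent buffer, not a view into copy_a
copy_a -= first_row           # the in-place update no longer reads its own output

print(np.all(copy_a == 0))
```

The out-of-place form copy_a = copy_a - copy_a[0] avoids the aliasing as well, at the cost of allocating a new array for the result.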

This issue has actually been reported multiple times to the numpy repository (see below). It is considered a bug, but it is very hard to fix without sacrificing performance (by copying the input arrays), because correctly detecting whether two arrays share memory is difficult.

Therefore, for now, you'd better just make a copy of copy_a[0], as explained in @Torben's answer.
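
To check by hand whether two operands overlap before an in-place operation, something like the following sketch may help. It assumes a numpy release that provides np.shares_memory; the older np.may_share_memory only compares memory bounds and can report false positives. The all-ones array is a stand-in for the question's data.

```python
import numpy as np

reference = np.ones((10000, 10))
copy_a = np.copy(reference)

# copy_a[0] is a view: it aliases the first row of copy_a.
print(np.shares_memory(copy_a, copy_a[0]))         # True
# reference is a separate array, so its first row does not overlap copy_a.
print(np.shares_memory(copy_a, reference[0]))      # False
# An explicit copy breaks the aliasing.
print(np.shares_memory(copy_a, copy_a[0].copy()))  # False
# The bounds-only check also flags the view as overlapping.
print(np.may_share_memory(copy_a, copy_a[0]))      # True
```

This matches the two cases in the question: copy_a -= copy_a[0] has overlapping operands, while copy_b -= reference[0] does not.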

The essence of the issue is that you are modifying the array while iterating over it. It happens to work up to copy_a[819] simply because 8192 (819×10+2) is the size of numpy's assign buffer.
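
The arithmetic behind that figure can be checked directly. This sketch takes the buffer size of 8192 elements from the answer above and the row length of 10 from the question; the variable names are illustrative.

```python
buffer_size = 8192  # elements, the figure quoted in the answer
row_length = 10     # columns per row in the question's array

# How many complete rows fit in one buffer, and how many elements spill over.
full_rows, leftover = divmod(buffer_size, row_length)
print(full_rows, leftover)  # 819 2
```

So the first buffer covers rows 0 through 818 completely plus the first two elements of row 819, which is consistent with the two leading zeros in the copy_a[819] output shown in the question.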


  1. https://github.com/numpy/numpy/issues/6119
  2. https://github.com/numpy/numpy/issues/5241
  3. https://github.com/numpy/numpy/issues/4802
  4. https://github.com/numpy/numpy/issues/2705
  5. https://github.com/numpy/numpy/issues/1683
