
Numpy array subtraction: inconsistent values for large arrays

Here's a problem I came across today: I am trying to subtract the first row of a matrix from the entire (large) matrix. As a test, I made all rows equal. Here's an MWE:

import numpy as np
first = np.random.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)
copy_a = np.copy(reference)
copy_a -= copy_a[0]
print(np.all(copy_a == 0))  # prints False on older NumPy (before 1.13)

Oh wow, False! So I tried something else:

copy_b = np.copy(reference)
copy_b -= reference[0]
print(np.all(copy_b == 0))  # prints True

Examining the new copy_a array, I found that copy_a[:819] (rows 0 to 818) are all zeros, copy_a[820:] still hold the original values, and copy_a[819] was only partly modified:

In [115]: copy_a[819]
Out[115]: 
array([ 0.        ,  0.        ,  0.57704706, -0.22270692, -1.83793342,
        0.58976187, -0.71014837,  1.80517635, -0.98758385, -0.65062774])

It looks like, midway through the operation, NumPy went back to copy_a[0], found it was all zeros, and from then on subtracted zeros from the rest of the array. I find this weird. Is this a bug, or is it expected NumPy behavior?

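The augmented assignment operator -= modifies the array in place, and copy_a[0] is a view into that same array rather than a copy, so you are pulling the rug out from under your own feet: once the first row has been zeroed, the subtrahend reads as zeros as well. The effect you see probably has to do with internal buffering of intermediate results (i.e. the first "commit" happens after roughly 819 rows).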
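A quick way to see why the subtrahend changes underneath you: basic indexing such as copy_a[0] returns a view that shares memory with copy_a, so writes to the parent array are visible through it. A small sketch, using a 5-row array instead of 10000 rows purely for illustration:

```python
import numpy as np

first = np.random.normal(size=10)
reference = np.repeat((first,), 5, axis=0)
copy_a = np.copy(reference)  # a fresh array that owns its data

row0 = copy_a[0]  # basic indexing: a view, not a copy
print(row0.base is copy_a)             # True
print(np.shares_memory(row0, copy_a))  # True

copy_a[0] = 0.0           # overwrite the first row through the parent...
print(np.all(row0 == 0))  # ...and the view sees the change: True
```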

The solution is to copy the subtrahend into a separate array first:

copy_a -= copy_a[0].copy()
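A short check that the fix works, alongside an alternative not mentioned in the answer: plain (out-of-place) subtraction, which writes into a brand-new array and therefore never modifies its inputs while computing. The alternative costs an extra full-size allocation, whereas the .copy() fix only copies one row.

```python
import numpy as np

first = np.random.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)

# Fix from the answer: subtract an independent copy of the first row, in place.
copy_a = np.copy(reference)
subtrahend = copy_a[0].copy()
print(np.shares_memory(subtrahend, copy_a))  # False: the copy has its own memory
copy_a -= subtrahend
print(np.all(copy_a == 0))  # True

# Alternative: out-of-place subtraction allocates a new result array,
# so the rows being read are never overwritten during the operation.
copy_c = reference - reference[0]
print(np.all(copy_c == 0))  # True
```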

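This issue has actually been reported to the NumPy repository multiple times (see below). It is considered a bug, but it is very hard to fix without sacrificing performance (by copying the input arrays), because correctly detecting whether two arrays share memory is difficult. NumPy 1.13 eventually addressed it: ufuncs now detect memory overlap between inputs and outputs and make temporary copies where needed, so the original example prints True on recent versions.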
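To illustrate why overlap detection is tricky, NumPy exposes two checks. np.may_share_memory does a cheap comparison of memory bounds and can report false positives, while np.shares_memory solves the overlap problem exactly, which the NumPy documentation notes can be expensive in general. Interleaved views of the same buffer show the difference:

```python
import numpy as np

x = np.arange(10)
evens, odds = x[::2], x[1::2]

# Cheap check: only compares the memory *bounds* of the two arrays,
# so interleaved views are reported as possibly overlapping.
print(np.may_share_memory(evens, odds))  # True
# Exact check: finds that no element is actually shared.
print(np.shares_memory(evens, odds))     # False
# A view does genuinely share memory with its parent.
print(np.shares_memory(evens, x))        # True
```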

Therefore, the safe choice is to make an explicit copy of copy_a[0], as explained in @Torben's answer.

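The essence of the issue is that you are modifying the array while iterating over it. It happens to work up to copy_a[819] simply because 8192 (819×10+2) is the size of NumPy's buffer (the default reported by np.getbufsize()): presumably the broadcast copy_a[0] is copied into that buffer one chunk at a time, so the first chunk of 8192 elements still sees the original first row, while every later chunk sees the already-zeroed one.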
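To make the buffer arithmetic concrete, here is a hypothetical model of that behavior. It is not NumPy's actual C implementation; it assumes, as the answer suggests, that the broadcast subtrahend is copied into an 8192-element buffer one chunk at a time. The names chunked_inplace_subtract and BUFSIZE are illustrative only. Because the model does the chunking explicitly, it reproduces the pattern from the question on any NumPy version.

```python
import numpy as np

BUFSIZE = 8192  # buffer size quoted in the answer (np.getbufsize() default)

# 8192 elements = 819 full rows of 10 columns plus 2 extra elements
print(divmod(BUFSIZE, 10))  # (819, 2)


def chunked_inplace_subtract(arr, row_index, bufsize):
    """Hypothetical model of a buffered `arr -= arr[row_index]`.

    Assumes arr is C-contiguous. The subtrahend is gathered into a
    temporary buffer one chunk at a time, re-reading the row from arr
    itself, which an earlier chunk may already have overwritten.
    """
    flat = arr.reshape(-1)  # a view, because arr is C-contiguous
    ncols = arr.shape[1]
    for start in range(0, flat.size, bufsize):
        stop = min(start + bufsize, flat.size)
        cols = np.arange(start, stop) % ncols
        buffered = arr[row_index][cols]  # fancy indexing makes a copy
        flat[start:stop] -= buffered
    return arr


rng = np.random.default_rng(0)
first = rng.normal(size=10)
reference = np.repeat((first,), 10000, axis=0)
a = chunked_inplace_subtract(np.copy(reference), 0, BUFSIZE)

zero_rows = np.all(a == 0, axis=1)
print(zero_rows[:819].all())                  # True: rows 0..818 are zeroed
print(zero_rows[819:].any())                  # False
print(a[819, :2])                             # [0. 0.]: the partly modified row
print(np.array_equal(a[819, 2:], first[2:]))  # True
```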


  1. https://github.com/numpy/numpy/issues/6119
  2. https://github.com/numpy/numpy/issues/5241
  3. https://github.com/numpy/numpy/issues/4802
  4. https://github.com/numpy/numpy/issues/2705
  5. https://github.com/numpy/numpy/issues/1683
