
Odd behavior of using += with numpy.array and numpy.ma.array

Can anyone explain the following result to me? I know this is not how one would usually perform this operation, but the result seems odd to me.

import numpy as np

a = np.ma.masked_where(np.arange(20)>10,np.arange(20))
b = np.ma.masked_where(np.arange(20)>-1,np.arange(20))
c = np.zeros(a.shape)
d = np.zeros(a.shape)

c[~a.mask] += b[~a.mask]

print(b[~a.mask])
#masked_array(data=[--, --, --, --, --, --, --, --,--, --, --],
#             mask=[ True,  True,  True,  True,  True,  True,  True,  True, True,  True,  True],
#       fill_value=999999,
#            dtype=int64)

print(c)
#[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.  0.  0.  0.  0. 0.  0.  0.  0.  0.]

d[~a.mask] = d[~a.mask] + b[~a.mask]

print(d)
#[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

I expected c not to change, but I guess something related to objects in memory is going on here. Also, += keeps the original object, while the combination of = and + creates a new d.

I just don't understand where the data that is added to c comes from.

I will start with a simpler example for better understanding:

b = np.ma.masked_where(np.arange(20)>-1,np.arange(20))
#b: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#b.data: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
c = np.zeros(b.shape)
#c: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
d = np.zeros(b.shape)
#d: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

c += b
#c: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]

d = d + b
#d: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#d.data: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

The first operation, c += b, is an in-place operation. In other words, it is equivalent to c = type(c).__iadd__(c, b), which performs the addition according to the type of c. Since c is a plain ndarray, not a masked array, the mask of b is ignored and the underlying data of b is added as if it were unmasked.
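As a minimal sketch of the in-place case (using a shorter, five-element array than the answer does, purely for readability), the following checks that c is still the same plain ndarray object after c += b and that it has picked up the underlying data of b:

```python
import numpy as np

# Every element of b is masked, but b.data still holds 0..4.
b = np.ma.masked_where(np.arange(5) > -1, np.arange(5))
c = np.zeros(b.shape)
c_before = c  # second reference to the same object

c += b  # handled by ndarray.__iadd__, so the mask of b plays no role

print(type(c).__name__)  # the type of c is unchanged
print(c is c_before)     # the object is unchanged as well
print(c)                 # the raw data of b has been added
```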

On the other hand, d = d + b is NOT an in-place operation. Because np.ma.MaskedArray is a subclass of np.ndarray and overrides the reflected method, Python tries b.__radd__(d) first, so the expression is equivalent to d = np.ma.MaskedArray.__radd__(b, d). This creates a new object, and the result takes the more specific type on the right-hand side: d (an unmasked array) is treated as a masked array because b is one. The addition therefore uses valid values only, and in this case there are none, since ALL elements of b are masked. The result is a masked array d with the same mask as b, while the data of d remains unchanged.
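The non-in-place case can be sketched the same way (again with a five-element array chosen only for brevity). Here the name d is rebound to a new masked array, while the original zeros array is left untouched:

```python
import numpy as np

b = np.ma.masked_where(np.arange(5) > -1, np.arange(5))  # fully masked
d = np.zeros(b.shape)
d_before = d  # keep a reference to the original ndarray

d = d + b  # builds a new object; the masked-array addition is used

print(type(d).__name__)  # the result is a masked array
print(d is d_before)     # the name d now refers to a different object
print(d.mask)            # same mask as b
print(d.data)            # data carried over from the original d
print(d_before)          # the original array was not modified
```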

This difference between in-place and regular addition is not specific to NumPy; it applies to Python itself. The case in the question behaves the same way and, as @alaniwi mentioned in the comments, the boolean indexing with the mask of a is not fundamental to the behavior. Indexing b, c, and d with ~a.mask only restricts the operation to the elements that a does not mask (rather than all elements of the arrays) and nothing more.
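To illustrate the point about plain Python, here is a small sketch using built-in lists (the variable names are only for illustration). The += form mutates the existing object, whereas the combination of = and + builds a new one:

```python
x = [1, 2]
x_alias = x
x += [3]          # list.__iadd__ extends the existing list in place

y = [1, 2]
y_alias = y
y = y + [3]       # list.__add__ builds a new list; y is rebound to it

print(x_alias, x_alias is x)  # the alias sees the change
print(y_alias, y_alias is y)  # the alias still holds the old list
```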

To make things a bit more interesting, and in fact clearer, let's switch the places of b and d on the right-hand side:

e = np.zeros(b.shape)
#e: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

e = b + e
#e: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --]
#e.data: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]

Note that, as with d = d + b, the right-hand side uses the masked array addition, so the output is a masked array. However, since you are adding e to b (that is, e = np.ma.MaskedArray.__add__(b, e)), the masked data of b is returned, whereas in d = d + b you are adding b to d, so the data of d is returned.
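The effect of operand order can be compared side by side in a short sketch (the names left and right are hypothetical, introduced only for this illustration). Keep in mind that the values stored under masked entries are generally best treated as an implementation detail, so this shows the behavior described above rather than something to depend on:

```python
import numpy as np

b = np.ma.masked_where(np.arange(5) > -1, np.arange(5))  # fully masked
e = np.zeros(b.shape)

left = b + e    # b is the first operand of the masked addition
right = e + b   # e is the first operand of the masked addition

print(left.mask.all(), right.mask.all())  # both results are fully masked
print(left.data)    # masked slots keep the data of b
print(right.data)   # masked slots keep the data of e
```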
