
Numpy ndarray data ownership problem on reshape and view

I am confused about the data ownership mechanism in NumPy.

import numpy as np
a = np.arange(10)
a.flags.owndata     # True
id(a)               # 140289740187168

The first four lines are obvious: variable a owns its data, and the array object has id 140289740187168.

b = a
c = a.view()
d = a.reshape((2, 5))
print(b.flags.owndata, b.base, id(b.base)) # True None 94817978163056
print(c.flags.owndata, c.base, id(c.base)) # False [0 1 2 3 4 5 6 7 8 9] 140289740187168
print(d.flags.owndata, d.base, id(d.base)) # False [0 1 2 3 4 5 6 7 8 9] 140289740187168
id(None)                                   # 94817978163056

Variables c and d are both "shallow" copies of a, so neither owns its data. b is a, and owns the data (shared with a).
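The same observations can be checked directly with identity tests instead of comparing id() values. A minimal sketch using the names from the question; `np.shares_memory` is a standard NumPy function, and the `base is a` checks are simply an alternative to printing ids:

```python
import numpy as np

a = np.arange(10)
b = a                   # plain assignment: another name for the same object
c = a.view()            # new array object over the same buffer
d = a.reshape((2, 5))   # reshaping a contiguous array also returns a view

# b is literally the same object, so it reports a's own flags and base
print(b is a, b.flags.owndata, b.base is None)    # True True True

# c and d are distinct objects whose base is the original array a
print(c is a, c.flags.owndata, c.base is a)       # False False True
print(d is a, d.flags.owndata, d.base is a)       # False False True

# all of them see the same memory as a
print(np.shares_memory(a, c), np.shares_memory(a, d))  # True True
```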

a = a.view()
print(id(a))                               # 140289747003632
print(a.flags.owndata, a.base, id(a.base)) # False [0 1 2 3 4 5 6 7 8 9] 140289740187168

However, assigning a view of a back to a creates a new array object with id 140289747003632, and data ownership stays with the old a (id 140289740187168).

The question is: since the old a has been replaced by the new a, wouldn't it be more reasonable to transfer data ownership to the new a? Why does the old a still keep ownership of the data?

b = a

b is a, just a different name for the same object. That's not even a copy.
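To make the alias/copy distinction concrete, here is a small sketch contrasting `b = a` with an actual copy made by `a.copy()`; the name `e` for the copy is just illustrative:

```python
import numpy as np

a = np.arange(10)
b = a           # alias: no new object, no new buffer
e = a.copy()    # real copy: new object with its own buffer

b[0] = 99       # writing through the alias changes a
e[1] = -1       # writing to the copy leaves a untouched

print(a[:3])                              # [99  1  2]
print(e.flags.owndata, e.base is None)    # True True
print(np.shares_memory(a, e))             # False
```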

These are views. A view is a new array object, but it uses the same data buffer as the original (as its base shows):

c = a.view()
d = a.reshape((2, 5))
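Because the buffer is shared, a write through any view shows up in a. A minimal sketch (the indices and values are arbitrary):

```python
import numpy as np

a = np.arange(10)
c = a.view()
d = a.reshape((2, 5))

# Each view has its own shape ...
print(a.shape, c.shape, d.shape)   # (10,) (10,) (2, 5)

# ... but writes through any of them land in the one shared buffer
d[1, 0] = 500      # element (1, 0) of the 2x5 view is a[5]
c[0] = 100
print(a[0], a[5])  # 100 500
```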

I like to use __array_interface__ to look at the basic attributes of an array:

In [210]: a = np.arange(10)
In [211]: a.__array_interface__
Out[211]: 
{'data': (43515408, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (10,),
 'version': 3}

data[0] is the memory address of the buffer where the values of a are stored; the second element of the tuple is a read-only flag.
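One way to confirm what the 'data' tuple holds is to compare it with `ndarray.ctypes.data`, which reports the same buffer address, and to flip the writeable flag on a view. A sketch; the actual address differs from run to run, so only comparisons are printed:

```python
import numpy as np

a = np.arange(10)
addr, read_only = a.__array_interface__['data']

# The first element is the buffer's address as a plain Python int;
# ndarray.ctypes.data reports the same address.
print(addr == a.ctypes.data)   # True
# The second element is the read-only flag.
print(read_only)               # False

# Making a view non-writeable flips that flag, while the address stays the same.
ro = a.view()
ro.flags.writeable = False
print(ro.__array_interface__['data'] == (addr, True))   # True
```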

A view has the same 'data' pointer (possibly with an offset, as for a slice). Otherwise the view has its own strides and shape. It is a new array object with a shared base:

In [212]: d = a.reshape((2,5))
In [213]: d.__array_interface__
Out[213]: 
{'data': (43515408, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (2, 5),
 'version': 3}
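The "possible offset" and "own strides" parts are easiest to see with slices. A sketch, using `a.itemsize` so the arithmetic does not depend on whether the platform's default integer is 4 or 8 bytes:

```python
import numpy as np

a = np.arange(10)
base_addr = a.__array_interface__['data'][0]

s = a[2:]      # slice view starting at the third element
t = a[::2]     # slice view taking every other element

# s's data pointer is offset by two items from a's pointer
offset = s.__array_interface__['data'][0] - base_addr
print(offset == 2 * a.itemsize)                       # True

# t keeps a's pointer but gets its own strides and shape
print(t.__array_interface__['data'][0] == base_addr)  # True
print(t.strides == (2 * a.itemsize,), t.shape)        # True (5,)

# both still report the original array as their base
print(s.base is a, t.base is a)                       # True True
```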

Assigning the view to a does not change the original array or its data buffer. The original a array object still exists in memory, along with its data buffer, because the new a refers to it through its base attribute.

In [214]: a = a.view()
In [216]: a.__array_interface__['data']
Out[216]: (43515408, False)
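A short sketch of the same rebinding, checking that the unnamed original array is still alive and still the owner. Here `addr` is just a helper variable for the comparison:

```python
import numpy as np

a = np.arange(10)
addr = a.__array_interface__['data'][0]

a = a.view()   # rebind the name a to a new view; the old array no longer has a name

# The old array object is kept alive by the new view's base reference,
# and it is still the object that owns the buffer.
print(a.flags.owndata)                            # False
print(a.base.flags.owndata)                       # True
print(a.__array_interface__['data'][0] == addr)   # True
```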

If NumPy 'updated' a.base as you suggest, it would also have to update the base of every other view of the original a, such as d.

In [218]: id(a)
Out[218]: 139767778126704
In [219]: id(a.base)
Out[219]: 139768132465328
In [220]: id(d.base)
Out[220]: 139768132465328
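The id() comparison above can also be written as an identity test. A sketch reproducing the session: after the rebinding, the new a and the earlier reshape d both point at the same original array, so ownership could not move to the new a without also touching d:

```python
import numpy as np

a = np.arange(10)
d = a.reshape((2, 5))
a = a.view()

# Both views point at the same original, owning array object
print(a.base is d.base)        # True
print(a.base.flags.owndata)    # True
# The new a is not that owner; it is just another view
print(a is d.base)             # False
```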

While Python and NumPy use reference counting to determine which objects are garbage, NumPy does not keep a record of which views have been made of an array. That is, while d.base links d to a, there isn't a link the other way.

