
weird behaviour of numpy array element identities

I have written the following code with Python lists:

# python lists
vc = [1,2,3,4]
print('original array')
print(hex(id(vc)))
print([hex(id(vc[i])) for i in range(len(vc))])
print(vc)
# --
g = vc[1:3]
print('array slice')
print(hex(id(g)))
print([hex(id(g[i])) for i in range(len(g))])
print(g)
# --
g[:] = [-1,-2]
print('original array')
print(hex(id(vc)))
print([hex(id(vc[i])) for i in range(len(vc))])
print(vc)
# --
print('array slice')
print(hex(id(g)))
print([hex(id(g[i])) for i in range(len(g))])
print(g)

which produces the expected output:

original array
0x211acca9d48
['0x7ffc4ffbb350', '0x7ffc4ffbb370', '0x7ffc4ffbb390', '0x7ffc4ffbb3b0']
[1, 2, 3, 4]
array slice
0x211acc69e88
['0x7ffc4ffbb370', '0x7ffc4ffbb390']
[2, 3]
original array
0x211acca9d48
['0x7ffc4ffbb350', '0x7ffc4ffbb370', '0x7ffc4ffbb390', '0x7ffc4ffbb3b0']
[1, 2, 3, 4]
array slice
0x211acc69e88
['0x7ffc4ffbb310', '0x7ffc4ffbb2f0']
[-1, -2]

We can see that slicing a Python list creates a copy: once the new list g is modified, its elements have different ids, while the original list vc is unchanged.
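The copy made by a list slice is a shallow one, which a few identity checks make visible. This is only a minimal sketch; the name shares_element is introduced here for illustration and does not appear in the original code.

```python
vc = [1, 2, 3, 4]
g = vc[1:3]                      # slicing a list builds a new list object

is_same_list = g is vc           # False: g is a separate list
shares_element = g[0] is vc[1]   # True: both lists reference the same int object
print(is_same_list, shares_element)

g[:] = [-1, -2]                  # slice assignment points g's slots at other objects
print(vc)                        # [1, 2, 3, 4]  (the original list is untouched)
print(g)                         # [-1, -2]
```

The ids printed for list elements are therefore the ids of the int objects the list refers to, not of slots inside the list.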

If we repeat the same with NumPy arrays:

# numpy arrays
import numpy as np
vc = np.array([1,2,3,4])
print('original array')
print(hex(id(vc)))
print([hex(id(vc[i])) for i in range(len(vc))])
print(vc)
# --
g = vc[1:3]
print('array slice')
print(hex(id(g)))
print([hex(id(g[i])) for i in range(len(g))])
print(g)
# --
g[:] = [-1,-2]
print('original array')
print(hex(id(vc)))
print([hex(id(vc[i])) for i in range(len(vc))])
print(vc)
# --
print('array slice')
print(hex(id(g)))
print([hex(id(g[i])) for i in range(len(g))])
print(g)

we get the output

original array
0x211acbe64e0
['0x211acd107e0', '0x211acd107e0', '0x211acd107e0', '0x211acd107e0']
[1 2 3 4]
array slice
0x211acd674e0
['0x211acd107e0', '0x211acd107e0']
[2 3]
original array
0x211acbe64e0
['0x211acd107e0', '0x211acd107e0', '0x211acd107e0', '0x211acd107e0']
[ 1 -1 -2  4]
array slice
0x211acd674e0
['0x211acd107e0', '0x211acd107e0']
[-1 -2]

We see that slicing a NumPy array produces a view, but the element ids make no sense: every element reports the same id. I was hoping to use ids to work out when NumPy (and pandas) copies data and when it creates a view, but I cannot understand what is going on.
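For the underlying goal of telling views from copies, NumPy has direct checks that do not rely on id. The sketch below uses np.shares_memory and the .base attribute; the variable names are illustrative, and it only covers NumPy arrays, not pandas objects.

```python
import numpy as np

vc = np.array([1, 2, 3, 4])
view = vc[1:3]           # basic slicing
copy = vc[1:3].copy()    # explicit copy

view_shares = np.shares_memory(vc, view)
copy_shares = np.shares_memory(vc, copy)

print(view_shares, view.base is vc)    # True True
print(copy_shares, copy.base is None)  # False True
```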

One difference between lists and arrays is that lists store Python objects, whereas arrays store raw data. As a consequence, when retrieving a single element, the list __getitem__ can simply return a reference to an existing object, while the array __getitem__ must first create a Python object from the raw data.
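This difference can be observed by retrieving the same element twice and keeping both results alive. A minimal sketch, with variable names chosen here for illustration:

```python
import numpy as np

lst = [1000, 2000]
arr = np.array([1000, 2000])

a, b = lst[0], lst[0]   # the list hands back the object it already stores
list_same = a is b

x, y = arr[0], arr[0]   # the array builds a new scalar object on each access
array_same = x is y

print(list_same, array_same)      # True False
print(isinstance(x, np.generic))  # True: x is a NumPy scalar, not a plain int
```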

In the current CPython implementation, id returns an object's memory address. Because the element objects created by the array's __getitem__ are deallocated as soon as nothing references them, the underlying memory is recycled right away, which is why all the elements report the same id.

You can check this by keeping the newly created objects alive (by holding references to them), in which case distinct ids are produced, even if you retrieve the same element multiple times:

repeat = [g[0] for dummy in "123"]
repeat
# [-1, -1, -1]
print([hex(id(x)) for x in repeat])
# ['0x7f2961d56f60', '0x7f2961d56f78', '0x7f2961d56f48']

I'm pretty new to NumPy, but I found this in the documentation: "NumPy slicing creates a view instead of a copy as in the case of builtin Python sequences such as string, tuple and list." So when you slice a NumPy array, you get pointers to the elements of the original array, so the ids don't change (same pointer as in the original array), whereas with a list, after the list is modified, the ids do change because new pointers to the elements are created.

But I'm not sure if my answer is accurate.
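The "same pointer" idea can be checked on the array buffers themselves rather than on element ids. The sketch below reads the data address from __array_interface__; it assumes a contiguous 1-D array as in the question, and the names vc_addr, g_addr and offset are illustrative.

```python
import numpy as np

vc = np.array([1, 2, 3, 4])
g = vc[1:3]

# Address of the first element of each array's data buffer.
vc_addr = vc.__array_interface__["data"][0]
g_addr = g.__array_interface__["data"][0]
offset = g_addr - vc_addr

print(offset == vc.itemsize)  # True: g starts at vc's second element

g[:] = [-1, -2]               # writes through the view into vc's buffer
print(vc.tolist())            # [1, -1, -2, 4]
```

The view starts exactly one element into the original buffer, which is what sharing memory means here; the ids returned for g[i] belong to temporary scalar objects and say nothing about that buffer.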


 