
NumPy ndarray object much bigger than a list

I have now looked at NumPy arrays in more detail. You always read that a NumPy ndarray uses less memory, but if you look at the total memory consumption, the ndarray is larger than the list.

In lists we have int objects that are 28 bytes in size, but in a NumPy array we have numpy.int64 objects that are 32 bytes in size.

So I just don't understand why people say that NumPy arrays use less memory, since the numpy.int64 objects are four bytes larger than the int objects.

import numpy as np
from sys import getsizeof


def is_iterable(p_object):
    try:
        iter(p_object)
    except TypeError:
        return False
    return True


def get_total_size(element, size):
    # add the container itself, then recurse into everything it yields
    if not is_iterable(element):
        return size + getsizeof(element)
    size = size + getsizeof(element)
    for new_element in element:
        size = get_total_size(new_element, size)
    return size


if __name__ == "__main__":
    x_list = list(range(100))
    x_array = np.array(x_list)

    print("x_list:")
    print("The list object (references only) consumes " + str(getsizeof(x_list)) + " byte(s)")
    print("The list plus all referenced objects consumes " + str(get_total_size(x_list, 0)) + " byte(s)")

    print("")

    print("Numpy-Array:")
    print("The ndarray object consumes " + str(getsizeof(x_array)) + " byte(s)")
    print("The ndarray plus all iterated elements consumes " + str(get_total_size(x_array, 0)) + " byte(s)")

    print("")
    print("objecttype", type(x_array[1]), "size in bytes", getsizeof(x_array[1]))
    print("objecttype", type(x_list[1]), "size in bytes", getsizeof(x_list[1]))

output:

x_list:
The list object (references only) consumes 1016 byte(s)
The list plus all referenced objects consumes 3812 byte(s)

Numpy-Array:
The ndarray object consumes 896 byte(s)
The ndarray plus all iterated elements consumes 4096 byte(s)

objecttype <class 'numpy.int64'> size in bytes 32
objecttype <class 'int'> size in bytes 28
Answer:

In [144]: alist = list(range(100))
In [145]: getsizeof(alist)
Out[145]: 856

Most getsizeof questions stop at this base number, ignoring the referenced objects.

In [146]: get_total_size(alist,0)
Out[146]: 3652

The size of individual integers can vary:

In [148]: getsizeof(50)
Out[148]: 28
In [149]: getsizeof(220000000000000000)
Out[149]: 32

100*28 + 856 = 3656, close enough. Small integers (up to 256) are pre-allocated by the interpreter, so your list doesn't actually add those to the total memory use; but that's a minor detail.
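Both effects can be checked directly; this is a quick sketch, and the exact byte counts are CPython- and build-specific:

```python
from sys import getsizeof

# CPython stores int values in 30-bit "digits", so larger magnitudes
# need more digits and therefore more bytes.
small = getsizeof(50)
large = getsizeof(220000000000000000)

# Ints from -5 to 256 are cached singletons: every container that
# holds them references the same pre-allocated objects.
a = 25 * 2
b = 100 // 2
shared = a is b  # both names point at the one cached int 50
```

So a list of 100 small ints adds references but no new int objects; a list of 100 huge ints really would allocate 100 separate objects.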

For an array with a numeric dtype, we don't need to chase the non-existent "references":

In [152]: arr = np.array(alist)
In [153]: getsizeof(arr)
Out[153]: 904
In [154]: arr.nbytes
Out[154]: 800

There are 800 bytes in its data buffer, plus roughly 100 bytes of 'overhead'. The 800 is 100*8: 8 bytes per int64 number. Other dtypes may have different element sizes.
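The per-element cost is just the dtype's itemsize, which you can verify for a few dtypes; a sketch with the dtypes spelled out explicitly, since the default integer dtype is platform-dependent:

```python
import numpy as np

arr64 = np.arange(100, dtype=np.int64)    # 8 bytes per element
arr8 = np.arange(100, dtype=np.int8)      # 1 byte per element
arrf = np.arange(100, dtype=np.float64)   # 8 bytes per element

# nbytes is exactly size * itemsize: no per-element object overhead
buffer_sizes = (arr64.nbytes, arr8.nbytes, arrf.nbytes)
```

An int8 array holds the same 100 values in an eighth of the buffer space, which is where the real memory savings of numeric arrays come from.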

For object dtype arrays, adding the references matters:

In [155]: arr = np.array(alist,object)
In [156]: getsizeof(arr)
Out[156]: 904
In [158]: get_total_size(arr,0)
Out[158]: 3700     # 2800+900

This array references the same ints as alist.
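The sharing can be verified with an identity check; ints above the cached small-int range make a clean demonstration (a sketch; `big_ints` is just an illustrative name):

```python
import numpy as np

# Values far outside CPython's small-int cache, so identity is meaningful
big_ints = [10**18 + i for i in range(5)]
obj_arr = np.array(big_ints, dtype=object)

# The object array's buffer holds pointers to the very same int objects
shares_objects = all(obj_arr[i] is big_ints[i] for i in range(len(big_ints)))
```

So an object-dtype array buys you none of the memory savings: it is essentially a list with NumPy indexing on top.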

Your get_total_size on the numeric-dtype array finds that

In [164]: getsizeof(np.int64(50))
Out[164]: 32

but the array does not "store" 100 of those. The 32 bytes are 8 bytes for the value plus 24 bytes of Python object overhead: indexing "boxes" the raw stored value into a full numpy.int64 object, so you are measuring the box, not what the array actually stores.
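The gap between the boxed scalar and the stored value shows up directly if you compare getsizeof of an indexed element with the array's itemsize (a sketch, pinning the dtype to int64):

```python
import numpy as np
from sys import getsizeof

arr = np.arange(100, dtype=np.int64)

# Indexing creates a numpy.int64 object around the raw 8 stored bytes
boxed = getsizeof(arr[1])   # value bytes plus Python object overhead
stored = arr.itemsize       # all the array keeps per element
```

Iterating over a numeric array (as get_total_size does) creates one of these temporary boxed scalars per element, which is why the naive total comes out larger than the list's, even though the array's actual footprint is getsizeof(arr) alone.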
