How much memory and processing does it take to convert a list to a numpy array
I'm working on files that will require me to load integers into an array that is ~18 million in length.
How much memory and processing time will it take to do something like:
import numpy as np
my_list = [123,231,90,20,...,92] #length is 18 million
new_list = np.array(my_list, dtype='int')  # the keyword is dtype, not type
Would the computer simply create a second array of length 18 million and copy the values over, or does numpy do something more complicated?
In this case "processing time" doesn't matter a great deal, since it's mostly a matter of pointer lookups. Since you tagged the post big-O, this is going to be O(n).
When converting a Python list to a Numpy array, it will reserve memory to store len(my_list) integers in the array. You can find out exactly how much memory this is by checking what the default int type is on your Numpy and using the .itemsize attribute of the dtype object:
>>> np.dtype('int')
dtype('int64')
>>> np.dtype('int').itemsize
8
So this array will require 8 * len(my_list) bytes to store, on top of the memory already used by your original my_list.
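As a rough check of that arithmetic, the sketch below builds a small stand-in list (the 1,000-element size is an assumption chosen to keep the example cheap) and scales the per-item cost up to 18 million. It pins the dtype to np.int64 so the result does not depend on the platform's default int.

```python
import sys
import numpy as np

# Small stand-in for the real 18-million-element list (assumed size).
n_sample = 1_000
my_list = list(range(n_sample))

# Pin the dtype so itemsize is 8 bytes regardless of platform defaults.
arr = np.array(my_list, dtype=np.int64)

# nbytes is itemsize multiplied by the number of elements.
print(arr.itemsize, arr.nbytes)

# Scale the per-item cost to the full length from the question.
n_full = 18_000_000
full_bytes = arr.itemsize * n_full
print(full_bytes)  # 144000000, i.e. roughly 144 MB

# The list itself also holds one pointer per element, plus the int objects
# those pointers refer to; getsizeof reports only the list's own storage.
print(sys.getsizeof(my_list))
```

The array's data buffer is separate from the list, so both exist in memory until my_list is deleted or goes out of scope.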
It will then need to loop over each item in the list and look up what type of Python object it is. Remember, Python lists can be heterogeneous, so there's no way to know a priori that every item in the list will be convertible to an integer. Numpy will then do its best to convert that Python object to a machine integer and store it in the array.
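To illustrate the per-item conversion, here is a minimal sketch with a homogeneous list and one containing an item that cannot be turned into an integer. The helper name to_int_array is hypothetical and exists only for this example.

```python
import numpy as np

def to_int_array(items):
    """Hypothetical helper: return an int64 array, or None if any
    element cannot be converted to an integer."""
    try:
        return np.array(items, dtype=np.int64)
    except (ValueError, TypeError):
        # Numpy only discovers the bad item while walking the list.
        return None

ok = to_int_array([123, 231, 90])
bad = to_int_array([123, "not a number", 90])

print(ok)   # an int64 array holding the three values
print(bad)  # None, because the string could not be converted
```

The failure only surfaces partway through the conversion, which is why Numpy has to inspect every element rather than copy a block of memory.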
Depending on where these 18 million integers are coming from, it may be desirable not to store them in a Python list in the first place, if at all possible. But saying anything more about that would require more detail in the question.
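As one possible illustration, assuming the integers arrive one per line of text (the question does not say where they come from), np.fromiter can fill the array straight from an iterator without building an intermediate list. The read_ints generator and the sample lines are hypothetical.

```python
import numpy as np

def read_ints(lines):
    """Hypothetical source: yields one integer per line of text."""
    for line in lines:
        yield int(line)

# Stand-in for lines read from a file (assumed input format).
lines = ["123", "231", "90", "20", "92"]

# count lets Numpy allocate the array once instead of growing it.
arr = np.fromiter(read_ints(lines), dtype=np.int64, count=len(lines))
print(arr)
```

If the number of items is not known in advance, count can be omitted, in which case Numpy grows the array as it consumes the iterator.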