How much memory and processing does it take to convert to a numpy array?

I'm working on files that will require me to load integers into an array that is ~18 million in length.

How much memory and processing time will it take to do something like:

import numpy as np

my_list = [123, 231, 90, 20, ..., 92]  # length is 18 million
new_list = np.array(my_list, dtype='int')  # the keyword is dtype, not type

Does the computer simply create a second array of length 18 million and copy the values into it, or does numpy do something more complicated?

In this case "processing time" doesn't matter a great deal, since the work is mostly a matter of pointer lookups. Since you tagged the post big-O: this is going to be O(n). When converting a Python list to a Numpy array, Numpy will reserve memory to store len(my_list) integers in the array. You can find out exactly how much memory this is by checking what the default int type is on your Numpy and using the .itemsize attribute of the dtype object:

>>> np.dtype('int')
dtype('int64')
>>> np.dtype('int').itemsize
8

So this array will require 8 * len(my_list) bytes, on top of the memory already used by your original my_list.
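To put a number on that for the 18 million integers in the question, here is a small sketch of the arithmetic. It assumes a 64-bit integer dtype (np.int64) rather than relying on np.dtype('int'), because the default int is platform-dependent (for example, it is int32 on Windows with NumPy versions before 2.0). The formula is checked on a small array via .nbytes instead of actually allocating 18 million items.

```python
import numpy as np

N = 18_000_000  # array length from the question

# Assumption: 64-bit integers; substitute your platform's default if different.
itemsize = np.dtype(np.int64).itemsize  # 8 bytes per element
expected = itemsize * N
print(f"{expected:,} bytes, about {expected / 2**20:.1f} MiB")

# Sanity-check the itemsize * length formula on a small array.
small = np.array(list(range(1000)), dtype=np.int64)
assert small.nbytes == small.itemsize * small.size == 8000
```

This prints `144,000,000 bytes, about 137.3 MiB` for the array's data buffer alone; the original Python list, and the int objects it points to, are still held separately until you delete them.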

It will then need to loop over each item in the list and look up what type of Python object it is (remember, Python lists can be heterogeneous; there's no way to know a priori that every item in the list will be convertible to an integer). Numpy will then do its best to convert that Python object to a machine integer and store it in the array.
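As a rough mental model only (Numpy's real conversion is implemented in C and handles many more cases), the two steps described above look something like this in pure Python. The function name list_to_int_array is hypothetical, and int() stands in for Numpy's own per-object conversion logic.

```python
import numpy as np

def list_to_int_array(items):
    """Hypothetical conceptual model of np.array(items, dtype=np.int64)."""
    # Step 1: a single allocation of len(items) * 8 bytes.
    out = np.empty(len(items), dtype=np.int64)
    # Step 2: an O(n) pass that inspects and converts each Python object.
    for i, obj in enumerate(items):
        out[i] = int(obj)  # fails if the object can't be turned into an int
    return out

print(list_to_int_array([123, 231, 90, 20, 92]))

# A heterogeneous list is only discovered to be bad during the loop.
try:
    list_to_int_array([1, "two", 3])
except ValueError as exc:
    print("conversion failed:", exc)
```

The point of the sketch is the cost profile: one allocation plus one pass with a type check per element, which is why the conversion is O(n).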

Depending on where these 18 million integers are coming from, it may be better not to store them in a Python list in the first place, if at all possible. But saying anything more about that would require more detail in the question.
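For example, if the integers came from a text file with one integer per line (an assumed format, since the question doesn't say), np.fromiter can fill an array directly from a generator, so no intermediate list of 18 million Python int objects is ever built. The helper read_ints is hypothetical; np.fromiter and its count argument are part of Numpy.

```python
import io
import numpy as np

def read_ints(f, count=-1):
    """Hypothetical reader: one integer per line -> int64 array."""
    # int() tolerates the trailing newline on each line.
    # Passing count (if known) lets Numpy allocate the array once up front.
    return np.fromiter((int(line) for line in f), dtype=np.int64, count=count)

# Usage with an in-memory stand-in for a real file:
arr = read_ints(io.StringIO("123\n231\n90\n20\n92\n"), count=5)
print(arr)
```

If the data is instead stored as raw binary integers, np.fromfile with an explicit dtype would avoid per-item Python objects altogether; which option fits depends on the actual file format.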
