简体   繁体   English

操作系统如何处理比内存大的python dict?

[英]How does OS handle a python dict that's larger than memory?

I have a python program that is going to eat a lot of memory, primarily in a dict. 我有一个python程序,它会占用大量内存,主要是在dict中。 This dict will be responsible for assigning a unique integer value to a very large set of keys. 该dict将负责为一组非常大的键分配唯一的整数值。 As I am working with large matrices, I need a key-to-index correspondence that can also be recovered from (ie, once matrix computations are complete, I need to map the values back to the original keys). 当我使用大型矩阵时,我需要一个也可以从中恢复的密钥到索引的对应关系(即,一旦矩阵计算完成,我需要将值映射回原始密钥)。

I believe this amount will eventually surpass available memory. 我相信这个数量最终会超过可用内存。 I am wondering how this will be handled with regards to swap space. 我想知道如何处理交换空间。 Perhaps there is a better data structure for this purpose. 也许为此目的有一个更好的数据结构。

You need a database, if the data will exceed memory. 如果数据超出内存,则需要数据库。 The indexing of dictionaries isn't designed for good performance when a dictionary is bigger than memory. 当字典大于内存时,字典的索引不是为了获得良好的性能而设计的。

Swap space is a kernel feature and transparant to the user (python). 交换空间是一个内核功能,并且可以透明地传递给用户(python)。

If you do have a huge dict and don't need all the data at once, you could look at redis which might do what you want. 如果你确实有一个巨大的字典并且不需要同时获得所有数据,那么你可以看看redis可能会做你想要的。 Or maybe not :) 或者可能不是 :)

It will just end up in swap trashing, because a hash table has very much randomized memory access patterns. 它最终将在交换垃圾中,因为哈希表具有非常随机的内存访问模式。

If you know that the map exceeds the size of the physical memory, you could consider using a data structure on the disk in the first place. 如果您知道映射超出了物理内存的大小,则可以考虑首先在磁盘上使用数据结构。 This especially if you don't need the data structure during the computation. 特别是如果您在计算过程中不需要数据结构。 When the hash table triggers swapping, it creates problems also outside the hash table itself. 当哈希表触发交换时,它也会在哈希表本身之外产生问题。

As far as I can remember, when a dict is expanded it just relies on C's malloc. 据我所知,当dict扩展时,它只依赖于C的malloc。 The program will keep running as long as malloc keeps succeeding. 只要malloc保持成功,程序就会继续运行。 Most OS's will keep malloc working as long as there is enough memory, and then as long as there are pages that can be swapped in. In either case Python will throw a MemoryError exception when malloc fails, as per the documentation . 只要有足够的内存,大多数操作系统都会保持malloc工作,然后只要有可以交换的页面。在任何一种情况下,根据文档 ,当malloc失败时,Python会抛出一个MemoryError异常。 As far as the data structure goes, dict is going to be very efficient space-wise. 就数据结构而言,dict在空间方面将非常有效。 The only way to really do better is to use an analytical function to map the values back and forth. 真正做得更好的唯一方法是使用分析函数来回映射值。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM