

I need to free up RAM by storing a Python dictionary on the hard drive, not in RAM. Is it possible?

In my case, I have a dictionary of about 6000 instantiated classes, where each class has 1000 attributed variables, all of type string or list of strings. As I build this dictionary up, my RAM usage goes up very high. Is there a way to write the dictionary to the hard drive as it is being built, rather than keeping it in RAM, so that I can save some memory? I've heard of something called "pickle" but I don't know if this is a feasible method for what I am doing.

Thanks for your help!

Maybe you should be using a database, but check out the shelve module.
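A minimal sketch of what shelve gives you (the filename "mydata" and the sample record are placeholders, not from the question): a shelf opens a file on disk and then behaves like a dict, so values live on the hard drive instead of in RAM.

```python
import shelve

# Open (or create) a shelf file; it behaves like a dict persisted on disk.
db = shelve.open("mydata")
db["record_1"] = {"name": "example", "tags": ["a", "b"]}  # pickled to disk
db.close()

# Reopen later: values are unpickled from disk on access,
# not held in memory in between.
db = shelve.open("mydata")
fetched = db["record_1"]
db.close()
```

Each lookup unpickles the value from the file, so only the values you actually touch occupy RAM at any moment.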

If shelve isn't powerful enough for you, there is always the industrial-strength ZODB.

shelve, as @gnibbler recommends, is what I would no doubt be using, but watch out for two traps: a simple one (all keys must be strings) and a subtle one (as the values don't normally exist in memory, calling mutators on them may not work as you expect).

For the simple problem, it's normally easy to find a workaround (and you do get a clear exception if you forget and try e.g. using an int or whatever as the key, so it's not hard to remember that you do need a workaround).

For the subtle problem, consider for example:

x = d['foo']
x.amutatingmethod()
...much later...
y = d['foo']
# is y "mutated" or not now?

The answer to the question in the last comment depends on whether d is a real dict (in which case y will be mutated, and in fact will be exactly the same object as x) or a shelf (in which case y will be a distinct object from x, in exactly the state you last saved to d['foo']!).

To get your mutations to persist, you need to "save them to disk" by doing

d['foo'] = x

after calling any mutators you want on x (so in particular you cannot just do

d['foo'].mutator()

and expect the mutation to "stick", as you would if d were a dict).
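The trap above can be demonstrated end-to-end; this is a self-contained sketch (the filename "demo_shelf" is made up for illustration):

```python
import shelve

db = shelve.open("demo_shelf")
db["foo"] = ["a"]

x = db["foo"]
x.append("b")        # mutates only the in-memory copy of the value
stale = db["foo"]    # re-fetched from disk: the append is NOT there

db["foo"] = x        # explicit write-back makes the mutation persist
fresh = db["foo"]    # now reflects the append
db.close()
```

With a real dict, `stale` would already contain the appended element; with a shelf it does not until you assign the value back.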

shelve does have an option to cache all fetched items in memory, but of course that can fill up the memory again, and results in long delays when you finally close the shelf object (since all the cached items must be saved back to disk then, just in case they had been mutated). That option was something I originally pushed for (as a Python core committer), but I've since changed my mind and I now apologize for getting it in (ah well, at least it's not the default!-), since the cases it should be used in are rare, and it can often trap the unwary user... sorry.

BTW, in case you don't know what a mutator, or "mutating method", is, it's any method that alters the state of the object you call it on -- e.g. .append if the object is a list, .pop if the object is any kind of container, and so on. No need to worry if the object is immutable, of course (numbers, strings, tuples, frozensets, ...), since it doesn't have mutating methods in that case;-).

Pickling an entire hash over and over again is bound to run into the same memory pressures that you're facing now -- maybe even worse, with all the data marshaling back and forth.

Instead, using an on-disk database that acts like a hash is probably the best bet; see this page for a quick introduction to using dbm-style databases in your program: http://docs.python.org/library/dbm
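A quick sketch of the dbm module in action (the filename "cache" and the key/value are invented for illustration). Note that dbm stores only byte/string keys and values, so anything richer has to be serialized first -- which is essentially what shelve does for you by combining dbm with pickle.

```python
import dbm

# "c" = open for read/write, creating the file if it doesn't exist.
with dbm.open("cache", "c") as db:
    db["key1"] = "value1"   # str keys/values are encoded to bytes on disk

# Reopen read-only later: the data lives on disk, not in RAM.
with dbm.open("cache", "r") as db:
    value = db["key1"]      # values always come back as bytes
```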

They act enough like hashes that it should be a simple transition for you.

"""I have a dictionary of about 6000 instantiated classes, where each class has 1000 attributed variables all of type string or list of strings""" ... I presume that you mean: """I have a class with about 1000 attributes, all of type str or list of str. I have a dictionary mapping about 6000 keys of unspecified type to corresponding instances of that class.""" If that's not a reasonable translation, please correct it.

For a start, 1000 attributes in a class is mind-boggling. You must be treating the vast majority generically, using value = getattr(obj, attr_name) and setattr(obj, attr_name, value). Consider using a dict instead of an instance: value = obj[attr_name] and obj[attr_name] = value.
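Side by side, the two access styles look like this (the class name Record and the attribute "colour" are made up for illustration); the dict form is lighter and maps directly onto shelve/dbm storage later:

```python
class Record:
    """Stand-in for the poster's 1000-attribute class."""
    pass

# Generic attribute access on an instance:
r = Record()
setattr(r, "colour", "red")       # generic write
v1 = getattr(r, "colour")         # generic read

# The same thing with a plain dict:
d = {}
d["colour"] = "red"               # instead of setattr(...)
v2 = d["colour"]                  # instead of getattr(...)
```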

Secondly, what percentage of those 6 million attributes are ""? If sufficiently high, you might like to consider implementing a sparse dict which doesn't physically have entries for those attributes, using the __missing__ hook -- docs here.
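A minimal sparse-dict sketch using that hook (assuming "" is the right default for absent attributes, per the question): __missing__ is called by __getitem__ whenever a key is not found, so empty values never need to be stored at all.

```python
class SparseDict(dict):
    """A dict that reports "" for absent keys without storing them."""
    def __missing__(self, key):
        return ""

d = SparseDict()
d["name"] = "example"
empty = d["tags"]    # no entry is stored; __missing__ supplies ""
```

Only the non-empty attributes occupy space; with mostly-empty records this can shrink the footprint dramatically.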
