Writing and reading a dictionary in a specific format (Python)

Sorry, another newbie query :| This builds on the suggestion given here: optimizing

I need to be able to build a dictionary incrementally, i.e. one key: value pair at a time inside a for loop. To be specific, the dictionary would look something like this (N keys, each value being a list of lists; each smaller inner list has 3 elements):

dic_score = {key1: [[,,], [,,], [,,], ..., [,,]], key2: [[,,], [,,], [,,], ..., [,,]], ..., keyN: [[,,], [,,], [,,], ..., [,,]]}

This dict is generated by the following pattern, a nested for loop:

dic_score = {}
for Gnodes in G.nodes():          # Gnodes iterates over 10000 values
    Gvalue = someoperation(Gnodes)
    for Hnodes in H.nodes():      # Hnodes iterates over 10000 values
        Hvalue = someoperation(Hnodes)
        score = SomeOperation(Gvalue, Hvalue)   # placeholder for the actual scoring step
        # one [node, score, -1] entry per inner iteration, grouped under the outer node
        dic_score.setdefault(Gnodes, []).append([Hnodes, score, -1])

I then need to sort these lists, but that was answered here: optimizing (using a generator expression in place of the inner loop is an option).
[Note that the dict would contain 10000 keys, each associated with 10000 of the smaller lists.]

Since the loop counts are large, the generated dictionary is huge and I am running out of memory.
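To see why memory runs out, here is a rough back-of-the-envelope estimate. It is a sketch that assumes a 64-bit CPython build, integer node ids and float scores, and it counts only each inner list, its float score and one pointer slot in the outer list, so the real footprint (dict overhead, list over-allocation, node objects) is larger.

```python
import sys

N_KEYS = 10_000          # outer loop size stated in the question
N_ROWS_PER_KEY = 10_000  # inner loop size stated in the question

# Build one representative entry the same way the loop does ([Hnodes, score, -1]).
# Assumption: node ids are ints and scores are floats.
node, score = 12345, 0.5
sample = [node, score, -1]

POINTER_SIZE = 8  # assumption: 64-bit build, one pointer per slot in the per-key list

per_entry = sys.getsizeof(sample) + sys.getsizeof(score) + POINTER_SIZE
total = N_KEYS * N_ROWS_PER_KEY * per_entry
print(f"~{per_entry} bytes per entry, ~{total / 1e9:.1f} GB in total (lower bound)")
```

Under these assumptions the estimate comes out above 10 GB, which is why holding all 10^8 entries in one in-memory dict is unlikely to fit.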

How can I write each key: value pair (a list of lists) to a file as soon as it is generated, so that I don't need to hold the entire dictionary in memory? I then want to be able to read the dictionary back in the same format, i.e. something like dic_score_after_reading[key] returns the list of lists I am looking for.
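One way to get exactly this "write per key, read back with dict[key]" behaviour is the standard-library `shelve` module, which is not mentioned in the question but pickles each value into a dbm file under a string key. The sketch below assumes node ids can be turned into unique strings with `str()`, and `score_fn(g, h)` is a hypothetical stand-in for the question's `someoperation`/`SomeOperation` steps.

```python
import shelve


def build_scores(g_nodes, h_nodes, score_fn, path):
    """Write each key's list of lists to a shelf as soon as it is complete.

    Only one key's rows are held in memory at a time. score_fn(g, h) is a
    hypothetical placeholder for the real scoring computation.
    """
    with shelve.open(path, flag="n") as db:  # "n": always create a new, empty shelf
        for g in g_nodes:
            rows = [[h, score_fn(g, h), -1] for h in h_nodes]
            # Shelf keys must be str; the value is pickled and stored here,
            # so `rows` can be garbage-collected on the next iteration.
            db[str(g)] = rows


def open_scores(path):
    """Reopen the shelf read-only; it behaves like a dict keyed by str(node)."""
    return shelve.open(path, flag="r")
```

Usage would look like `with open_scores("scores") as dic_score_after_reading: rows = dic_score_after_reading[str(key)]`. Which dbm backend `shelve` uses depends on the platform, so it is worth checking the `dbm` documentation for any value-size limits on yours, since each value here is a 10000-element list.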

I am hoping that writing and reading per key: value pair would considerably ease the memory requirements. Is there a better data structure for this? Should I be considering a database, perhaps something like Buzhug, which would give me the flexibility to access and iterate over the lists associated with each key?
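As an illustration of the database route, here is a minimal sketch using the standard-library `sqlite3` module rather than Buzhug. It stores one row per `[Hnodes, score, -1]` entry, commits after each outer key so only one key's rows are pending at a time, and lets `ORDER BY` handle the per-key sort. The table layout, storing node ids as `TEXT`, and the `score_fn(g, h)` callback are all assumptions for illustration.

```python
import sqlite3


def build_score_db(conn, g_nodes, h_nodes, score_fn):
    """Insert one (g, h, score, flag) row per inner-loop entry.

    score_fn(g, h) is a hypothetical placeholder for the real scoring step.
    Node ids are stored as TEXT, so they come back as strings.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores (g TEXT, h TEXT, score REAL, flag INTEGER)"
    )
    for g in g_nodes:
        # A generator keeps only the current row in memory; executemany consumes it.
        rows = ((str(g), str(h), score_fn(g, h), -1) for h in h_nodes)
        conn.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", rows)
        conn.commit()
    # Building the index after the bulk insert is usually faster than before it.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_scores_g ON scores (g)")
    conn.commit()


def scores_for(conn, g, descending=True):
    """Return the list of [h, score, flag] entries for one key, sorted by score."""
    order = "DESC" if descending else "ASC"  # chosen from two literals, safe to format in
    cur = conn.execute(
        f"SELECT h, score, flag FROM scores WHERE g = ? ORDER BY score {order}",
        (str(g),),
    )
    return [list(row) for row in cur]
```

With `conn = sqlite3.connect("scores.db")`, `scores_for(conn, key)` plays the role of `dic_score_after_reading[key]`, and iterating the cursor instead of building a list would avoid materializing even one key's rows.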

I am currently using cPickle to dump the entire dictionary and then read it back via load(), but cPickle crashes when dumping this much data in one go.
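Staying with pickle, one alternative to a single giant dump is to pickle one record per key into the same file and remember where each record starts. The sketch below assumes Python 3, where `cPickle` is gone and `pickle` uses its C implementation automatically; the helper names `write_rows`, `read_row` and `iter_rows` are hypothetical.

```python
import pickle


def write_rows(path, rows):
    """Pickle (key, list_of_lists) pairs one after another into a single file.

    Returns {key: byte offset} so a single key can be loaded later without
    unpickling the rest of the file.
    """
    offsets = {}
    with open(path, "wb") as f:
        for key, value in rows:
            offsets[key] = f.tell()  # where this record starts
            pickle.dump((key, value), f, protocol=pickle.HIGHEST_PROTOCOL)
    return offsets


def read_row(path, offsets, key):
    """Load only the record for `key` by seeking to its offset."""
    with open(path, "rb") as f:
        f.seek(offsets[key])
        _, value = pickle.load(f)
        return value


def iter_rows(path):
    """Stream every (key, value) record back in the order it was written."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # raised once the end of the file is reached
                return
```

Passing a generator such as `((g, build_row(g)) for g in G.nodes())` as `rows` (with `build_row` a hypothetical function running the inner loop) keeps only one key's list in memory while writing. The offsets dict has just one entry per key, so it can itself be pickled to a small side file for later reads.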

Apologies, but I am unaware of the best practices for this kind of thing. Thanks!

You could look into using the ZODB in combination with the included BTrees implementation.

What that gives you is a mapping-like structure that writes individual entries separately to the object store. You'd need to use savepoints or plain transactions to flush data out to storage, but you can handle huge amounts of data this way.
