
Buffered/batch serialization in Python?

I have an algorithm that iteratively creates a very large, highly nested dictionary. I would like to buffer parts of this dictionary and then periodically stream the buffer to disk so that I can re-create the whole dictionary at another time.

It seems like pickle is intended for one-pass serialization. Is there a way to serialize a dictionary in batches to a single output stream?

Ok, it looks like the following will partially solve the problem:

import pickle

with open('file', 'ab') as f:
    while <stopping condition>:
        <generate key k and value v>
        # Dump each piece as a small one-entry dict so the reader
        # can merge it with dict.update(); successive dump() calls
        # append to the same stream.
        pickle.dump({k: v}, f)

Now, to reconstruct the whole dictionary, you just do the following:

import pickle

fullMapping = {}
with open('file', 'rb') as f:
    while True:
        try:
            # Each load() returns one of the dicts dumped above.
            fullMapping.update(pickle.load(f))
        except EOFError:
            # End of the stream: every batch has been read.
            break

This will reconstitute the full dictionary when run.
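Putting both halves together, here is a minimal self-contained round-trip sketch. The file location, the sample data, and the batch size of 100 are arbitrary choices for the demo, not part of the original answer:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'mapping.pkl')

# Some stand-in "large, nested" data for the demo.
big = {('key', i): {'nested': i * i} for i in range(1000)}

# Write: stream the dictionary to disk in batches of 100 entries.
with open(path, 'ab') as f:
    batch = {}
    for k, v in big.items():
        batch[k] = v
        if len(batch) >= 100:      # buffer full: flush one pickle record
            pickle.dump(batch, f)
            batch = {}
    if batch:                      # flush any remainder
        pickle.dump(batch, f)

# Read: replay every dumped batch and merge into one dictionary.
fullMapping = {}
with open(path, 'rb') as f:
    while True:
        try:
            fullMapping.update(pickle.load(f))
        except EOFError:
            break

assert fullMapping == big
```

Dumping whole batches rather than single entries keeps the number of `pickle.dump` calls (and records to replay on load) small.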
