
Buffered/batch serialization in Python?

I have an algorithm that iteratively creates a very large, highly nested dictionary. I would like to buffer parts of this dictionary and then periodically stream the buffer to disk so that I can re-create the whole dictionary at another time.

It seems like pickle is intended for one-pass serialization. Is there a way to serialize a dictionary in batches to a single output stream?

Ok, it looks like the following will partially solve the problem:

import pickle

with open('file', 'ab') as f:
    while <stopping condition>:
        <generate key k and value v>
        # Dump each piece as a small one-entry dict so the reader
        # can merge it with dict.update(); successive dump() calls
        # append to the same stream.
        pickle.dump({k: v}, f)

Now, to reconstruct the whole dictionary, you just do the following:

import pickle

fullMapping = {}
with open('file', 'rb') as f:
    while True:
        try:
            # Each load() returns one of the dicts dumped above.
            fullMapping.update(pickle.load(f))
        except EOFError:
            # End of the stream: every batch has been read.
            break

This will reconstitute the full dictionary when run.
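Putting both halves together, here is a minimal self-contained round-trip sketch. The file location, the sample data, and the batch size of 100 are arbitrary choices for the demo, not part of the original answer:

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'mapping.pkl')

# Some stand-in "large, nested" data for the demo.
big = {('key', i): {'nested': i * i} for i in range(1000)}

# Write: stream the dictionary to disk in batches of 100 entries.
with open(path, 'ab') as f:
    batch = {}
    for k, v in big.items():
        batch[k] = v
        if len(batch) >= 100:      # buffer full: flush one pickle record
            pickle.dump(batch, f)
            batch = {}
    if batch:                      # flush any remainder
        pickle.dump(batch, f)

# Read: replay every dumped batch and merge into one dictionary.
fullMapping = {}
with open(path, 'rb') as f:
    while True:
        try:
            fullMapping.update(pickle.load(f))
        except EOFError:
            break

assert fullMapping == big
```

Dumping whole batches rather than single entries keeps the number of `pickle.dump` calls (and records to replay on load) small.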
