Read/write/update an object without loading the object into memory

Question

I have been trying out the klepto package to write, read, and update my objects on the hard disk, aiming to avoid the "out of memory" issues I ran into when training my model on my dataset. As I understand it, klepto lets me store my data in a key-value fashion. But I am not sure whether I can update an object directly when I load the data back from the klepto archive, for example adding a value to a list, without loading the whole object into memory, so as to avoid the "out of memory" problem.

Here is a sample of how I save the data (please correct me if this is not the right way to set it up):

from klepto.archives import *
arch = file_archive('test.txt')
arch['a'] = [3,4,5,6,7]
arch.dump()
arch.pop('a')

Answer 1

I'm the klepto author. If I understand what you want, it looks like you have set it up correctly. The critical keyword is cached. If you use cached=True, then the archive is constructed as an in-memory cache with a manually synchronized file backend. If you use cached=False, then there's no in-memory cache... you just access the file archive directly.

Python 3.7.16 (default, Dec  7 2022, 05:04:27) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from klepto.archives import *
>>> arch = file_archive('test.txt', cached=True)
>>> arch['a'] = [3,4,5,6,7]
>>> arch.dump() # dump to file archive
>>> arch.pop('a') # delete from memory
[3, 4, 5, 6, 7]
>>> arch
file_archive('test.txt', {}, cached=True)
>>> arch.load('a') # load from file archive
>>> arch
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=True)
>>> 
>>> arch2 = file_archive('test.txt', cached=True)
>>> arch2
file_archive('test.txt', {}, cached=True)
>>> arch2.load() # load from file archive
>>> arch2
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=True)
>>> 
>>> arch3 = file_archive('test.txt', cached=False)
>>> arch3 # directly access file-archive
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=False)
>>>

You can also manipulate objects that are already in the archive... unfortunately, for cached=False, the object needs to be loaded into memory to be edited (in-archive editing is not implemented, so you can only replace objects in a cached=False archive).

>>> arch2
file_archive('test.txt', {'a': [3, 4, 5, 6, 7]}, cached=True)
>>> arch2['a'].append(8) # edit the in-memory object
>>> arch2
file_archive('test.txt', {'a': [3, 4, 5, 6, 7, 8]}, cached=True)
>>> arch2.dump('a') # save changes to file-archive
>>> arch3
file_archive('test.txt', {'a': [3, 4, 5, 6, 7, 8]}, cached=False)
>>> 
>>> arch3['a'] = arch2['a'][1:] # replace directly in-file
>>> arch3
file_archive('test.txt', {'a': [4, 5, 6, 7, 8]}, cached=False)
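
The two update patterns from the session above can be wrapped in small helpers. This is only a sketch: the helper names append_uncached and append_cached are made up for illustration and are not part of klepto, and the demo uses only the calls already shown in this answer (file_archive, item get/set, load, dump, pop). It writes to a temporary directory and is skipped if klepto is not installed.

```python
import os
import tempfile


def append_uncached(archive, key, value):
    """Read-modify-replace, for an archive opened with cached=False.

    Hypothetical helper: the stored value is still read into memory,
    edited there, and then written back under the same key.
    """
    items = archive[key]   # the stored value is read into memory here
    items.append(value)    # edit the in-memory copy
    archive[key] = items   # replace the stored value with the edited copy
    return items


def append_cached(archive, key, value):
    """Load one key, edit it, save it, then drop it from the cache.

    Hypothetical helper for an archive opened with cached=True.
    """
    archive.load(key)           # pull just this key into the in-memory cache
    archive[key].append(value)  # edit the cached object
    archive.dump(key)           # save the change to the file backend
    archive.pop(key)            # remove the object from the in-memory cache


if __name__ == "__main__":
    try:
        from klepto.archives import file_archive
    except ImportError:
        print("klepto is not installed; skipping the demo")
    else:
        path = os.path.join(tempfile.mkdtemp(), "test.txt")

        direct = file_archive(path, cached=False)
        direct["a"] = [3, 4, 5, 6, 7]
        append_uncached(direct, "a", 8)
        print(direct["a"])

        cache = file_archive(path, cached=True)
        append_cached(cache, "a", 9)
        print(cache)        # the cache no longer holds 'a'
        print(direct["a"])  # read back through the uncached archive
```

Neither helper edits the object inside the file: in both cases the value passes through memory, which matches the limitation described above. The cached variant only limits how long the object stays in memory, by popping it right after the dump.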

1 solution

Solution 1
0 Accepted 2023-01-02 10:06:26
