
Accessing items from a dictionary using pickle efficiently in Python

I have a large dictionary mapping keys (which are strings) to objects. I pickled this large dictionary, and at certain times I want to pull out only a handful of entries from it. The dictionary usually has thousands of entries in total. When I load the dictionary using pickle, as follows:

import pickle  # on Python 2, cPickle is the faster C implementation

# my dictionary from pickle, containing thousands of entries
with open('mypickle.pickle', 'rb') as f:
    mydict = pickle.load(f)

# accessing only a handful of entries here
for entry in relevant_entries:
    # find relevant entry
    value = mydict[entry]

I notice that it can take up to 3-4 seconds to load the entire pickle, which I don't need, since I access only a tiny subset of the dictionary entries later on (shown above).

How can I make pickle load only the entries I need from the dictionary, to make this faster?

Thanks.

Pickle serializes objects (or object hierarchies); it is not an on-disk store. As you have seen, you must unpickle the entire object to use it, which is of course wasteful. Use shelve, dbm or a database (SQLite) for on-disk storage.
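
To illustrate the shelve suggestion, here is a minimal sketch, assuming the values are picklable. The file name `mydata.shelf`, the sample data and the helper `load_entries` are made up for this example and do not come from the question.

```python
import os
import shelve
import tempfile

# Hypothetical data standing in for the large dictionary.
big_dict = {"key%d" % i: {"id": i, "payload": list(range(5))} for i in range(1000)}

# Hypothetical location; any writable path works.
shelf_path = os.path.join(tempfile.mkdtemp(), "mydata.shelf")

# One-time conversion: each value is pickled separately under its own key.
with shelve.open(shelf_path, flag="n") as shelf:
    for key, value in big_dict.items():
        shelf[key] = value

def load_entries(path, keys):
    """Read only the requested keys; the other values are never unpickled."""
    with shelve.open(path, flag="r") as shelf:
        return {key: shelf[key] for key in keys if key in shelf}

subset = load_entries(shelf_path, ["key3", "key42", "missing"])
```

The conversion still has to load the whole pickle once, but later runs only pay for the keys they actually read.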

You'll have to have "ghost" objects, i.e. objects that are only placeholders and load themselves when accessed. This is a difficult problem, but it has been solved. You have two options: you can use the persistence library from ZODB, which helps with this, or you can just start using ZODB directly; problem solved.

http://www.zodb.org/

If your objects are independent of each other, you could pickle and unpickle them individually, using their key as the filename. In some perverse way, a directory is a kind of dictionary mapping filenames to files. This way it is simple to load only the relevant entries.

Basically, you use an in-memory dictionary as a cache, and if the searched key is missing, you try to load the file from the filesystem.
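
The one-file-per-key idea with an in-memory cache could look roughly like the sketch below. The class name `PickleDirectory` is hypothetical, and one detail differs from the description above: the file name is a SHA-256 hash of the key rather than the key itself, on the assumption that arbitrary strings are not always valid file names.

```python
import hashlib
import os
import pickle
import tempfile

class PickleDirectory:
    """Hypothetical helper: one pickle file per key, with an in-memory cache."""

    def __init__(self, directory):
        self.directory = directory
        self._cache = {}
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hash the key so that any string is safe to use as a file name.
        name = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name + ".pickle")

    def __setitem__(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f, protocol=pickle.HIGHEST_PROTOCOL)
        self._cache[key] = value

    def __getitem__(self, key):
        # Cache miss: fall back to the file system, as described above.
        if key not in self._cache:
            try:
                with open(self._path(key), "rb") as f:
                    self._cache[key] = pickle.load(f)
            except FileNotFoundError:
                raise KeyError(key) from None
        return self._cache[key]

store = PickleDirectory(tempfile.mkdtemp())
store["alpha"] = {"n": 1}
store["beta"] = [1, 2, 3]

# A fresh instance starts with an empty cache and reads only what is asked for.
reader = PickleDirectory(store.directory)
value = reader["beta"]
```

Only the file for "beta" is opened by `reader`; the file for "alpha" stays untouched on disk until someone asks for it.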

I'm not really saying you should do that. A database (ZODB, SQLite, other) is probably better for persistent storage.
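
For the SQLite route, one possible sketch stores each value as a pickled BLOB keyed by its string and fetches only the requested rows. The table name `entries`, the file name `mydata.sqlite`, the sample data and the helper `load_entries` are assumptions made for illustration.

```python
import os
import pickle
import sqlite3
import tempfile

# Hypothetical data standing in for the large dictionary.
big_dict = {"key%d" % i: {"id": i} for i in range(1000)}
db_path = os.path.join(tempfile.mkdtemp(), "mydata.sqlite")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE entries (key TEXT PRIMARY KEY, value BLOB)")
with conn:  # commits the inserts if no exception is raised
    conn.executemany(
        "INSERT INTO entries (key, value) VALUES (?, ?)",
        ((k, pickle.dumps(v, protocol=pickle.HIGHEST_PROTOCOL))
         for k, v in big_dict.items()),
    )

def load_entries(connection, keys):
    """Fetch and unpickle only the rows whose key is in `keys`."""
    keys = list(keys)
    placeholders = ",".join("?" for _ in keys)
    rows = connection.execute(
        "SELECT key, value FROM entries WHERE key IN (%s)" % placeholders, keys
    )
    return {key: pickle.loads(blob) for key, blob in rows}

subset = load_entries(conn, ["key7", "key99"])
conn.close()
```

The primary key gives an indexed lookup, so reading a handful of entries does not scan or unpickle the rest of the table.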
