
Accessing items from a dictionary using pickle efficiently in Python

I have a large dictionary mapping keys (which are strings) to objects. I pickled this large dictionary, and at certain times I want to pull out only a handful of entries from it. The dictionary usually has thousands of entries in total. When I load the dictionary using pickle, as follows:

import pickle  # the question used cPickle, the Python 2 name of the C implementation

# my dictionary from pickle, containing thousands of entries
with open('mypickle.pickle', 'rb') as f:
    mydict = pickle.load(f)
# accessing only a handful of entries here
for entry in relevant_entries:
    # find relevant entry
    value = mydict[entry]

I notice that it can take up to 3-4 seconds to load the entire pickle, most of which I don't need, since I access only a tiny subset of the dictionary entries later on (shown above).

How can I make pickle load only the entries I need from the dictionary, to make this faster?

Thanks.

Pickle serializes objects (or object hierarchies); it is not an on-disk store. As you have seen, you must unpickle the entire object to use it, which is of course wasteful. Use shelve, dbm or a database (SQLite) for on-disk storage.
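As a rough illustration of the shelve suggestion, the sketch below converts a dictionary into a shelf once and later reads back only a few keys. The sample data, the temporary path, and the names shelf_path and relevant_entries are placeholders for this example, not part of the question.

```python
import os
import shelve
import tempfile

# Hypothetical sample data standing in for the large dictionary.
mydict = {"key%d" % i: {"id": i, "payload": list(range(5))} for i in range(1000)}

# One-time conversion: write every entry into a shelf on disk.
# (A temporary directory is used here only to keep the example self-contained.)
shelf_path = os.path.join(tempfile.mkdtemp(), "myshelf")
with shelve.open(shelf_path) as shelf:
    shelf.update(mydict)

# Later: open the shelf and unpickle only the entries that are needed.
relevant_entries = ["key3", "key42", "key999"]
with shelve.open(shelf_path) as shelf:
    values = {entry: shelf[entry] for entry in relevant_entries}

print(values["key42"]["id"])
```

Each value is pickled separately under its key, so a lookup only pays for the entries actually requested. Shelf keys must be strings, which matches the dictionary described in the question.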

You'll need "ghost" objects, i.e. objects that are only placeholders and load themselves when accessed. This is a difficult problem, but it has been solved. You have two options: use the persistence library from ZODB, which helps with this, or just use ZODB directly; problem solved.

http://www.zodb.org/

If your objects are independent of each other, you could pickle and unpickle them individually, using their key as the filename. In some perverse way, a directory is a kind of dictionary mapping filenames to files. This way it is simple to load only the relevant entries.

Basically, you use an in-memory dictionary as a cache, and if the searched key is missing you try to load the file from the filesystem.
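A minimal sketch of that idea might look like the following. The class name PickleDirectory is made up for this example, and hashing the key to build the filename is an added assumption (so that any string is a safe filename) rather than something the answer specifies.

```python
import hashlib
import os
import pickle
import tempfile


class PickleDirectory:
    """Dict-like access to one pickle file per key, with an in-memory cache."""

    def __init__(self, directory):
        self.directory = directory
        self.cache = {}

    def _path(self, key):
        # Assumption: hash the key so that any string maps to a safe filename.
        name = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name + ".pickle")

    def __setitem__(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f, pickle.HIGHEST_PROTOCOL)
        self.cache[key] = value

    def __getitem__(self, key):
        # Serve from the cache; fall back to the filesystem on a miss.
        if key not in self.cache:
            try:
                with open(self._path(key), "rb") as f:
                    self.cache[key] = pickle.load(f)
            except FileNotFoundError:
                raise KeyError(key)
        return self.cache[key]


store = PickleDirectory(tempfile.mkdtemp())
store["alpha"] = {"value": 1}
store["beta"] = [1, 2, 3]

# A fresh instance starts with an empty cache and reads only what is asked for.
fresh = PickleDirectory(store.directory)
print(fresh["beta"])
```

Only the files for the requested keys are opened, so startup cost no longer depends on the total number of entries.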

I'm not really saying you should do that. A database (ZODB, SQLite, other) is probably better for persistent storage.
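For the SQLite route, one possible sketch is to store each pickled value as a BLOB keyed by its string key and query only the keys you need. The table name entries, the helper fetch, and the sample data are illustrative choices, and an in-memory database is used only so the example runs on its own.

```python
import pickle
import sqlite3

# Assumption: ":memory:" keeps the example self-contained; use a file path
# for real persistence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (key TEXT PRIMARY KEY, value BLOB)")

# Hypothetical sample data standing in for the large dictionary.
mydict = {"key%d" % i: {"id": i} for i in range(1000)}

# One-time conversion: one row per dictionary entry, value pickled individually.
conn.executemany(
    "INSERT INTO entries (key, value) VALUES (?, ?)",
    ((k, pickle.dumps(v, pickle.HIGHEST_PROTOCOL)) for k, v in mydict.items()),
)
conn.commit()


def fetch(conn, keys):
    """Load and unpickle only the rows whose key is in `keys`."""
    keys = list(keys)
    placeholders = ",".join("?" * len(keys))
    query = "SELECT key, value FROM entries WHERE key IN (%s)" % placeholders
    return {key: pickle.loads(blob) for key, blob in conn.execute(query, keys)}


values = fetch(conn, ["key7", "key500"])
print(values["key500"]["id"])
```

The primary key gives indexed lookups, so fetching a handful of entries does not require reading the rest of the table.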

