Pickle versus shelve storing large dictionaries in Python
If I am storing a large dictionary as a pickle file, does loading it via cPickle mean that it will all be consumed into memory at once?
If so, is there a cross-platform way to get something like pickle, but access each entry one key at a time (i.e. avoid loading all of the dictionary into memory and only load each entry by name)? I know shelve is supposed to do this: is that as portable as pickle though?
I know shelve is supposed to do this: is that as portable as pickle though?
Yes.
shelve is part of The Python Standard Library and is written in Python.
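One caveat on portability: shelve itself is pure Python, but it stores its data through the dbm module, and which dbm backend gets used depends on what is available on the machine that creates the file. A minimal sketch for checking this, assuming Python 3 (the file name portability_check.shelve is made up for illustration):

```python
import dbm
import os
import shelve
import tempfile

# Create a small shelf in a temporary directory (hypothetical file name).
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'portability_check.shelve')

shelf = shelve.open(path)
shelf['a'] = 1
shelf.close()

# dbm.whichdb reports which dbm backend wrote the file; the answer
# can differ between platforms and Python builds.
backend = dbm.whichdb(path)
print(backend)

# Reading the shelf back on the same machine works regardless of backend.
shelf = shelve.open(path)
stored = shelf['a']
shelf.close()
```

If the shelf file has to travel between machines, it may be worth running this check on both sides before relying on the file being readable.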
So if you have a large dictionary:
bigd = {'a': 1, 'b':2, # . . .
}
And you want to save it without having to read the whole thing back in later, then don't save it as a pickle. It would be better to save it as a shelf, a sort of on-disk dictionary.
import shelve
myShelve = shelve.open('my.shelve')
myShelve.update(bigd)
myShelve.close()
Then later you can:
import shelve
myShelve = shelve.open('my.shelve')
value = myShelve['a']
value += 1
myShelve['a'] = value
myShelve.close()
You basically treat the shelve object like a dict, but the items are stored on disk (as individual pickles) and read in as needed.
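The save and load steps can be combined into one runnable sketch. It assumes Python 3.4 or later, where a shelf can be used as a context manager so that it is closed automatically; the temporary directory and the extra list value are only for illustration.

```python
import os
import shelve
import tempfile

bigd = {'a': 1, 'b': 2, 'c': [1, 2, 3]}

# Illustrative location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'my.shelve')

# Save: each key/value pair is pickled and written separately.
with shelve.open(path) as shelf:
    shelf.update(bigd)

# Later: only the entries that are looked up get unpickled.
with shelve.open(path) as shelf:
    value = shelf['a']
    shelf['a'] = value + 1

    # Mutable values must be reassigned to be saved, because the
    # default writeback=False does not track in-place changes.
    items = shelf['c']
    items.append(4)
    shelf['c'] = items

# Reopen to confirm the changes reached the disk.
with shelve.open(path) as shelf:
    result_a = shelf['a']
    result_c = shelf['c']
    keys = sorted(shelf.keys())
```

Note that shelf keys have to be strings, which is one way a shelf differs from an ordinary dict.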
If your objects could be stored as a list of properties, then sqlite may be a good alternative. Shelves and pickles are convenient, but they can only be accessed from Python, whereas a sqlite database can be read from most languages.
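As a rough sketch of the sqlite route, using the sqlite3 module from the standard library: the table name, column names, and sample rows below are invented for illustration, and each object is assumed to flatten into a name plus two simple properties.

```python
import os
import sqlite3
import tempfile

# Hypothetical records: each object is a flat tuple of properties.
records = [('a', 1, 'first'), ('b', 2, 'second'), ('c', 3, 'third')]

# Illustrative location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'bigd.sqlite')

conn = sqlite3.connect(path)
conn.execute(
    'CREATE TABLE entries (name TEXT PRIMARY KEY, value INTEGER, label TEXT)'
)
conn.executemany('INSERT INTO entries VALUES (?, ?, ?)', records)
conn.commit()
conn.close()

# Later: fetch a single entry by name without reading the other rows.
conn = sqlite3.connect(path)
row = conn.execute(
    'SELECT value, label FROM entries WHERE name = ?', ('b',)
).fetchone()
conn.close()
```

The primary key on name gives an indexed lookup, so a single entry can be fetched by key much like with a shelf.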
If you want a module that's more robust than shelve, you might look at klepto. klepto is built to provide a dictionary interface to platform-agnostic storage on disk or in a database, and is designed to work with large data.
Here, we first create some pickled objects stored on disk. They use the dir_archive, which stores one object per file.
>>> d = dict(zip('abcde',range(5)))
>>> d['f'] = max
>>> d['g'] = lambda x:x**2
>>>
>>> import klepto
>>> help(klepto.archives.dir_archive)
>>> print klepto.archives.dir_archive.__new__.__doc__
initialize a dictionary with a file-folder archive backend
Inputs:
name: name of the root archive directory [default: memo]
dict: initial dictionary to seed the archive
cached: if True, use an in-memory cache interface to the archive
serialized: if True, pickle file contents; otherwise save python objects
compression: compression level (0 to 9) [default: 0 (no compression)]
memmode: access mode for files, one of {None, 'r+', 'r', 'w+', 'c'}
memsize: approximate size (in MB) of cache for in-memory compression
>>> a = klepto.archives.dir_archive(dict=d)
>>> a
dir_archive('memo', {'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': <function <lambda> at 0x102f562a8>, 'f': <built-in function max>}, cached=True)
>>> a.dump()
>>> del a
Now that the data is all on disk, let's pick and choose which entries we want to load into memory. b is the dict in memory, while b.archive maps the collection of files into a dictionary view.
>>> b = klepto.archives.dir_archive('memo')
>>> b
dir_archive('memo', {}, cached=True)
>>> b.keys()
[]
>>> b.archive.keys()
['a', 'c', 'b', 'e', 'd', 'g', 'f']
>>> b.load('a')
>>> b
dir_archive('memo', {'a': 0}, cached=True)
>>> b.load('b')
>>> b.load('f')
>>> b.load('g')
>>> b['g'](b['f'](b['a'],b['b']))
1
klepto also provides the same interface to a sql archive.
>>> print klepto.archives.sql_archive.__new__.__doc__
initialize a dictionary with a sql database archive backend
Connect to an existing database, or initialize a new database, at the
selected database url. For example, to use a sqlite database 'foo.db'
in the current directory, database='sqlite:///foo.db'. To use a mysql
database 'foo' on localhost, database='mysql://user:pass@localhost/foo'.
For postgresql, use database='postgresql://user:pass@localhost/foo'.
When connecting to sqlite, the default database is ':memory:'; otherwise,
the default database is 'defaultdb'. If sqlalchemy is not installed,
storable values are limited to strings, integers, floats, and other
basic objects. If sqlalchemy is installed, additional keyword options
can provide database configuration, such as connection pooling.
To use a mysql or postgresql database, sqlalchemy must be installed.
Inputs:
name: url for the sql database [default: (see note above)]
dict: initial dictionary to seed the archive
cached: if True, use an in-memory cache interface to the archive
serialized: if True, pickle table contents; otherwise cast as strings
>>> c = klepto.archives.sql_archive('database')
>>> c.update(b)
>>> c
sql_archive('sqlite:///database', {'a': 0, 'b': 1, 'g': <function <lambda> at 0x10446b1b8>, 'f': <built-in function max>}, cached=True)
>>> c.dump()
Now the same objects that are on disk are also in a sql archive. We can add new objects to either archive.
>>> b['x'] = 69
>>> c['y'] = 96
>>> b.dump('x')
>>> c.dump('y')
Get klepto here: https://github.com/uqfoundation