Pickle vs. shelve for storing large dictionaries in Python

Question

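If I store a large dictionary as a pickle file, does loading it via cPickle mean that it will all be read into memory at once?

If so, is there a cross-platform way to get something like pickle, but access each entry one key at a time (i.e. avoid loading the whole dictionary into memory and only load each entry by name)? I know shelve is supposed to do this: is it as portable as pickle, though?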

Answer 1

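I know shelve is supposed to do this: is it as portable as pickle, though?

Yes. shelve is part of the Python standard library and is written in Python.

Edit

So, if you have a large dictionary: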

bigd = {'a': 1, 'b':2, # . . .
}

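and you want to save it without having to read the whole thing back in later, then don't save it as a pickle. It is better to save it as a shelf, which is a kind of on-disk dictionary.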

import shelve

myShelve = shelve.open('my.shelve')
myShelve.update(bigd)
myShelve.close()

Then you can:

import shelve

myShelve = shelve.open('my.shelve')
value = myShelve['a']    # reads only this entry from disk
value += 1
myShelve['a'] = value    # writes the updated entry back
myShelve.close()

You basically treat the shelve object like a dict, but the items are stored on disk (as individual pickles) and read in as needed.
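As a minimal sketch of that dict-like behaviour, assuming Python 3.4 or later (where a shelf can be used as a context manager): the path and the `nums` entry below are made up for illustration. It also shows one place where a shelf differs from a plain dict. With the default `writeback=False`, mutating a stored value in place is not saved, so the value has to be assigned back to its key.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf; any writable path works.
shelf_path = os.path.join(tempfile.mkdtemp(), "my.shelve")

# The context manager closes the shelf even if an error occurs.
with shelve.open(shelf_path) as shelf:
    shelf["a"] = 1
    shelf["nums"] = [1, 2, 3]

with shelve.open(shelf_path) as shelf:
    # Only the entry for "a" is unpickled here, not the whole shelf.
    a_value = shelf["a"]

    # In-place mutation changes a temporary copy and is not saved...
    shelf["nums"].append(4)
    unsaved = shelf["nums"]

    # ...so read, modify, and assign back to the key.
    nums = shelf["nums"]
    nums.append(4)
    shelf["nums"] = nums
    saved = shelf["nums"]

    stored_keys = sorted(shelf.keys())
```

Opening the shelf with `writeback=True` is the other way to make in-place mutations stick, at the cost of caching every accessed entry in memory until the shelf is synced or closed.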

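If your objects can be stored as a list of properties, then sqlite may be a good alternative. Shelves and pickles are convenient, but they can only be accessed from Python, whereas a sqlite database can be read from most languages.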
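A rough sketch of the sqlite route, using the standard-library `sqlite3` module. The `people` table, its columns, and the sample rows are invented for illustration; the point is only that one record can be fetched by key without loading the rest.

```python
import sqlite3

# Hypothetical records: each object is a flat list of attributes.
people = [("alice", 30, "Paris"), ("bob", 25, "Oslo")]

# ":memory:" keeps the sketch self-contained; pass a filename to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT PRIMARY KEY, age INTEGER, city TEXT)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?)", people)
conn.commit()

# Fetch one row by key; the rest of the table is not loaded.
row = conn.execute(
    "SELECT age, city FROM people WHERE name = ?", ("bob",)
).fetchone()
conn.close()
```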

Answer 2

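If you want a module that's more robust than shelve, you might look at klepto. klepto is built to provide a dictionary interface to platform-agnostic storage on disk or in a database, and it is built to work with large data.

Here, we first create some pickled objects stored on disk. They use the dir_archive, which stores one object per file.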

>>> d = dict(zip('abcde',range(5)))
>>> d['f'] = max
>>> d['g'] = lambda x:x**2
>>> 
>>> import klepto
>>> help(klepto.archives.dir_archive)       

>>> print klepto.archives.dir_archive.__new__.__doc__
initialize a dictionary with a file-folder archive backend

    Inputs:
        name: name of the root archive directory [default: memo]
        dict: initial dictionary to seed the archive
        cached: if True, use an in-memory cache interface to the archive
        serialized: if True, pickle file contents; otherwise save python objects
        compression: compression level (0 to 9) [default: 0 (no compression)]
        memmode: access mode for files, one of {None, 'r+', 'r', 'w+', 'c'}
        memsize: approximate size (in MB) of cache for in-memory compression

>>> a = klepto.archives.dir_archive(dict=d)
>>> a
dir_archive('memo', {'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3, 'g': <function <lambda> at 0x102f562a8>, 'f': <built-in function max>}, cached=True)
>>> a.dump()
>>> del a

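Now the data is all on disk, so let's pick and choose which entries we want to load into memory. b is the dict in memory, while b.archive maps the collection of files into a dictionary view.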

>>> b = klepto.archives.dir_archive('memo')
>>> b
dir_archive('memo', {}, cached=True)
>>> b.keys()   
[]
>>> b.archive.keys()
['a', 'c', 'b', 'e', 'd', 'g', 'f']
>>> b.load('a')
>>> b
dir_archive('memo', {'a': 0}, cached=True)
>>> b.load('b')
>>> b.load('f')
>>> b.load('g')
>>> b['g'](b['f'](b['a'],b['b']))
1
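To make the "one object per file" idea concrete, here is a standard-library sketch of the same pattern. It is not klepto's implementation: the helper names, the `.pkl` suffix, and the use of keys as file names are assumptions for illustration. The standard `pickle` module cannot serialize a lambda like the one in the session above, so the sketch stores only integers.

```python
import os
import pickle
import tempfile

archive_dir = tempfile.mkdtemp()  # hypothetical archive location


def dump_entry(key, value):
    """Write one value to its own pickle file, named after the key."""
    with open(os.path.join(archive_dir, key + ".pkl"), "wb") as f:
        pickle.dump(value, f)


def load_entry(key):
    """Unpickle only the file that belongs to this key."""
    with open(os.path.join(archive_dir, key + ".pkl"), "rb") as f:
        return pickle.load(f)


def archived_keys():
    """List the keys on disk without unpickling any values."""
    return sorted(
        name[:-4] for name in os.listdir(archive_dir) if name.endswith(".pkl")
    )


# Store five entries, one file each (keys assumed safe as file names).
for key, value in zip("abcde", range(5)):
    dump_entry(key, value)

# Load just the entries that are needed into an in-memory dict.
cache = {}
cache["a"] = load_entry("a")
cache["b"] = load_entry("b")
```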

klepto also provides the same interface to a sql archive.

>>> print klepto.archives.sql_archive.__new__.__doc__
initialize a dictionary with a sql database archive backend

    Connect to an existing database, or initialize a new database, at the
    selected database url. For example, to use a sqlite database 'foo.db'
    in the current directory, database='sqlite:///foo.db'. To use a mysql
    database 'foo' on localhost, database='mysql://user:pass@localhost/foo'.
    For postgresql, use database='postgresql://user:pass@localhost/foo'. 
    When connecting to sqlite, the default database is ':memory:'; otherwise,
    the default database is 'defaultdb'. If sqlalchemy is not installed,
    storable values are limited to strings, integers, floats, and other
    basic objects. If sqlalchemy is installed, additional keyword options
    can provide database configuration, such as connection pooling.
    To use a mysql or postgresql database, sqlalchemy must be installed.

    Inputs:
        name: url for the sql database [default: (see note above)]
        dict: initial dictionary to seed the archive
        cached: if True, use an in-memory cache interface to the archive
        serialized: if True, pickle table contents; otherwise cast as strings

>>> c = klepto.archives.sql_archive('database')
>>> c.update(b)
>>> c
sql_archive('sqlite:///database', {'a': 0, 'b': 1, 'g': <function <lambda> at 0x10446b1b8>, 'f': <built-in function max>}, cached=True)
>>> c.dump()

Now those same objects on disk are also in a sql archive. We can add new objects to either archive.

>>> b['x'] = 69
>>> c['y'] = 96
>>> b.dump('x')
>>> c.dump('y')

Get klepto here: https://github.com/uqfoundation

Answer 1: 21, accepted, 2013-02-03 02:07:01
Answer 2: 6, 2015-09-15 13:09:24
