
Elegant way to store a dictionary permanently with Python?

I'm currently parsing a file, which is expensive. The parse generates a dictionary of about 400 key/value pairs that is seldom updated. Previously I had a function that parsed the file and wrote the result to a text file in dictionary syntax (i.e. dict = {'Adam': 'Room 430', 'Bob': 'Room 404'}), and I copied and pasted that into another function whose sole purpose was to return the parsed dictionary.

Hence, in every file where I use that dictionary, I import that function and assign its return value to a variable, which then holds the dictionary. Is there a more elegant way to do this that doesn't involve explicitly copying and pasting code around? Using a database seems unnecessary, and the text file gave me the benefit of seeing whether the parsing was done correctly before adding it to the function. But I'm open to suggestions.

Why not dump it to a JSON file, and then load it from there wherever you need it?

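import json

# my_dict is the dictionary produced by the expensive parse, for example:
my_dict = {'Adam': 'Room 430', 'Bob': 'Room 404'}

# indent and sort_keys keep the file easy to check by eye
with open('my_dict.json', 'w') as f:
    json.dump(my_dict, f, indent=2, sort_keys=True)

# elsewhere...

with open('my_dict.json') as f:
    my_dict = json.load(f)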

Loading from JSON is fairly efficient.
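To avoid repeating the load in every module that needs the data (the copy-and-paste problem from the question), one option is to put the loading in a single function and cache its result. This is a minimal sketch: load_rooms and the default path my_dict.json are hypothetical names, and because of functools.lru_cache every caller gets the same shared dict object.

```python
import functools
import json


@functools.lru_cache(maxsize=None)
def load_rooms(path='my_dict.json'):
    """Read the JSON file once per path; later calls return the cached dict."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)
```

Other modules would then import load_rooms and call it. If the file is regenerated while the program is running, load_rooms.cache_clear() forces the next call to re-read it.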

Another option would be to use pickle, but unlike JSON, the files it generates aren't human-readable, so you lose the visual verification you liked about your old method.
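For comparison, a pickle round trip looks much like the JSON one; the main difference is that the file is opened in binary mode. A minimal sketch, where save_pickle and load_pickle are hypothetical helper names. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle


def save_pickle(obj, path):
    # pickle produces bytes, so the file is opened in binary mode
    with open(path, 'wb') as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    # Only load pickles you created yourself: unpickling can run arbitrary code
    with open(path, 'rb') as f:
        return pickle.load(f)
```

Usage would be save_pickle(my_dict, 'my_dict.pkl') after parsing, then my_dict = load_pickle('my_dict.pkl') wherever the data is needed.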

Why mess with all these serialization methods? It's already written to a file as a Python dict (although with the unfortunate name 'dict'). Change your program to write out the data with a better variable name, maybe 'data' or 'catalog', and save the file as a Python file, say data.py. Then you can just import the data directly at runtime without any clumsy copy/pasting or JSON/shelve/etc. parsing:

from data import catalog
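The generating side can be just a few lines. A minimal sketch, assuming the values are plain literals (strings, numbers, lists, dicts), for which pprint.pformat produces valid Python source; write_data_module, data.py and catalog are illustrative names.

```python
import pprint


def write_data_module(path, name, data):
    """Write data to path as an importable module that defines name."""
    # pformat emits a valid Python literal for str/int/list/dict values and
    # wraps long dicts over several lines, so the file stays easy to check by eye
    source = f"# Generated file: do not edit by hand\n{name} = {pprint.pformat(data)}\n"
    with open(path, 'w', encoding='utf-8') as f:
        f.write(source)
```

After write_data_module('data.py', 'catalog', parsed), other code can use from data import catalog. A module is imported only once per process, so if data.py is rewritten while a program is running, that program needs importlib.reload to see the change.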

JSON is probably the right way to go in many cases, but there might be an alternative. It looks like your keys and your values are always strings; is that right? You might consider using dbm (anydbm in Python 2). These are "databases", but they act almost exactly like dictionaries. They're great for cheap data persistence.

>>> import dbm  # Python 3 name; in Python 2 this was anydbm
>>> dict_of_strings = dbm.open('data', 'c')
>>> dict_of_strings['foo'] = 'bar'
>>> dict_of_strings.close()
>>> dict_of_strings = dbm.open('data')
>>> dict_of_strings['foo']  # Python 3 returns bytes
b'bar'
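For the question's data, where keys and values are all strings, the whole parsed dictionary can be written to and read back from a dbm file. A minimal sketch for Python 3; save_to_dbm and load_from_dbm are hypothetical helpers. Because dbm stores bytes, strings are encoded on the way in and decoded on the way out, and UTF-8 is assumed here.

```python
import dbm


def save_to_dbm(path, mapping):
    """Store a str -> str mapping, replacing any existing database at path."""
    # Flag 'n' always creates a new, empty database
    with dbm.open(path, 'n') as db:
        for key, value in mapping.items():
            db[key.encode('utf-8')] = value.encode('utf-8')


def load_from_dbm(path):
    """Read the whole database back into an ordinary dict of str."""
    with dbm.open(path, 'r') as db:
        return {key.decode('utf-8'): db[key].decode('utf-8') for key in db.keys()}
```

Which backend dbm.open picks (and so which files appear on disk) depends on what is available in the Python installation.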

If the keys are all strings, you can use the shelve module. From its documentation:

A shelf is a persistent, dictionary-like object. The difference with "dbm" databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects: anything that the pickle module can handle. This includes most class instances, recursive data types, and objects containing lots of shared sub-objects. The keys are ordinary strings.
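Because a shelf's values can be any picklable object, it also suits a richer version of the data. A minimal sketch; save_shelf, load_shelf and the nested record layout used in the usage below are illustrative assumptions, not taken from the question.

```python
import shelve


def save_shelf(path, mapping):
    """Store every item of mapping in a new shelf (keys must be str)."""
    with shelve.open(path, flag='n') as db:
        db.update(mapping)


def load_shelf(path):
    """Copy the shelf's contents into an ordinary in-memory dict."""
    with shelve.open(path, flag='r') as db:
        return dict(db)
```

For example, save_shelf('rooms', {'Adam': {'room': 430, 'floor': 4}}) stores a nested dict as the value. Note that mutating a stored value in place (db['Adam']['room'] = 431) is not saved unless the shelf is opened with writeback=True; reassigning the whole value is.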

JSON would be a good choice if you need to use the data from other languages.

If storage efficiency matters, use pickle (in Python 2, cPickle gives better execution performance; in Python 3, pickle uses its C implementation automatically). As Amber pointed out, you can also dump/load via JSON. It will be human-readable, but it can take more disk space.
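How much smaller pickle is, if at all, depends on the data; for short string keys and values the gap is small. Rather than rely on a rule of thumb, you could measure with your own dictionary. A minimal sketch that serializes in memory and reports byte counts; compare_sizes is a hypothetical helper.

```python
import json
import pickle


def compare_sizes(data):
    """Return the serialized size in bytes of data in three formats."""
    return {
        # Compact JSON: no spaces after separators
        'json_compact': len(json.dumps(data, separators=(',', ':')).encode('utf-8')),
        # Indented JSON: easiest to read, largest on disk
        'json_indented': len(json.dumps(data, indent=2).encode('utf-8')),
        'pickle': len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)),
    }
```

Calling print(compare_sizes(parsed)) on the real ~400-entry dictionary shows whether the difference is worth giving up readability for.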

I suggest you consider using the shelve module, since your data structure is a mapping. That was my answer to a similar question titled "If I want to build a custom database, how could I?". There's also a bit of sample code in another answer of mine promoting its use for the question "How to get a object database?".

ActiveState has a highly rated PersistentDict recipe which supports csv, json, and pickle output file formats. It's pretty fast, since the standard-library modules for all three of those formats are implemented in C (although the recipe itself is pure Python), so the fact that it reads the whole file into memory when it's opened might be acceptable.
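The recipe itself isn't reproduced here, but its basic idea (a dict that is loaded from a file when opened and written back when done) can be sketched briefly. This is a simplified, JSON-only sketch, not the ActiveState code; the class name JsonBackedDict is hypothetical. It writes to a temporary file and then swaps it in with os.replace, so an interrupted write should not leave a half-written data file.

```python
import json
import os
import tempfile


class JsonBackedDict(dict):
    """A dict loaded from a JSON file on creation and saved back by save()."""

    def __init__(self, path):
        super().__init__()
        self.path = path
        if os.path.exists(path):
            with open(path, encoding='utf-8') as f:
                self.update(json.load(f))

    def save(self):
        # Write to a temp file in the same directory, then replace the original
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
        try:
            with os.fdopen(fd, 'w', encoding='utf-8') as f:
                json.dump(self, f, indent=2, sort_keys=True)
            os.replace(tmp_path, self.path)
        except BaseException:
            os.remove(tmp_path)
            raise

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Only persist if the with-block finished without an error
        if exc_type is None:
            self.save()
        return False
```

Used as with JsonBackedDict('rooms.json') as rooms: rooms['Adam'] = 'Room 430', the whole file is read on entry and rewritten on exit, which matches the recipe's whole-file-in-memory trade-off.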

On the JSON front, there is also a package called simplejson. The first time I used JSON in Python, the json library didn't work for me (or I couldn't figure it out); simplejson was... easier to use.
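The standard library's json module was derived from simplejson, so the two share the same basic API (dump, dumps, load, loads). A common pattern is to prefer simplejson when it is installed and fall back to the built-in module otherwise. A minimal sketch:

```python
try:
    import simplejson as json  # third-party package, used if installed
except ImportError:
    import json  # standard-library fallback with the same dump/load API

rooms = {'Adam': 'Room 430', 'Bob': 'Room 404'}

# Round trip through a string; the code is the same with either module
text = json.dumps(rooms, sort_keys=True)
restored = json.loads(text)
```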

JSON (or YAML, or whatever) serialisation is probably better, but if you're already writing the dictionary to a text file in Python syntax, complete with a variable name binding, you could just write that to a .py file instead. Then that Python file would be importable and usable as is. There's no need for the "function which returns a dictionary" approach, since you can use the dictionary directly as a global in that file. E.g.

# generated.py
please_dont_use_dict_as_a_variable_name = {'Adam': 'Room 430', 'Bob': 'Room 404'}

rather than:

# manually_copied.py
def get_dict():
    return {'Adam': 'Room 430', 'Bob': 'Room 404'}

The only difference is that manually_copied.get_dict gives you a fresh copy of the dictionary every time, whereas generated.please_dont_use_dict_as_a_variable_name [1] is a single shared object. This may matter if you modify the dictionary in your program after retrieving it, but you can always use copy.copy or copy.deepcopy to create a new copy if you need to modify one independently of the others.
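The difference can be seen without two separate modules. In this sketch, the module-level ROOMS stands in for the dict in generated.py and get_rooms for manually_copied.get_dict; both names are illustrative. List values are used so that the shallow/deep distinction shows; with plain string values, as in the question, copy.copy is enough.

```python
import copy

# Stand-in for generated.py: one dict object shared by every importer
ROOMS = {'Adam': ['Room 430'], 'Bob': ['Room 404']}


def get_rooms():
    # Stand-in for manually_copied.get_dict: builds a new dict on every call
    return {'Adam': ['Room 430'], 'Bob': ['Room 404']}


shared = ROOMS
shared['Carol'] = ['Room 101']      # also visible through ROOMS: same object

fresh = get_rooms()
fresh['Carol'] = ['Room 101']       # later get_rooms() calls are unaffected

shallow = copy.copy(ROOMS)
shallow['Adam'].append('Room 431')  # inner list is still shared with ROOMS

deep = copy.deepcopy(ROOMS)
deep['Bob'].append('Room 405')      # fully independent of ROOMS
```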


[1] dict, list, str, int, map, etc. are generally viewed as bad variable names. The reason is that these are already defined as built-ins and are used very commonly. So if you give something a name like that, at the least it's going to cause cognitive dissonance for people reading your code (including you, after you've been away for a while), as they have to keep in mind that "dict doesn't mean what it normally does here". It's also quite likely that at some point you'll get an infuriating-to-solve bug report that dict objects aren't callable (or something), because some piece of code is trying to use the type dict but is getting the dictionary object you bound to the name dict instead.
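A small sketch of the bug the footnote describes. Inside demo_shadowing (a hypothetical function, used so the shadowing stays local), the name dict refers to a dictionary, so calling it raises a TypeError; build_catalog shows the same code with a descriptive name.

```python
def demo_shadowing():
    # Binding the name dict hides the built-in type within this function
    dict = {'Adam': 'Room 430', 'Bob': 'Room 404'}
    try:
        return dict(Carol='Room 101')  # meant to call the built-in dict type
    except TypeError as exc:
        return f'TypeError: {exc}'


def build_catalog():
    # With a descriptive name, the built-in dict remains usable
    catalog = {'Adam': 'Room 430', 'Bob': 'Room 404'}
    return dict(catalog, Carol='Room 101')
```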

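Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.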

 