Lightweight pickle for basic types in Python?

Question

All I want to do is serialize and deserialize tuples of strings or ints.

I looked at pickle.dumps(), but the byte overhead is significant: it appears to take up about 4x as much space as it needs to. Besides, all I need is basic types; I have no need to serialize objects.

marshal is a little better in terms of space, but the result is full of nasty \x00 bytes. Ideally I would like the result to be human readable.

I thought of just using repr() and eval(), but is there a simple way I could accomplish this without using eval()?

This is getting stored in a db, not a file. Byte overhead matters because it could make the difference between requiring a TEXT column versus a varchar, and generally data compactness affects all areas of db performance.

Answer 1

Take a look at json; at least the generated dumps are readable from many other languages.

JSON (JavaScript Object Notation, http://json.org) is a subset of JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.
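As a rough illustration of this suggestion, here is a minimal sketch of a round trip through the standard json module. The helper names dump_tuple and load_tuple are made up for this example; the compact separators argument is an optional choice to trim a few bytes, not something the answer prescribes.

```python
import json


def dump_tuple(values):
    """Serialize a tuple of strings/ints to compact JSON text (hypothetical helper)."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default
    return json.dumps(values, separators=(",", ":"))


def load_tuple(text):
    """Parse JSON text back into a tuple (hypothetical helper)."""
    # JSON has no tuple type, so arrays come back as lists; convert explicitly
    return tuple(json.loads(text))


record = (1, 2, 3, "pants")
encoded = dump_tuple(record)
decoded = load_tuple(encoded)
print(encoded)  # [1,2,3,"pants"]
```

Note that nested tuples would also come back as lists, so this only round-trips cleanly for flat tuples like the ones in the question.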

Answer 2

Personally I would use yaml. It's on par with json for encoding size, but it can represent some more complex things (e.g. classes, recursive structures) when necessary.

In [1]: import yaml
In [2]: x = [1, 2, 3, 'pants']
In [3]: print(yaml.dump(x, default_flow_style=True))  # flow style keeps the list on one line
[1, 2, 3, pants]

In [4]: y = yaml.safe_load('[1, 2, 3, pants]')  # safe_load: plain data only, no arbitrary objects
In [5]: y
Out[5]: [1, 2, 3, 'pants']

Answer 3

Maybe you're not using the right protocol (the output below is from Python 2, where range() returns a list and the default protocol is 0):

>>> import pickle
>>> a = range(1, 100)
>>> len(pickle.dumps(a))
492
>>> len(pickle.dumps(a, pickle.HIGHEST_PROTOCOL))
206

See the documentation for pickle data formats.
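To check the effect on a current interpreter, a small sketch like the following compares the pickled size under every available protocol. The variable names are only for illustration, and no byte counts are quoted here because they vary by Python version.

```python
import pickle

# list(...) so that Python 3 pickles an actual list rather than a range object
data = list(range(1, 100))

# Map each protocol number to the size of the resulting pickle
sizes = {
    protocol: len(pickle.dumps(data, protocol))
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}

for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size} bytes")
```

Protocol 0 is the text-based format; the binary protocols are typically much more compact for data like this, at the cost of not being human readable.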

Answer 4

If you need a space-efficient solution, you can use Google Protocol Buffers.

Protocol buffers - Encoding

Protocol buffers - Python Tutorial

Answer 5

There are some persistence builtins mentioned in the Python documentation, but I don't think any of them is remarkably smaller in the produced file size.

You could always use configparser, but there you only get string, int, float, bool.
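For what the configparser route might look like, here is a minimal sketch. The section name "record" and its keys are invented for the example; everything is stored as text and converted back with the typed getters.

```python
import configparser
import io

# Write: all values are stored as strings
parser = configparser.ConfigParser()
parser["record"] = {"name": "pants", "count": "3", "ratio": "0.5", "active": "yes"}

buffer = io.StringIO()
parser.write(buffer)
text = buffer.getvalue()  # INI-style text, suitable for a text column

# Read: the typed getters convert strings back to int, float, and bool
restored = configparser.ConfigParser()
restored.read_string(text)
section = restored["record"]

name = section.get("name")
count = section.getint("count")
ratio = section.getfloat("ratio")
active = section.getboolean("active")
print(name, count, ratio, active)
```

The reader has to know which getter to call for each key, so the type information lives in the code rather than in the stored text.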

Answer 6

"the byte overhead is significant"

Why does this matter? It does the job. If you're running low on disk space, I'd be glad to sell you a 1 TB drive for $500.

Have you run it? Is performance a problem? Can you demonstrate that the performance of serialization is the problem?

"I thought of just using repr() and eval(), but is there a simple way I could accomplish this without using eval()?"

Nothing simpler than repr and eval.

What's wrong with eval?

Is it the "someone could insert malicious code into the file where I serialized my lists" issue?

Who -- specifically -- is going to find and edit this file to put in malicious code? Anything you do to secure this (i.e., encryption) removes "simple" from it.
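For readers who do want to avoid eval(), one option this answer does not mention is ast.literal_eval from the standard library, which accepts only Python literals (strings, numbers, tuples, lists, dicts, and so on). A minimal sketch of the repr() round trip with it:

```python
import ast

record = (1, 2, 3, "pants")

# repr() gives a human-readable literal
text = repr(record)

# literal_eval parses literals only; it does not execute function calls or names
restored = ast.literal_eval(text)
print(text)

# Anything that is not a plain literal is rejected instead of being run
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```

This keeps the repr() side exactly as simple as before and only swaps the function used for reading the value back.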

Answer 7

Luckily there is a solution which uses compression and solves the general problem for any arbitrary Python object, including new classes. Rather than micro-manage mere tuples, sometimes it's better to use a DRY tool. Your code will be crisper and more readily refactored in similar future situations.

y_serial.py module :: warehouse Python objects with SQLite

"Serialization + persistence :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."

http://yserial.sourceforge.net

[If you are still concerned, why not stick those tuples in a dictionary, then apply y_serial to the dictionary? Any overhead will probably vanish due to the transparent compression in the background by zlib.]

As to readability, the documentation also gives details on why cPickle was selected over json.
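The underlying idea (pickle, then compress with zlib) can be sketched with the standard library alone. This is not y_serial's API, which is not shown in this answer; pack and unpack are hypothetical helper names used only for illustration.

```python
import pickle
import zlib


def pack(obj):
    """Pickle an object and compress the bytes (hypothetical helper)."""
    return zlib.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


def unpack(blob):
    """Reverse of pack: decompress, then unpickle (hypothetical helper)."""
    return pickle.loads(zlib.decompress(blob))


# Many tuples gathered into one dictionary, as the answer suggests
records = {"row%d" % i: (i, "pants") for i in range(100)}

raw = pickle.dumps(records, pickle.HIGHEST_PROTOCOL)
blob = pack(records)
print(len(raw), len(blob))  # compare sizes before and after compression
```

Compression tends to pay off on larger, repetitive payloads like this dictionary; for a single short tuple the zlib framing can make the result larger than the input, and the output is binary rather than human readable.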

7 answers

Answer 1: 13, accepted, 2009-02-10 16:28:58

Answer 2: 8, 2009-02-10 16:47:10

Answer 3: 8, 2009-02-10 17:39:57

Answer 4: 6, 2009-02-10 17:14:10

Answer 5: 1, 2009-02-10 16:15:16

Answer 6: 0, 2009-02-10 16:15:50

Answer 7: -1
