Lightweight pickle for basic types in Python?

All I want to do is serialize and deserialize tuples of strings or ints.

I looked at pickle.dumps() but the byte overhead is significant. It looks like it takes up about 4x as much space as it needs to. Besides, all I need is basic types; I have no need to serialize objects.

marshal is a little better in terms of space, but the result is full of nasty \x00 bytes. Ideally I would like the result to be human readable.

I thought of just using repr() and eval(), but is there a simple way I could accomplish this without using eval()?

This is getting stored in a db, not a file. Byte overhead matters because it could make the difference between requiring a TEXT column versus a varchar, and generally data compactness affects all areas of db performance.

Take a look at json. At the very least, the generated dumps are readable from many other languages.

JSON (JavaScript Object Notation) http://json.org is a subset of JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.
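A minimal sketch of how this could look for the tuples in the question, using only the standard-library json module. The helper names dump_tuple and load_tuple are made up for illustration. Note that JSON has no tuple type, so a tuple comes back as a list and has to be converted explicitly.

```python
import json

def dump_tuple(values):
    """Serialize a flat tuple of str/int values to a compact JSON string."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    return json.dumps(values, separators=(",", ":"))

def load_tuple(text):
    """Parse the JSON text and convert the resulting list back to a tuple."""
    return tuple(json.loads(text))

record = ("pants", 42, "shirt", 7)
encoded = dump_tuple(record)
print(encoded)  # ["pants",42,"shirt",7]
assert load_tuple(encoded) == record
```

The result is plain text with no NUL bytes, so it should fit a varchar column as long as the tuples stay short.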

Personally I would use yaml. It's on par with json for encoding size, but it can represent some more complex things (e.g. classes, recursive structures) when necessary.

In [1]: import yaml
In [2]: x = [1, 2, 3, 'pants']
In [3]: print(yaml.dump(x, default_flow_style=True))
[1, 2, 3, pants]

In [4]: y = yaml.safe_load('[1, 2, 3, pants]')
In [5]: y
Out[5]: [1, 2, 3, 'pants']

Maybe you're not using the right protocol. (The session below is Python 2, where range() returns a list and pickle defaults to the text-based protocol 0.)

>>> import pickle
>>> a = range(1, 100)
>>> len(pickle.dumps(a))
492
>>> len(pickle.dumps(a, pickle.HIGHEST_PROTOCOL))
206

See the documentation for pickle data formats.
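For Python 3, a rough way to repeat the comparison is to measure every protocol the interpreter supports. This is only a sketch: the exact byte counts depend on the Python version, so none are quoted here. The list() call is needed because range() is lazy in Python 3.

```python
import pickle

data = list(range(1, 100))  # explicit list: in Python 3, range() is lazy

# Measure the pickled size under every protocol this interpreter supports.
sizes = {
    proto: len(pickle.dumps(data, protocol=proto))
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}

for proto, size in sizes.items():
    print(f"protocol {proto}: {size} bytes")
```

The binary protocols are more compact than protocol 0, but their output is not human readable.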

If you need a space-efficient solution you can use Google Protocol Buffers.

Protocol buffers - Encoding

Protocol buffers - Python Tutorial

There are some persistence builtins mentioned in the Python documentation, but I don't think any of them produces remarkably smaller output.

You could always use configparser, but there you only get string, int, float and bool.

"the byte overhead is significant"

Why does this matter? It does the job. If you're running low on disk space, I'd be glad to sell you a 1 TB drive for $500.

Have you run it? Is performance a problem? Can you demonstrate that the performance of serialization is the problem?

"I thought of just using repr() and eval(), but is there a simple way I could accomplish this without using eval()?"

Nothing simpler than repr and eval.

What's wrong with eval?

Is it the "someone could insert malicious code into the file where I serialized my lists" issue?

Who, specifically, is going to find and edit this file to put in malicious code? Anything you do to secure this (i.e., encryption) removes "simple" from it.
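If the malicious-input worry does apply, one option that stays close to repr and eval is the standard library's ast.literal_eval, which parses Python literals (strings, numbers, tuples, lists, dicts and so on) without executing code. A minimal sketch, with serialize and deserialize as illustrative helper names:

```python
import ast

def serialize(values):
    """repr() of a tuple of str/int values is already valid literal syntax."""
    return repr(values)

def deserialize(text):
    """Parse the text as a literal only; anything else raises an exception."""
    return ast.literal_eval(text)

record = ("pants", 42, "shirt", 7)
text = serialize(record)
assert deserialize(text) == record

# An expression that eval() would execute is rejected instead.
try:
    deserialize("__import__('os').getcwd()")
except ValueError:
    print("rejected non-literal input")
```

The stored text is the same human-readable repr() output, so nothing changes on the writing side.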

Luckily there is a solution that uses compression and solves the general problem for arbitrary Python objects, including new classes. Rather than micro-managing mere tuples, it is sometimes better to use a DRY tool. Your code will be crisper and easier to refactor in similar future situations.

y_serial.py module :: warehouse Python objects with SQLite

"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."

http://yserial.sourceforge.net

[If you are still concerned, why not stick those tuples in a dictionary, then apply y_serial to the dictionary. Probably any overhead will vanish due to the transparent compression in the background by zlib.]
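Whether compression pays off depends on how much data goes into each record, so it may be worth measuring first. The sketch below uses the standard zlib module directly on a JSON encoding; it does not use y_serial's own API, and pack and unpack are illustrative names. zlib adds a small header and checksum, so a single short tuple may not shrink at all, while a larger batch with repeated values usually does.

```python
import json
import zlib

def pack(values):
    """Encode a tuple as compact JSON, then compress the bytes with zlib."""
    raw = json.dumps(values, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw)

def unpack(blob):
    """Reverse pack(): decompress, decode, and rebuild the tuple."""
    return tuple(json.loads(zlib.decompress(blob).decode("utf-8")))

small = ("pants", 42, "shirt", 7)
large = small * 200  # many repeated values compress well

for name, values in (("small", small), ("large", large)):
    raw_size = len(json.dumps(values, separators=(",", ":")))
    print(name, "raw:", raw_size, "compressed:", len(pack(values)))

assert unpack(pack(small)) == small
```

The compressed bytes are binary, so this trades away the human-readable requirement and needs a BLOB-style column.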

As to readability, the documentation also gives details on why cPickle was selected over json.
