Lightweight pickle for basic types in python?

Question

All I want to do is serialize and unserialize tuples of strings or ints.

I looked at pickle.dumps() but the byte overhead is significant. Basically it looks like it takes up about 4x as much space as it needs to. Besides, all I need is basic types and have no need to serialize objects.

marshal is a little better in terms of space but the result is full of nasty \\x00 bytes. Ideally I would like the result to be human readable.

I thought of just using repr() and eval(), but is there a simple way I could accomplish this without using eval()?

This is getting stored in a db, not a file. Byte overhead matters because it could make the difference between requiring a TEXT column versus a varchar, and generally data compactness affects all areas of db performance.

Answer 1

Take a look at json , at least the generated dumps are readable with many other languages.

JSON (JavaScript Object Notation) http://json.org is a subset of JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.

Answer 2

personally i would use yaml . it's on par with json for encoding size, but it can represent some more complex things (eg classes, recursive structures) when necessary.

In [1]: import yaml
In [2]: x = [1, 2, 3, 'pants']
In [3]: print(yaml.dump(x))
[1, 2, 3, pants]

In [4]: y = yaml.load('[1, 2, 3, pants]')
In [5]: y
Out[5]: [1, 2, 3, 'pants']

Answer 3

Maybe you're not using the right protocol:

>>> import pickle
>>> a = range(1, 100)
>>> len(pickle.dumps(a))
492
>>> len(pickle.dumps(a, pickle.HIGHEST_PROTOCOL))
206

See the documentation for pickle data formats .

Answer 4

If you need a space efficient solution you can use Google Protocol buffers.

Protocol buffers - Encoding

Protocol buffers - Python Tutorial

Answer 5

There are some persistence builtins mentioned in the python documentation but I don't think any of these is remarkable smaller in the produced filesize.

You could alway use the configparser but there you only get string, int, float, bool.

Answer 6

"the byte overhead is significant"

Why does this matter? It does the job. If you're running low on disk space, I'd be glad to sell you a 1Tb for $500.

Have you run it? Is performance a problem? Can you demonstrate that the performance of serialization is the problem?

"I thought of just using repr() and eval(), but is there a simple way I could accomplish this without using eval()?"

Nothing simpler than repr and eval.

What's wrong with eval?

Is is the "someone could insert malicious code into the file where I serialized my lists" issue?

Who -- specifically -- is going to find and edit this file to put in malicious code? Anything you do to secure this (ie, encryption) removes "simple" from it.

Answer 7

Luckily there is solution which uses COMPRESSION, and solves the general problem involving any arbitrary Python object including new classes. Rather than micro-manage mere tuples sometimes it's better to use a DRY tool.
Your code will be more crisp and readily refactored in similar future situations.

y_serial.py module :: warehouse Python objects with SQLite

"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."

http://yserial.sourceforge.net

[If you are still concerned, why not stick those tuples in a dictionary, then apply y_serial to the dictionary. Probably any overhead will vanish due to the transparent compression in the background by zlib.]

As to readability, the documentation also gives details on why cPickle was selected over json.

Lightweight pickle for basic types in python?

Question

7 answers

solution1
13 ACCPTED 2009-02-10 16:28:58

solution2
8 2009-02-10 16:47:10

solution3
8 2009-02-10 17:39:57

solution4
6 2009-02-10 17:14:10

solution5
1 2009-02-10 16:15:16

solution6
0 2009-02-10 16:15:50

solution7
-1

Lightweight pickle for basic types in python?

Question

7 answers

solution1 13 ACCPTED 2009-02-10 16:28:58

solution2 8 2009-02-10 16:47:10

solution3 8 2009-02-10 17:39:57

solution4 6 2009-02-10 17:14:10

solution5 1 2009-02-10 16:15:16

solution6 0 2009-02-10 16:15:50

solution7 -1

solution1
13 ACCPTED 2009-02-10 16:28:58

solution2
8 2009-02-10 16:47:10

solution3
8 2009-02-10 17:39:57

solution4
6 2009-02-10 17:14:10

solution5
1 2009-02-10 16:15:16

solution6
0 2009-02-10 16:15:50

solution7
-1