
Python: why pickle?

I have been using pickle and was very happy, then I saw this article: Don't Pickle Your Data

Reading further it seems like:

I've switched to saving my data as JSON, but I wanted to know about best practice:

Given all these issues, when would you ever use pickle? What specific situations call for using it?

Pickle is unsafe because it constructs arbitrary Python objects by invoking arbitrary functions. However, this also gives it the power to serialize almost any Python object, without any boilerplate or even white-/black-listing (in the common case). That's very desirable for some use cases:

  • Quick & easy serialization, for example for pausing and resuming a long-running but simple script. None of the concerns matter here, you just want to dump the program's state as-is and load it later.
  • Sending arbitrary Python data to other processes or computers, as in multiprocessing. The security concerns may apply (but mostly don't), the generality is absolutely necessary, and humans won't have to read it.
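
The pause-and-resume case in the first bullet can be sketched roughly as follows. This is only an illustration: the `state` dictionary, its keys, and the checkpoint file name are made up for the example, and a real script would choose its own.

```python
import os
import pickle
import tempfile

# Hypothetical program state: any mix of built-in containers and values.
state = {"next_item": 42, "results": [(1, "a"), (2, "b")], "seen": {3, 5, 8}}

# Hypothetical checkpoint location (a temp dir keeps the sketch self-contained).
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")

# Pause: dump the state as-is, no schema or conversion code.
with open(checkpoint_path, "wb") as f:
    pickle.dump(state, f)

# Resume (possibly in a later run): load it back.
with open(checkpoint_path, "rb") as f:
    restored = pickle.load(f)

# Tuples and sets survive the round trip, which plain JSON would not preserve.
print(restored == state)
```

Since the file is written and read by the same script on the same machine, the safety and cross-language concerns do not really come into play here.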

In other cases, none of the drawbacks is quite enough to justify the work of mapping your stuff to JSON or another restrictive data model. Maybe you don't expect to need human readability/safety/cross-language compatibility or maybe you can do without. Remember, You Ain't Gonna Need It. Using JSON would be the right thing™ but right doesn't always equal good.

You'll notice that I completely ignored the "slow" downside. That's because it's partially misleading: Pickle is indeed slower for data that fits the JSON model (strings, numbers, arrays, maps) perfectly, but if your data's like that you should use JSON for other reasons anyway. If your data isn't like that (very likely), you also need to take into account the custom code you'll need to turn your objects into JSON data, and the custom code you'll need to turn JSON data back into your objects. It adds both engineering effort and run-time overhead, which must be quantified on a case-by-case basis.
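
To make the "custom code in both directions" point concrete, here is a minimal sketch. The `record` layout and the two helper functions `to_jsonable` and `from_jsonable` are invented for illustration, and `datetime.fromisoformat` assumes Python 3.7 or newer.

```python
import json
import pickle
from datetime import datetime

# Hypothetical record that does not fit the JSON data model (datetime, set).
record = {"when": datetime(2020, 1, 2, 3, 4, 5), "tags": {"a", "b"}}

# pickle: no mapping code needed.
assert pickle.loads(pickle.dumps(record)) == record

# json: refuses the record without help ...
try:
    json.dumps(record)
except TypeError as exc:
    print("json refused:", exc)


# ... so one hand-written function is needed per direction.
def to_jsonable(rec):
    return {"when": rec["when"].isoformat(), "tags": sorted(rec["tags"])}


def from_jsonable(data):
    return {"when": datetime.fromisoformat(data["when"]), "tags": set(data["tags"])}


text = json.dumps(to_jsonable(record))
roundtripped = from_jsonable(json.loads(text))
print(roundtripped == record)
```

Both helpers have to be maintained alongside the data model, and both run on every save and load, which is the engineering and run-time cost referred to above.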

Pickle has the advantage of convenience -- it can serialize arbitrary object graphs with no extra work, and works on a pretty broad range of Python types. With that said, it would be unusual for me to use Pickle in new code. JSON is just a lot cleaner to work with.

I usually use neither Pickle nor JSON, but MessagePack: it is both safe and fast, and produces serialized data of small size.

An additional advantage is the ability to exchange data with software written in other languages (which of course is also true of JSON).

You can find some answers in JSON vs. Pickle security: JSON can only serialize unicode, int, float, NoneType, bool, list and dict. You can't use it if you want to serialize more advanced objects such as class instances. Note that for those kinds of pickles, there is no hope of being language agnostic.
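
On the security side of that comparison, a harmless sketch shows why loading untrusted pickles is risky. The `Surprise` class is made up for this example and calls the built-in `sorted`; a hostile payload could name any importable callable instead.

```python
import pickle


class Surprise:
    """Harmless stand-in for a malicious payload (hypothetical class)."""

    def __reduce__(self):
        # Tell pickle: "to rebuild me, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Surprise())

# Loading does not give back a Surprise instance; it runs the callable
# named in the payload and returns whatever that callable returns.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

`json.loads`, by contrast, only ever builds dicts, lists, strings, numbers, booleans and `None`, so this class of problem does not arise there.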

Also, using cPickle instead of pickle partially resolves the speed problem.

I have tried several methods and found that calling cPickle's dumps method with the protocol argument set to the highest protocol, cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL), is the fastest dump method. (In Python 2 the default is the ASCII protocol 0, which is why the gap in the results below is so large.)

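# Python 2 benchmark: cPickle is a separate module there
# (Python 3 folds the C implementation into pickle).
import json
import pickle
import timeit

import cPickle
import msgpack
import numpy as np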

num_tests = 10

obj = np.random.normal(0.5, 1, [240, 320, 3])

command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle:  %f seconds" % result)

command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle:   %f seconds" % result)


command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest:   %f seconds" % result)

command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json:   %f seconds" % result)


command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack:   %f seconds" % result)

Output:

pickle         :   0.847938 seconds
cPickle        :   0.810384 seconds
cPickle highest:   0.004283 seconds
json           :   1.769215 seconds
msgpack        :   0.270886 seconds

So, I prefer cPickle with the highest dumping protocol in situations that require real-time performance, such as video streaming from a camera to a server.
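
For readers on Python 3, where cPickle no longer exists as a separate module, a rough equivalent of the protocol comparison might look like the sketch below. It is not the benchmark above: the payload is a plain list of floats rather than a NumPy array (an assumption made to keep it dependency-free), and timings will vary by machine and Python version.

```python
import pickle
import timeit

# Hypothetical payload: a plain list of floats (no NumPy needed for the sketch).
data = [i / 7 for i in range(10000)]

oldest = pickle.dumps(data, protocol=0)                        # ASCII protocol
newest = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)  # binary protocol

print("protocol 0 bytes:", len(oldest))
print("highest protocol bytes:", len(newest))

# Time each protocol; absolute numbers are machine-dependent.
for label, proto in (("protocol 0", 0), ("highest", pickle.HIGHEST_PROTOCOL)):
    seconds = timeit.timeit(lambda: pickle.dumps(data, protocol=proto), number=20)
    print("%s: %f seconds" % (label, seconds))
```

The binary protocol stores each float in a fixed number of bytes instead of its text representation, so the output is smaller; whether it is also faster for a given workload is best measured directly.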


 