
Python serialization - Why pickle?

I understand that Python pickling is a way to 'store' a Python object in a way that respects object-oriented programming, as opposed to writing output to a text file or a database.

Do you have more details or references on the following points:

  • Where are pickled objects 'stored'?
  • Why does pickling preserve an object's representation better than, say, storing it in a DB?
  • Can I retrieve pickled objects from one Python shell session in another?
  • Do you have significant examples of when serialization is useful?
  • Does serialization with pickle imply data 'compression'?

In other words, I am looking for documentation on pickling. The Python documentation explains how to use pickle but does not seem to go into detail about the use and necessity of serialization.

Pickling is a way to convert a Python object (list, dict, etc.) into a byte stream. The idea is that this byte stream contains all the information necessary to reconstruct the object in another Python script.
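A minimal sketch of that round trip, kept in memory with pickle.dumps and pickle.loads so that no file is involved. The sample dictionary is made up for illustration. Note that the pickled form is a bytes object rather than text:

```python
import pickle

original = {1: 'a', 2: 'b', 'nested': [1.5, None, (True, 'x')]}

# dumps() returns the pickled form as a bytes object.
payload = pickle.dumps(original)

# loads() rebuilds an equal, but distinct, object from those bytes.
restored = pickle.loads(payload)

print(type(payload).__name__)
print(restored == original, restored is original)
```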

As for where the pickled information is stored, usually one would do:

import pickle

var = {1: 'a', 2: 'b'}
# Pickle data is binary, so the file is opened in 'wb' mode.
with open('filename', 'wb') as f:
    pickle.dump(var, f)

That would store the pickled version of our var dict in the 'filename' file. Then, in another script, you could load from this file into a variable and the dictionary would be recreated:

import pickle

with open('filename', 'rb') as f:
    var = pickle.load(f)

Another use for pickling is transmitting this dictionary over a network (perhaps with sockets). You first need to convert it into a byte stream; then you can send it over a socket connection.
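As a rough sketch of the network case, the snippet below sends a pickled dictionary between two connected sockets. It uses socket.socketpair() as a stand-in for a real client/server connection, which is an assumption made only to keep the example self-contained:

```python
import pickle
import socket

message = {1: 'a', 2: 'b'}

# socketpair() gives two already-connected sockets, standing in for a
# real client/server connection.
sender, receiver = socket.socketpair()

with sender, receiver:
    sender.sendall(pickle.dumps(message))
    sender.shutdown(socket.SHUT_WR)  # signal that no more data follows

    # Read until the sending side has been shut down.
    chunks = []
    while True:
        chunk = receiver.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)

received = pickle.loads(b''.join(chunks))
print(received)
```

Unpickling can execute arbitrary code, so this pattern is only appropriate when the data comes from a source you trust.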

Also, there is no "compression" to speak of here. Pickling is just a way to convert from one representation (objects in RAM) to another (a stream of bytes).
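One way to see this is to compress the pickled bytes separately and compare sizes. The sketch below uses zlib and made-up, repetitive sample data; the exact sizes will vary with the data and the pickle protocol:

```python
import pickle
import zlib

# Repetitive sample data (made up for illustration).
data = [{'name': 'spam', 'count': i % 10} for i in range(1000)]

pickled = pickle.dumps(data)
compressed = zlib.compress(pickled)

# The pickled bytes still have plenty of redundancy left for zlib to remove.
print(len(pickled), len(compressed))

# Decompressing and unpickling gives the original list back.
restored = pickle.loads(zlib.decompress(compressed))
```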

About.com has a nice introduction to pickling here.

Pickling is absolutely necessary for distributed and parallel computing.

Say you want to do a parallel map-reduce with multiprocessing (or across cluster nodes with pyina). You then need to make sure the function you want to map across the parallel resources will pickle. If it doesn't pickle, you can't send it to the resources on another process, computer, etc. Also see here for a good example.
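A quick way to check this up front is to try pickling the function before handing it to the parallel machinery. The can_pickle helper below is a hypothetical name introduced for this sketch, and it only tests the standard pickle module:

```python
import math
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A function that can be imported by name pickles fine; pickle stores
# a reference to it, not its code.
print(can_pickle(math.sqrt))

# A lambda has no importable name, so the standard pickle rejects it.
print(can_pickle(lambda x: x * x))
```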

To do this, I use dill, which can serialize almost anything in Python. Dill also has some good tools for helping you understand what causes pickling to fail when your code fails.

And, yes, people use pickling to save the state of a calculation, or an IPython session, or whatever. You can also extend pickle's Pickler and Unpickler to do compression with bz2 or gzip if you'd like.
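Rather than subclassing Pickler, the sketch below takes a simpler route to the same end: it writes the pickle through gzip.open, which returns a binary file object. The state dictionary and the file name are made up for illustration:

```python
import gzip
import os
import pickle
import tempfile

# Hypothetical state of a long-running calculation.
state = {'iteration': 42, 'partial_sums': [0.5 * i for i in range(1000)]}

path = os.path.join(tempfile.mkdtemp(), 'state.pkl.gz')

# gzip.open returns a binary file object, which is all pickle needs.
with gzip.open(path, 'wb') as f:
    pickle.dump(state, f)

# Later (possibly in another session), read the state back.
with gzip.open(path, 'rb') as f:
    resumed = pickle.load(f)

print(resumed['iteration'])
```

Swapping gzip.open for bz2.open should work the same way, since both return binary file objects.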

I find it to be particularly useful with large and complex custom classes. In a particular example I'm thinking of, gathering the information (from a database) to create the class was already half the battle. The information stored in the class might then be altered at runtime by the user.

You could have another group of tables in the database and write a function to go through everything stored in the object and write it to the new tables. Then you would need to write another function to load a saved object by reading all of that information back in.

Alternatively, you could pickle the whole class instance as is and store it in a single field in the database. Then when you load it back, it all loads at once, as it was before. This can end up saving a lot of time and code when saving and retrieving complicated classes.
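A minimal sketch of the single-field approach, assuming an in-memory SQLite database and a hypothetical saved_objects table. The report object is a stand-in built from standard-library types; an instance of a custom class is handled the same way, provided its class can be imported wherever the data is loaded, because pickle stores a reference to the class rather than its code:

```python
import pickle
import sqlite3
from collections import OrderedDict
from datetime import date

# Hypothetical stand-in for a complex object assembled from many queries.
report = OrderedDict(
    title='Q1 summary',
    created=date(2024, 1, 31),
    rows=[('north', 10.5), ('south', 7.25)],
    tags={'draft', 'internal'},
)

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE saved_objects (name TEXT PRIMARY KEY, data BLOB)')

# The whole object goes into one BLOB column.
conn.execute(
    'INSERT INTO saved_objects (name, data) VALUES (?, ?)',
    ('report', pickle.dumps(report)),
)

# Loading is a single SELECT plus one loads() call.
(blob,) = conn.execute(
    'SELECT data FROM saved_objects WHERE name = ?', ('report',)
).fetchone()
loaded = pickle.loads(blob)
conn.close()

print(loaded['created'], loaded['rows'])
```

The trade-off is that the database cannot query inside the pickled field, so this suits objects that are always saved and loaded whole.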

It is a kind of serialization. In Python 2, use cPickle, which is much faster than pickle. In Python 3, the pickle module uses the C implementation automatically when it is available.

import pickle

# Write the pickle file (corpus is any picklable object defined earlier)
with open('pickles/corpus.pickle', 'wb') as handle:
    pickle.dump(corpus, handle)

# Read the pickle file
with open('pickles/corpus.pickle', 'rb') as handle:
    corpus = pickle.load(handle)
