
cPickle - different results pickling the same object

Is anyone able to explain the comment under testLookups() in this code snippet?

I've run the code, and what the comment says is indeed true. However, I'd like to understand why it's true, i.e. why cPickle outputs different values for the same object depending on how it is referenced.

Does it have anything to do with reference counts? If so, isn't that some kind of bug, i.e. the pickled and deserialized object would have an abnormally high reference count and in effect would never get garbage collected?

There is no guarantee that seemingly identical objects will produce identical pickle strings.

The pickle protocol is a virtual machine, and a pickle string is a program for that virtual machine. For a given object there exist multiple pickle strings (= programs) that will reconstruct that object exactly.
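A quick way to see this for yourself is to pickle one object under two protocols. This is only a sketch and it assumes Python 3's pickle module rather than cPickle; the sample object is made up for illustration.

```python
import pickle

# Hypothetical sample object, similar in shape to the one in the question.
obj = ({1: 1, 2: 4}, 'Hello World', (1, 2, 3), [1, 2, 3])

# Same object, two different "programs" for the pickle virtual machine.
text_pickle = pickle.dumps(obj, protocol=0)
binary_pickle = pickle.dumps(obj, protocol=2)

print(text_pickle == binary_pickle)                              # False
print(pickle.loads(text_pickle) == pickle.loads(binary_pickle))  # True
```

The byte strings differ, but running either one rebuilds an equal object.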

To take one of your examples:

>>> from cPickle import dumps
>>> t = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])
>>> dumps(({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World', (1, 2, 3, 4, 5), [1, 2, 3, 4, 5]))
"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n(lp4\nI1\naI2\naI3\naI4\naI5\nat."
>>> dumps(t)
"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\nsS'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n."

The two pickle strings differ in their use of the p opcode. The opcode takes one integer argument and its function is as follows:

  name='PUT'    code='p'   arg=decimalnl_short

  Store the stack top into the memo.  The stack is not popped.

  The index of the memo location to write into is given by the newline-
  terminated decimal string following.  BINPUT and LONG_BINPUT are
  space-optimized versions.
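If you want to see exactly where the PUTs land, pickletools can walk the opcodes for you. The sketch below assumes Python 3 and feeds it the two strings from the session above as bytes literals; put_indexes is a helper name I made up.

```python
import pickletools

# The two pickle strings from the session above, as bytes literals.
literal_pickle = (b"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\ns"
                  b"S'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n"
                  b"(lp4\nI1\naI2\naI3\naI4\naI5\nat.")
variable_pickle = (b"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\ns"
                   b"S'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt"
                   b"(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n.")


def put_indexes(data):
    """Return the memo indexes written by PUT opcodes, in order (hypothetical helper)."""
    return [arg for opcode, arg, _pos in pickletools.genops(data)
            if opcode.name == 'PUT']


print(put_indexes(literal_pickle))   # [1, 2, 3, 4]
print(put_indexes(variable_pickle))  # [1, 2, 3]
```

The first string memoizes the dict, the string, the inner tuple and the list; the second memoizes the dict, the list and the outer tuple. Nothing ever reads those memo slots back, so the extra or missing PUTs change the bytes but not the result.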

To cut a long story short, the two pickle strings are basically equivalent.
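You can check the equivalence directly by loading both strings and comparing the results. This sketch assumes Python 3, whose pickle module can still read protocol 0 pickles written by Python 2 (the S opcode is decoded as ASCII by default); the variable names are mine.

```python
import pickle

# The two pickle strings from the session above, as bytes literals.
literal_pickle = (b"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\ns"
                  b"S'Hello World'\np2\n(I1\nI2\nI3\nI4\nI5\ntp3\n"
                  b"(lp4\nI1\naI2\naI3\naI4\naI5\nat.")
variable_pickle = (b"((dp1\nI1\nI1\nsI2\nI4\nsI3\nI6\nsI4\nI8\nsI5\nI10\ns"
                   b"S'Hello World'\n(I1\nI2\nI3\nI4\nI5\nt"
                   b"(lp2\nI1\naI2\naI3\naI4\naI5\natp3\n.")

expected = ({1: 1, 2: 4, 3: 6, 4: 8, 5: 10}, 'Hello World',
            (1, 2, 3, 4, 5), [1, 2, 3, 4, 5])

first = pickle.loads(literal_pickle)
second = pickle.loads(variable_pickle)

print(literal_pickle == variable_pickle)  # False: the bytes differ
print(first == second == expected)        # True: the objects are equal
```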

I haven't tried to nail down the exact cause of the differences in generated opcodes. This could well have to do with reference counts of the objects being serialized. What is clear, however, is that discrepancies like this will have no effect on the reconstructed object.

It is looking at the reference counts. From the cPickle source:

if (Py_REFCNT(args) > 1) {
    if (!( py_ob_id = PyLong_FromVoidPtr(args)))
        goto finally;

    if (PyDict_GetItem(self->memo, py_ob_id)) {
        if (get(self, py_ob_id) < 0)
            goto finally;

        res = 0;
        goto finally;
    }
}

The pickle protocol has to deal with pickling multiple references to the same object. To avoid duplicating the object when it is unpickled, it uses a memo. The memo basically maps indexes to the various objects. The PUT (p) opcode in the pickle stores the current object in this memo dictionary.
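Here is a small sketch of the memo at work, assuming Python 3's pickle and pickletools (the names shared and container are just for illustration). One list is referenced twice, so the pickle stores it once and refers back to it with a GET.

```python
import pickle
import pickletools

shared = [1, 2, 3]
container = [shared, shared]   # two references to one list

data = pickle.dumps(container, protocol=0)
names = [opcode.name for opcode, _arg, _pos in pickletools.genops(data)]
restored = pickle.loads(data)

print('GET' in names)               # True: second reference reads the memo
print(restored[0] is restored[1])   # True: still one shared list
```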

However, if there is only a single reference to an object, there is no reason to store it in the memo: with only one reference, nothing else can ever need to refer to it again. Thus the cPickle code checks the reference count for a little optimization at this point.

So yes, it's the reference counts. But no, that's not a problem. The unpickled objects will have the correct reference counts; cPickle just produces a slightly shorter pickle when the reference count is 1.
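The standard library has a comparable clean-up you can apply after the fact: pickletools.optimize returns an equivalent pickle with the unused PUT opcodes removed. This is not the cPickle refcount check itself, just a sketch of the same idea, assuming Python 3 and a made-up sample object.

```python
import pickle
import pickletools

# Hypothetical sample object; nothing in it is referenced twice.
obj = ({1: 1, 2: 4}, 'Hello World', (1, 2, 3), [1, 2, 3])

raw = pickle.dumps(obj, protocol=0)
trimmed = pickletools.optimize(raw)   # drops PUTs that no GET refers to

print(len(raw), len(trimmed))
print(pickle.loads(trimmed) == pickle.loads(raw) == obj)   # True
```

Either way the memo entries being dropped are ones nobody reads, so the reconstructed object is the same.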

Now, I don't know what you are doing that makes you care about this, but you really shouldn't assume that pickling the same object will always give you the same result. If nothing else, I'd expect dictionaries to give you problems, because the order of the keys is undefined. Unless you have Python documentation that guarantees the pickle is the same each time, I highly recommend you don't depend on it.
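To illustrate the dictionary point, here is a sketch that assumes Python 3.7 or later, where dicts keep insertion order (on older versions the order depends on hashing instead). Two equal dicts built in a different order pickle to different bytes, so compare the unpickled objects rather than the pickles.

```python
import pickle

a = {1: 'one', 2: 'two'}
b = {2: 'two', 1: 'one'}   # same mapping, different insertion order

pickled_a = pickle.dumps(a)
pickled_b = pickle.dumps(b)

print(a == b)                                              # True
print(pickled_a == pickled_b)                              # False
print(pickle.loads(pickled_a) == pickle.loads(pickled_b))  # True
```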
