
How to override the dump/load methods in the pickle class - customizing pickling and unpickling - Python

So far, what I've done is this:

import pickle

class MyPickler(pickle.Pickler):
    def __init__(self, file, protocol=None):
        super(MyPickler, self).__init__(file, protocol)

class MyUnpickler(pickle.Unpickler):
    def __init__(self, file):
        super(MyUnpickler, self).__init__(file) 

In my main method, this is mainly what I have:

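# created the object, then...
pickledObject = 'testing.pickle'
with open(pickledObject, 'wb') as f:
    # don't name this variable `pickle`: that would shadow the pickle module
    pickler = MyPickler(f)
    pickler.dump(object)  # object is the object I want to pickle, created before this

# pickles are binary data, so read them back in binary mode ('rb', not 'r')
with open(pickledObject, 'rb') as pickledFile:
    unpickler = MyUnpickler(pickledFile)
    object2 = unpickler.load()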

However, this is giving me the following error when the super method is called: TypeError: must be type, not classobj

How does one override only the two methods, load and dump? The pickle module source is under C:\\Python27/lib/pickle.py
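About the TypeError itself: in Python 2.7, pickle.Pickler and pickle.Unpickler are old-style classes (they do not inherit from object), and super() only accepts new-style classes, which is where "must be type, not classobj" comes from. A minimal sketch that calls the base initializers explicitly instead; it should run on both Python 2.7 and Python 3 (where every class is new-style, so super() would also work). The in-memory io.BytesIO buffer stands in for the file from the question.

```python
import io
import pickle


class MyPickler(pickle.Pickler):
    def __init__(self, file, protocol=None):
        # Call the base initializer explicitly instead of super(): in
        # Python 2.7 pickle.Pickler is an old-style class, and super()
        # rejects old-style classes ("must be type, not classobj").
        pickle.Pickler.__init__(self, file, protocol)


class MyUnpickler(pickle.Unpickler):
    def __init__(self, file):
        # Same reasoning for pickle.Unpickler.
        pickle.Unpickler.__init__(self, file)


# Round-trip through an in-memory binary buffer instead of a real file.
buf = io.BytesIO()
MyPickler(buf).dump({'answer': 42})
buf.seek(0)
restored = MyUnpickler(buf).load()
print(restored)
```

This only fixes the constructor error; as the answers point out, subclassing Pickler/Unpickler is not the right tool for the enum identity problem described next.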

EDIT: The enum.py file can be found here: http://dpaste.com/780897/

Object details: Object is initialized like this:

object = CellSizeRelation(CellSizeRelation.Values.FIRST)

And CellSizeRelation is a class that uses the Enumeration:

class CellSizeRelation(Option):
    Values = enum.Enum('FIRST',
                       'SECOND')

Before I pickle object, I do this:

print object.Values._values 
print object.value.enumtype 

Output:

[EnumValue(<enum.Enum object at 0x02E80E50>, 0, 'FIRST'), EnumValue(<enum.Enum object at 0x02E80E50>, 1, 'SECOND')]
<enum.Enum object at 0x02E80E50>

After I unpickle and print out the same thing, I get this output:

[EnumValue(<enum.Enum object at 0x02E80E50>, 0, 'FIRST'), EnumValue(<enum.Enum object at 0x02E80E50>, 1, 'SECOND')]
<enum.Enum object at 0x02ECF750>

The problem is that the address of the second object changes. When the object is first initialized, enumtype and the entries of _values refer to the same Enum object (same address). After unpickling, enumtype points to a different address. This breaks my code when I try to compare two EnumValues. If you look in the EnumValue class, the compare function does this:

# excerpt from the compare method of EnumValue in enum.py; the except clause is not shown
try:
    assert self.enumtype == other.enumtype
    result = cmp(self.index, other.index)

Because the address changes, the assert statement fails. I now need to ensure that enumtype still refers to the same Enum object after unpickling. I was thinking of simply getting the value 'FIRST' from the unpickled file, finding out its index, and reinitializing the object with:

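def load(index):
    # rebuild the object from the stored index, so that its value is the
    # class-level EnumValue constant again
    obj = CellSizeRelation(CellSizeRelation.Values[index])
    return obj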

You want to customize the way object state is pickled and unpickled, not customize the load and dump functionality.

You'll have to study the "Pickling and unpickling normal class instances" section of the pickle documentation; in your case, defining __getstate__ and __setstate__ methods should be enough.

What happens in your case is that there is a class-level attribute with EnumValue instances, which are meant to be constants. But on unpickling, new EnumValue instances are created that are not connected to the class-level attribute anymore.
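The effect can be reproduced without the real enum.py. The sketch below uses hypothetical, simplified stand-ins for Enum and EnumValue (not the code from the dpaste link) whose comparison depends on Enum identity, and omits the Option base class. Pickling a CellSizeRelation also pickles the Enum object its value points to, so unpickling builds a fresh Enum:

```python
import pickle


class EnumValue(object):
    """Hypothetical, simplified stand-in for enum.EnumValue."""

    def __init__(self, enumtype, index, key):
        self.enumtype = enumtype
        self.index = index
        self.key = key

    def __eq__(self, other):
        # Identity-based, like the check quoted in the question:
        # values only match if they share the *same* Enum object.
        return (isinstance(other, EnumValue)
                and self.enumtype is other.enumtype
                and self.index == other.index)

    def __ne__(self, other):
        return not self == other


class Enum(object):
    """Hypothetical, simplified stand-in for enum.Enum."""

    def __init__(self, *keys):
        self._values = [EnumValue(self, i, k) for i, k in enumerate(keys)]

    def __getitem__(self, index):
        return self._values[index]


class CellSizeRelation(object):
    # The Option base class from the question is omitted here.
    Values = Enum('FIRST', 'SECOND')

    def __init__(self, value):
        self.value = value


original = CellSizeRelation(CellSizeRelation.Values[0])
restored = pickle.loads(pickle.dumps(original))

# The unpickled value carries its own, freshly built Enum object ...
print(restored.value.enumtype is CellSizeRelation.Values)  # False
# ... so it no longer compares equal to the class-level constant.
print(restored.value == CellSizeRelation.Values[0])        # False
```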

The EnumValue instances do have an index attribute that you can use to capture their state as an integer instead of an EnumValue instance; that integer lets us find the correct constant again when reinstating your instances:

class CellSizeRelation(Option):
    # skipping your enum definition and __init__ here

    def __getstate__(self):
        # capture what is normally pickled
        state = self.__dict__.copy()
        # replace the `value` key (now an EnumValue instance) with its index:
        state['value'] = state['value'].index
        # what we return here will be stored in the pickle
        return state

    def __setstate__(self, newstate):
        # re-create the EnumValue instance based on the stored index
        newstate['value'] = self.Values[newstate['value']]
        # re-instate our __dict__ state from the pickled state
        self.__dict__.update(newstate)

So, normally, if there is no __getstate__, the instance __dict__ is pickled. We still return a copy of that __dict__, but we swap out the EnumValue instance for its index (a simple integer). On unpickling, the new instance's __dict__ would normally just be updated with the unpickled __dict__ we captured on pickling, but because we have defined __setstate__, we can swap the enum index back out for the correct EnumValue.
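A self-contained run of the __getstate__/__setstate__ approach above. Enum and EnumValue are again hypothetical, simplified stand-ins for the classes in enum.py (identity-based comparison, indexable Enum), and Option is omitted; the two methods themselves are the ones from this answer. Because only the integer index is pickled, no Enum object is copied, and the restored value is the class-level constant itself:

```python
import pickle


class EnumValue(object):
    """Hypothetical, simplified stand-in for enum.EnumValue."""

    def __init__(self, enumtype, index, key):
        self.enumtype = enumtype
        self.index = index
        self.key = key

    def __eq__(self, other):
        # Identity-based check, as in the enum.py code from the question.
        return (isinstance(other, EnumValue)
                and self.enumtype is other.enumtype
                and self.index == other.index)

    def __ne__(self, other):
        return not self == other


class Enum(object):
    """Hypothetical, simplified stand-in for enum.Enum."""

    def __init__(self, *keys):
        self._values = [EnumValue(self, i, k) for i, k in enumerate(keys)]

    def __getitem__(self, index):
        return self._values[index]


class CellSizeRelation(object):
    # The Option base class from the question is omitted.
    Values = Enum('FIRST', 'SECOND')

    def __init__(self, value):
        self.value = value

    def __getstate__(self):
        # Pickle the EnumValue's index instead of the EnumValue itself.
        state = self.__dict__.copy()
        state['value'] = state['value'].index
        return state

    def __setstate__(self, newstate):
        # Map the stored index back to the class-level EnumValue constant.
        newstate['value'] = self.Values[newstate['value']]
        self.__dict__.update(newstate)


original = CellSizeRelation(CellSizeRelation.Values[1])
restored = pickle.loads(pickle.dumps(original))

print(restored.value is CellSizeRelation.Values[1])  # True
print(restored.value == original.value)              # True
```

Note that __setstate__ looks up self.Values at unpickling time, so this relies on the class attribute still defining the same values in the same order as when the object was pickled.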

EnumValue depends on identity (as in id()) between the Enum "enumeration type" objects. This has some advantages and disadvantages.

The main advantage is that two calls to Enum('A', 'B') define different enumeration types. So:

osx = Enum('Jaguar', 'Tiger', 'Leopard')
bigcats = Enum('Jaguar', 'Tiger', 'Leopard')

If you want to be able to distinguish OS X 10.4 from a striped killing machine, this can be useful.

But this also means that when pickle unpickles osx and bigcats, they're not only going to be distinct from each other, they're also going to be distinct from any earlier instances of osx and bigcats. There's really no way around that, once you think about it.
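Both points can be seen with a hypothetical, simplified Enum that, like the one in the question, compares values by the identity of their Enum object (this is a stand-in, not the real enum.py): two Enum calls with the same names give incomparable values, and an unpickled copy is a third, distinct enumeration type.

```python
import pickle


class EnumValue(object):
    """Hypothetical, simplified stand-in for enum.EnumValue."""

    def __init__(self, enumtype, index, key):
        self.enumtype = enumtype
        self.index = index
        self.key = key

    def __eq__(self, other):
        # Identity-based, like the enum.py check quoted in the question.
        return (isinstance(other, EnumValue)
                and self.enumtype is other.enumtype
                and self.index == other.index)

    def __ne__(self, other):
        return not self == other


class Enum(object):
    """Hypothetical, simplified stand-in for enum.Enum."""

    def __init__(self, *keys):
        self._values = [EnumValue(self, i, k) for i, k in enumerate(keys)]
        for value in self._values:
            setattr(self, value.key, value)


osx = Enum('Jaguar', 'Tiger', 'Leopard')
bigcats = Enum('Jaguar', 'Tiger', 'Leopard')

print(osx.Jaguar == osx.Jaguar)       # True: same Enum object
print(osx.Jaguar == bigcats.Jaguar)   # False: different Enum objects

# An unpickled copy is yet another distinct Enum object.
osx_copy = pickle.loads(pickle.dumps(osx))
print(osx_copy.Jaguar == osx.Jaguar)  # False
```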

So, your solution can't involve any kind of hacking of pickle; it's going to have to involve hacking the enum module.

You're going to need to define a reasonable __cmp__ method for Enum that does what makes sense for you. If you can abandon the osx-vs.-bigcats distinction, that's easy. If you can't, you need some other way (maybe adding an explicit tag name to the enum definition, or an optional-but-otherwise-implicitly-autoincrementing counter?) that handles it.
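One way the explicit-tag idea could look, as a sketch rather than a patch to the real enum.py: the hypothetical Enum below takes a tag as its first argument and defines equality on (tag, keys). It uses __eq__/__ne__/__hash__ instead of the Python 2-only __cmp__ so it runs on both Python 2 and 3. Because EnumValue compares enumtypes with ==, as the code quoted in the question does, an unpickled osx now matches the original while bigcats stays distinct.

```python
import pickle


class EnumValue(object):
    """Hypothetical, simplified stand-in for enum.EnumValue."""

    def __init__(self, enumtype, index, key):
        self.enumtype = enumtype
        self.index = index
        self.key = key

    def __eq__(self, other):
        # Uses == on enumtype, so the result depends on Enum.__eq__ below
        # rather than on object identity.
        return (isinstance(other, EnumValue)
                and self.enumtype == other.enumtype
                and self.index == other.index)

    def __ne__(self, other):
        return not self == other


class Enum(object):
    """Hypothetical Enum taking an explicit tag as its first argument."""

    def __init__(self, tag, *keys):
        self.tag = tag
        self._values = [EnumValue(self, i, k) for i, k in enumerate(keys)]
        for value in self._values:
            setattr(self, value.key, value)

    def _signature(self):
        # Two Enums count as the same type if tag and keys both match.
        return (self.tag, tuple(value.key for value in self._values))

    def __eq__(self, other):
        return isinstance(other, Enum) and self._signature() == other._signature()

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash(self._signature())


osx = Enum('osx', 'Jaguar', 'Tiger', 'Leopard')
bigcats = Enum('bigcats', 'Jaguar', 'Tiger', 'Leopard')
osx_copy = pickle.loads(pickle.dumps(osx))

print(osx_copy.Jaguar == osx.Jaguar)  # True: same tag and keys
print(osx.Jaguar == bigcats.Jaguar)   # False: different tags
```

Dropping the tag and comparing only the keys would be the "abandon the osx-vs.-bigcats distinction" variant.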
