
Class for pickle- and copy-persistent object?

I'm trying to write a class for a read-only object that is not really copied by the copy module, and that, when pickled to be transferred between processes, exists no more than once in each process, no matter how many times it is passed around as a "new" object. Is there already something like that?
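For comparison, here is a small sketch of the default behavior the question wants to avoid, using a hypothetical `ReadOnlyConfig` class: `copy.copy`, `copy.deepcopy` and every pickle round trip each build a brand-new instance, even inside a single process.

```python
import copy
import pickle


class ReadOnlyConfig(object):  # hypothetical stand-in for the read-only object
    def __init__(self, values):
        self.values = values


original = ReadOnlyConfig({"mode": "fast"})

# copy.copy and copy.deepcopy create new instances by default...
shallow = copy.copy(original)
deep = copy.deepcopy(original)

# ...and so does every unpickling, even of the same bytes.
first = pickle.loads(pickle.dumps(original))
second = pickle.loads(pickle.dumps(original))

print(shallow is original, deep is original, first is second)  # False False False
```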

I made an attempt to implement this. @Alex Martelli and anyone else, please comment or suggest improvements. I think this will eventually end up on GitHub.

```python
"""
todo: need to lock library to avoid thread trouble?

todo: need to raise an exception if we're getting pickled with
an old protocol?

todo: make it polite to other classes that use __new__. Therefore, should
probably work not only when there is only one item in the *args passed to __new__.

"""

import copy
import random
import uuid
import weakref

# UUID -> the single live instance in this process. Weak values let an
# instance be garbage-collected once nothing else references it.
library = weakref.WeakValueDictionary()

class UuidToken(object):
    # Wraps a UUID so that __new__ can tell unpickling apart from normal creation.
    def __init__(self, uuid):
        self.uuid = uuid


class PersistentReadOnlyObject(object):
    def __new__(cls, *args, **kwargs):
        if len(args) == 1 and len(kwargs) == 0 and isinstance(args[0], UuidToken):
            received_uuid = args[0].uuid
        else:
            received_uuid = None

        if received_uuid is not None:
            # This section is for when we are called at unpickling time.
            # Use get(), not pop(): the object must stay in the library so
            # that later unpicklings of the same UUID still find it.
            thing = library.get(received_uuid)
            if thing is not None:
                thing._PersistentReadOnlyObject__skip_setstate = True
                return thing
            else: # This object does not exist in our library yet; let's add it
                thing = super(PersistentReadOnlyObject, cls).__new__(cls)
                thing._PersistentReadOnlyObject__uuid = received_uuid
                library[received_uuid] = thing
                return thing

        else:
            # This section is for when we are called at normal creation time.
            # object.__new__ rejects extra arguments once __new__ is overridden,
            # so only cls is passed on; the arguments still reach __init__.
            thing = super(PersistentReadOnlyObject, cls).__new__(cls)
            new_uuid = uuid.uuid4()
            thing._PersistentReadOnlyObject__uuid = new_uuid
            library[new_uuid] = thing
            return thing

    def __getstate__(self):
        # The UUID travels via __getnewargs__, so leave it out of the state.
        my_dict = dict(self.__dict__)
        del my_dict["_PersistentReadOnlyObject__uuid"]
        return my_dict

    def __getnewargs__(self):
        # Only honored by pickle protocol 2 or higher.
        return (UuidToken(self._PersistentReadOnlyObject__uuid),)

    def __setstate__(self, state):
        # An object that already existed in this process keeps its own state.
        if self.__dict__.pop("_PersistentReadOnlyObject__skip_setstate", None):
            return
        else:
            self.__dict__.update(state)

    def __deepcopy__(self, memo):
        return self

    def __copy__(self):
        return self

# --------------------------------------------------------------
"""
From here on it's just testing stuff; will be moved to another file.
"""


def play_around(queue, thing):
    queue.put((thing, copy.deepcopy(thing),))

class Booboo(PersistentReadOnlyObject):
    def __init__(self):
        self.number = random.random()

if __name__ == "__main__":

    import multiprocessing

    def same(a, b):
        return (a is b) and (a == b) and (id(a) == id(b)) and \
               (a.number == b.number)

    a = Booboo()
    b = copy.copy(a)
    c = copy.deepcopy(a)
    assert same(a, b) and same(b, c)

    my_queue = multiprocessing.Queue()
    process = multiprocessing.Process(target = play_around,
                                      args=(my_queue, a,))
    process.start()
    # Read from the queue before joining: a child that has put data on a
    # queue may not exit until that data has been consumed.
    things = my_queue.get()
    process.join()
    for thing in things:
        assert same(thing, a) and same(thing, b) and same(thing, c)
    print("all cool!")
```

I don't know of any such functionality already implemented. The interesting problem is as follows, and it needs a precise spec of what is to happen in this case:

  • process A makes the obj and sends it to B, which unpickles it; so far, so good
  • A makes change X to the obj; meanwhile, B makes change Y to ITS copy of the obj
  • now either process sends its obj to the other, which unpickles it: what changes to the object need to be visible at this point in each process? Does it matter whether A is sending to B or vice versa, i.e., does A "own" the object? Or what?

If you don't care, say because only A OWNS the obj -- only A is ever allowed to make changes and send the obj to others, others can't and won't change -- then the problems boil down to identifying obj uniquely -- a GUID will do. The class can maintain a class attribute dict mapping GUIDs to existing instances (probably as a weak-value dict to avoid keeping instances needlessly alive, but that's a side issue) and ensure the existing instance is returned when appropriate.
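A minimal sketch of that registry, assuming the owner-only model described above; `OwnedObject` and `lookup` are hypothetical names. The class attribute is a `weakref.WeakValueDictionary` keyed by GUID, so an instance can be found by its GUID while it is alive, and the registry by itself never keeps it alive.

```python
import gc
import uuid
import weakref


class OwnedObject(object):  # hypothetical name, for illustration only
    # Class attribute shared by all instances: GUID -> live instance.
    # Weak values mean the registry alone never keeps an instance alive.
    _instances = weakref.WeakValueDictionary()

    def __init__(self, payload):
        self.guid = uuid.uuid4()
        self.payload = payload
        type(self)._instances[self.guid] = self

    @classmethod
    def lookup(cls, guid):
        """Return the live instance with this GUID, or None."""
        return cls._instances.get(guid)


obj = OwnedObject("data")
guid = obj.guid
assert OwnedObject.lookup(guid) is obj

del obj       # drop the only strong reference
gc.collect()  # CPython frees it immediately anyway; this keeps the demo portable
assert OwnedObject.lookup(guid) is None
```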

But if changes need to be synchronized to any finer granularity, then suddenly it's a REALLY difficult problem of distributed computing and the specs of what happens in what cases really need to be nailed down with the utmost care (and more paranoia than is present in most of us -- distributed programming is VERY tricky unless a few simple and provably correct patterns and idioms are followed fanatically!-).

If you can nail down the specs for us, I can offer a sketch of how I would go about trying to meet them. But I won't presume to guess the specs on your behalf;-).

Edit: the OP has clarified, and it seems all he needs is a better understanding of how to control __new__. That's easy: see __getnewargs__. You'll need a new-style class and pickling with protocol 2 or better (but those are advisable anyway for other reasons!-); then __getnewargs__ in an existing object can simply return a tuple holding the object's GUID (which __new__ must accept as an optional parameter). So __new__ can check whether the GUID is present in the class's memo dict (a weak-value one;-) and, if so, return the corresponding object. If not (or if no GUID is passed, meaning this is not an unpickling, so a fresh GUID must be generated), it makes a truly new object (setting its GUID;-) and also records it in the class-level memo.
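A minimal sketch of the approach just described, assuming pickle protocol 2 or higher; `Shared` and `_memo` are hypothetical names. Unlike the question's code it defines no `__setstate__`, so unpickling an already-known object re-applies the pickled state to it, which is harmless for read-only data.

```python
import pickle
import uuid
import weakref


class Shared(object):
    """One instance per GUID per process (hypothetical name, for illustration)."""

    # Class-level memo: GUID -> live instance. Weak values, so the memo
    # alone never keeps an instance alive.
    _memo = weakref.WeakValueDictionary()

    def __new__(cls, guid=None):
        if guid is not None:
            # A GUID was passed, normally by the unpickler.
            existing = cls._memo.get(guid)
            if existing is not None:
                return existing
        else:
            # Normal construction: generate a fresh GUID.
            guid = uuid.uuid4()
        obj = super(Shared, cls).__new__(cls)
        obj.guid = guid
        cls._memo[guid] = obj
        return obj

    def __getnewargs__(self):
        # With pickle protocol 2 or higher, loading calls
        # cls.__new__(cls, *self.__getnewargs__()), so the GUID reaches __new__.
        return (self.guid,)


if __name__ == "__main__":
    original = Shared()
    original.payload = "read-only data"
    data = pickle.dumps(original, protocol=2)

    # In the same process, every unpickling yields the existing instance.
    assert pickle.loads(data) is original
    assert pickle.loads(data) is pickle.loads(data)
    print("one instance per GUID")
```

Because any positional argument to `Shared(...)` is taken as a GUID, a subclass whose `__init__` takes its own arguments would need some way to tell the two cases apart, such as the `UuidToken` wrapper in the question's code.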

BTW, to make GUIDs, consider using the uuid module in the standard library.
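A quick look at the relevant parts of the standard `uuid` module: `uuid.uuid4()` makes a random GUID, and `UUID` objects compare and hash by value and round-trip through their string form, so they work as keys in a memo dict like the ones discussed here.

```python
import uuid

guid = uuid.uuid4()            # random (version 4) UUID
as_text = str(guid)            # canonical 36-character hyphenated form
restored = uuid.UUID(as_text)  # parse it back

assert restored == guid              # UUIDs compare by value...
assert hash(restored) == hash(guid)  # ...and hash by value, so they work as dict keys
print(guid.version, len(as_text), len(guid.hex))  # 4 36 32
```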

You could simply use a dictionary in the receiver in which each key and its value are the same object. And to avoid a memory leak, use a WeakKeyDictionary.
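Which weak dictionary helps depends on what is stored as the key. A small check, using a hypothetical `Thing` class and a hypothetical helper `surviving_entries`: when each key and its value are the same object, neither `weakref.WeakKeyDictionary` (which holds its values strongly) nor `weakref.WeakValueDictionary` (which holds its keys strongly) ever releases the entry, whereas a `WeakValueDictionary` keyed by a separate ID, such as a GUID, does.

```python
import gc
import weakref


class Thing(object):  # hypothetical stand-in for the shared object
    pass


def surviving_entries(mapping, make_key):
    """Store a Thing, drop our only reference to it, and count what is left."""
    obj = Thing()
    mapping[make_key(obj)] = obj
    del obj
    gc.collect()
    return len(mapping)


# Key and value are the same object: the dictionary keeps it alive either way.
same_wkd = surviving_entries(weakref.WeakKeyDictionary(), lambda o: o)
same_wvd = surviving_entries(weakref.WeakValueDictionary(), lambda o: o)
# Key is a separate ID string: the weak value is released as intended.
by_id_wvd = surviving_entries(weakref.WeakValueDictionary(), lambda o: "some-guid")

print(same_wkd, same_wvd, by_id_wvd)  # 1 1 0
```

This is why the question's code keys its `WeakValueDictionary` by UUID rather than by the object itself.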
