
Pickling an “interned” object

Let's say I have a class called Symbol. At any given time, I want one and only one copy of a Symbol with a given id. For example:

registry = {}

class Symbol(object):
    def __init__(self, id):
        self.id = id
    def __eq__(self, other):
        return self is other

def symbol(id):
    if id not in registry:
        registry[id] = Symbol(id)

    return registry[id]

I'd like to be able to pickle my Symbol objects, but I can't figure out how to get cPickle to call my symbol factory. I could implement the __getstate__/__setstate__ overrides, but that would still not merge unpickled objects with the ones already in the registry. How can I pickle the above class while preserving the one-to-one mapping between Symbols and ids?


Edit (updated title to state "interned" instead of "singleton"):

Let me explain the use case. We're using these Symbols as keys in dicts. Having them be interned drastically improves performance.

What I need to have happen:

x = symbol("x")

y = pickle.loads(pickle.dumps(x))

x is y  # must be True

Since you don't want more than one object with a given id, provide a custom __new__ method in place of your symbol function.

class Symbol(object):
    registry = {}
    def __new__(cls, *args, **kwargs):
        id_ = args[0]
        # object.__new__ accepts only the class; __init__ receives the arguments.
        return Symbol.registry.setdefault(id_, object.__new__(cls))

    def __init__(self, id):
        self.id = id

Now you don't need a factory function to create Symbol objects.

>>> a = Symbol('=')
>>> b = Symbol('=')
>>> a is b
True

You may want to use weakref's WeakValueDictionary for the registry of symbols, so that garbage collection can reclaim the memory once a Symbol is no longer referenced anywhere else.

You could use the following class to define what an interned object is. Your Symbol class (or any other class) can then inherit from it.

import weakref

class Interned(object):
    # Create this registry in each subclass if the keys are not unique system-wide.
    registry = weakref.WeakValueDictionary()
    def __new__(cls, *args, **kwargs):
        assert 0 < len(args)
        # Don't use setdefault here, to avoid creating unnecessary objects.
        if args[0] not in cls.registry:
            # o is a strong reference that keeps the new object alive within this call.
            # object.__new__ accepts only the class; __init__ receives the arguments.
            o = object.__new__(cls)
            cls.registry[args[0]] = o
            return o
        return cls.registry[args[0]]
    def __eq__(self, other):
        return self is other
    def __hash__(self):
        # Needed in Python 3, where defining __eq__ disables the inherited __hash__.
        return id(self)
    def __ne__(self, other):
        return self is not other

Your code becomes:

class Symbol(Interned):
    def __init__(self, id):
        self.id = id

Resulting in:

>>> a = Symbol('=')
>>> b = Symbol('=')
>>> a is b
True
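
The classes above intern instances on construction, but they do not yet tell pickle how to rebuild one. A minimal sketch for Python 3 follows, assuming it is acceptable for unpickling to call the class again with the stored id. The __reduce__ method is an addition that does not appear in the answer above; it makes pickle reconstruct the object as Symbol(id), which goes through __new__ and therefore through the registry.

```python
import pickle
import weakref


class Symbol(object):
    # Weak references only, so unused symbols can still be collected.
    registry = weakref.WeakValueDictionary()

    def __new__(cls, id):
        obj = cls.registry.get(id)
        if obj is None:
            obj = object.__new__(cls)
            cls.registry[id] = obj
        return obj

    def __init__(self, id):
        self.id = id

    def __reduce__(self):
        # Assumption for this sketch: the id alone is enough to rebuild
        # the symbol. Unpickling then calls Symbol(self.id).
        return (self.__class__, (self.id,))


x = Symbol("x")
y = pickle.loads(pickle.dumps(x))
print(x is y)
```

Because only the id is stored, any other attributes set on a Symbol would not survive a round trip in this sketch. __init__ also runs again on the existing instance during unpickling, which is harmless here because it just reassigns the same id.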

You can try to subclass pickle.Unpickler and implement your loading logic in the load method.

But you will need some kind of key to know whether the object already exists at runtime (so you can return a reference to it rather than a new instance). This will lead you toward reimplementing Python's object space.

I would recommend trying to find another data structure better suited to your actual problem.
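
As a rough illustration of the subclassing idea, here is a sketch for Python 3 that uses pickle's persistent_id and persistent_load hooks rather than overriding load itself. It assumes the symbol's id can serve as the key mentioned above. The names SymbolPickler, SymbolUnpickler, dumps and loads are hypothetical helpers introduced for this example; registry, Symbol and symbol are taken from the question.

```python
import io
import pickle

registry = {}


class Symbol(object):
    def __init__(self, id):
        self.id = id


def symbol(id):
    # Factory from the question: at most one Symbol per id.
    if id not in registry:
        registry[id] = Symbol(id)
    return registry[id]


class SymbolPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Write Symbols as a small tag instead of pickling their state.
        if isinstance(obj, Symbol):
            return ("Symbol", obj.id)
        return None  # everything else is pickled as usual


class SymbolUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        tag, key = pid
        if tag == "Symbol":
            return symbol(key)  # look up or create through the registry
        raise pickle.UnpicklingError("unsupported persistent id: %r" % (pid,))


def dumps(obj):
    buf = io.BytesIO()
    SymbolPickler(buf).dump(obj)
    return buf.getvalue()


def loads(data):
    return SymbolUnpickler(io.BytesIO(data)).load()


x = symbol("x")
data = {"key": x, "items": [x, symbol("y")]}
copy = loads(dumps(data))
print(copy["key"] is x)
```

The trade-off in this sketch is that every piece of code that reads or writes these pickles has to use the custom pickler and unpickler; the plain pickle.loads function would reject the stream because it has no persistent_load defined.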


 