Pickling an "interned" object

Let's say I have a class called Symbol. At any given time, I want one and only one copy of a Symbol with a given id. For example:

registry = {}

class Symbol(object):
    def __init__(self, id):
        self.id = id

    def __eq__(self, other):
        return self is other

def symbol(id):
    if id not in registry:
        registry[id] = Symbol(id)

    return registry[id]

I'd like to be able to pickle my Symbol object, but I can't figure out how to get cPickle to call my symbol factory. I could just implement the __getstate__/__setstate__ overrides, but that would still not merge unpickled objects with the ones already existing in the registry. How can I pickle the above class while preserving the 1:1 ratio of Symbols to ids?


Edit (updated title to say "interned" instead of "singleton"):

Let me explain the use case. We're using these Symbols as keys in dicts. Having them interned drastically improves performance.

What I need to have happen:

x = symbol("x")

y = pickle.loads(pickle.dumps(x))

x is y  # should be True

Since you don't want more than one object with a given id, provide a custom __new__ method in place of your symbol function.

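class Symbol(object):
    registry = {}

    def __new__(cls, *args, **kwargs):
        id_ = args[0]
        # object.__new__ accepts only the class here; passing the
        # constructor arguments through raises TypeError on Python 3.
        return Symbol.registry.setdefault(id_, object.__new__(cls))

    def __init__(self, id):
        self.id = id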

Now you don't need a factory function to create Symbol objects.

>>> a = Symbol('=')
>>> b = Symbol('=')
>>> a is b
True
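Interning through __new__ does not by itself make a pickle round trip return the registered object: with protocol 2 or later, pickle calls __new__ without the id unless the class says which arguments to pass. A minimal sketch of one way to close that gap, assuming a single positional id and protocol 2 or later (__getnewargs__ is a standard pickle hook; the rest mirrors the class above and is only an illustration):

```python
import pickle


class Symbol(object):
    registry = {}

    def __new__(cls, id):
        # Reuse the registered instance for this id, creating it only once.
        try:
            return cls.registry[id]
        except KeyError:
            obj = cls.registry[id] = object.__new__(cls)
            return obj

    def __init__(self, id):
        self.id = id

    def __getnewargs__(self):
        # With protocol 2+, pickle passes these arguments to __new__ on load,
        # so unpickling goes through the registry lookup above.
        return (self.id,)


x = Symbol("x")
y = pickle.loads(pickle.dumps(x, protocol=2))
print(x is y)
```

When the id is not yet registered in the loading process, __new__ creates the instance and pickle then restores its attributes from the saved state, so the object still ends up in the registry.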

You may want to use weakref's WeakValueDictionary for the registry of symbols, so that garbage collection can reclaim the memory when a Symbol is no longer referenced.

You could use the following class to define what an interned object is. Your Symbol class (or any other class) can then inherit from it.

import weakref

class Interned(object):
    # you need to create this registry in each class if the keys are not unique system-wide
    registry = weakref.WeakValueDictionary()

    def __new__(cls, *args, **kwargs):
        assert 0 < len(args)
        if args[0] not in cls.registry:  # don't use setdefault to avoid creating unnecessary objects
            # o is a strong ref needed to avoid garbage collection within this call;
            # object.__new__ takes only the class (extra arguments raise TypeError on Python 3)
            o = object.__new__(cls)
            cls.registry[args[0]] = o
            return o
        return cls.registry[args[0]]

    def __eq__(self, other):
        return self is other

    def __hash__(self):
        # needed by Python 3, where defining __eq__ alone makes the class unhashable
        return id(self)

    def __ne__(self, other):
        return self is not other

Your code becomes:

class Symbol(Interned):
    def __init__(self, id):
        self.id = id

Resulting in:

>>> a = Symbol('=')
>>> b = Symbol('=')
>>> a is b
True
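To illustrate what the weak registry buys you, here is a small sketch of an entry disappearing once nothing else references the Symbol. It uses a simplified Interned with a single key argument (an assumption made for brevity, not the class above verbatim), and the exact moment of collection depends on the interpreter:

```python
import gc
import weakref


class Interned(object):
    registry = weakref.WeakValueDictionary()

    def __new__(cls, key):
        obj = cls.registry.get(key)
        if obj is None:
            obj = object.__new__(cls)  # strong ref held until we return
            cls.registry[key] = obj
        return obj


class Symbol(Interned):
    def __init__(self, id):
        self.id = id


a = Symbol("=")
print("=" in Symbol.registry)  # the entry exists while `a` is alive

del a
gc.collect()  # usually unnecessary on CPython, harmless elsewhere
print("=" in Symbol.registry)  # the entry is dropped once the Symbol is collected
```

The trade-off is that a Symbol only stays interned for as long as something holds a reference to it; a later Symbol('=') creates a fresh object.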

You can try to subclass pickle.Unpickler and implement your loading logic in the load method.

But you will need some kind of key to know whether the object already exists at runtime (to return a reference rather than a new instance). This will lead you to reimplement the Python object space.

I would recommend trying to find another data structure more suited to your actual problem.
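For reference, the Unpickler route mentioned above does not require overriding load itself: the pickle module provides persistent_id and persistent_load hooks for this kind of external lookup. A rough sketch, assuming Python 3, the symbol factory and registry from the question, and hypothetical helper names (SymbolPickler, SymbolUnpickler, dumps, loads) that exist only for this illustration:

```python
import io
import pickle

registry = {}


class Symbol(object):
    def __init__(self, id):
        self.id = id


def symbol(id):
    if id not in registry:
        registry[id] = Symbol(id)
    return registry[id]


class SymbolPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Store only the id for Symbols; everything else is pickled normally.
        if isinstance(obj, Symbol):
            return ("Symbol", obj.id)
        return None


class SymbolUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Resolve the stored id through the factory, so the registry is reused.
        tag, key = pid
        if tag == "Symbol":
            return symbol(key)
        raise pickle.UnpicklingError("unsupported persistent id: %r" % (pid,))


def dumps(obj):
    buf = io.BytesIO()
    SymbolPickler(buf, protocol=2).dump(obj)
    return buf.getvalue()


def loads(data):
    return SymbolUnpickler(io.BytesIO(data)).load()


x = symbol("x")
table = loads(dumps({x: 1}))
print(list(table)[0] is x)
```

The cost of this approach is that every caller has to use the custom pickler and unpickler; plain pickle.dumps and pickle.loads would bypass the registry.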
