简体   繁体   English

自动使类可哈希化

[英]Automatically making a class hashable

There are several standard ways to make a class hashable, for example (borrowing from SO ): 有几种使类可哈希化的标准方法,例如(从SO借用):

# assume X has 2 attributes: attr_a and attr_b
class X:
  def __key(self):
    return (self.attr_a, self.attr_b)

  def __eq__(x, y):
    return isinstance(y, x.__class__) and x.__key() == y.__key()

  def __hash__(self):
    return hash(self.__key())

Now suppose I have many classes that I want to make hashable. 现在,假设我有许多要使其可哈希化的类。 They are all immutable, with immutable attributes, and hashing all these attributes in bulk is acceptable (for a class with too many attributes, we would only want to hash a few attributes that are enough to avoid most collisions). 它们都是不可变的,具有不可变的属性,并且散列地散列所有这些属性是可以接受的(对于具有太多属性的类,我们只希望散列一些足以避免大多数冲突的属性)。 Can I avoid writing __key() method by hand for every class? 我可以避免为每个类手动编写__key()方法吗?

Would it be a good idea to make a base class that defines __key() , __eq__ , and __hash__ for them? 难道是一个好主意,做一个基类,定义__key() __eq____hash__又在哪里? In particular, I'm not sure whether finding all the instance attributes that should go into __hash__ is doable. 特别是,我不确定是否找到应该放入__hash__所有实例属性是否可行。 I know this is generally impossible , but in this case we can assume more about the object (eg, it's immutable - after __init__ is finished, its attributes are all hashable, etc.). 我知道通常这是不可能的 ,但是在这种情况下,我们可以对对象做更多的假设(例如,它是不可变的__init__完成后,其属性都是可哈希的,等等)。

(If the inheritance hierarchy won't work, perhaps a decorator would?) (如果继承层次结构不起作用,也许装饰器会起作用?)

Instances store their attributes in self.__dict__ : 实例将其属性存储在self.__dict__

>>> class Foo(object):
...     def __init__(self, foo='bar', spam='eggs'):
...         self.foo = foo
...         self.spam = spam
... 
>>> f = Foo()
>>> f.__dict__
{'foo': 'bar', 'spam': 'eggs'}

Provided you don't store any methods on your instances, a default .__key() could be: 如果您没有在实例上存储任何方法,则默认的.__key()可以是:

def __key(self):
    return tuple(v for k, v in sorted(self.__dict__.items()))

where we sort the items by attribute name; 我们根据属性名称对项目进行排序; the tuple() call ensures we return an immutable sequence suitable for the hash() call. tuple()调用可确保我们返回适合hash()调用的不可变序列。

For more complex setups you'd have to either test for the types returned by values() (skip functions and such) or use a specific pattern of attributes or repurpose __slots__ to list the appropriate attributes you can use. 对于更复杂的设置,您必须测试values()返回的类型(跳过函数等),或者使用特定的属性模式或重新使用__slots__列出可以使用的适当属性。

Together with your __hash__ and __eq__ methods, that'd make a good base class to inherit from for all your immutable classes. 与您的__hash____eq__方法一起,这将成为继承您所有不可变类的良好基类。

You can do it if you assume conventions for your attributes. 如果假定属性约定,则可以执行此操作。 In your example, it would be quite trivial, as your attributes starts with "attr_" - so you coult write your __key method as : 在您的示例中,这很简单,因为您的属性以“ attr_”开头-因此您将__key方法写为:

def __key(self):
    return tuple (getattr(self, attr) for attr in self.__dict__ if attr.startswith("attr_") )

As you can see - any test you can find to put on the filter condition of the generator expression would fit your needs. 如您所见-您可以发现对生成器表达式的过滤条件进行的任何测试都可以满足您的需求。

A suggestion I can give you is to have your classes using Python's __slots__ feature: that not only will make your attribute names easy to find, as will make your imutable objects more efficient to use and with smaller memory footprint. 我可以给您的建议是使用Python的__slots__功能来创建类:这不仅使您的属性名称易于查找,还使您的可植入对象使用起来更有效,并且占用的内存更少。

class X:
    __slots__ = ("a", "b", "c")
    def __key(self):
        return tuple (getattr(self, attr) for attr in self.__class__.__slots__ )

edit Answering the first comment from the OP: 编辑回答OP中的第一条评论:

This works with inheritance, of course. 当然,这可以继承。 If you will always use all of the object's attributes for they, you don't need the "if" part of the expression - write the function as _key (instead of __key which makes a unique name for each class internally) on a class on the top of your hierarchy, and it will work for all your classes. 如果您将始终使用对象的所有属性,则不需要表达式的“ if”部分-在上的类上,将函数写为_key (而不是内部为每个类创建唯一名称的__key )。层次结构的顶部,它将适用于所有类。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM