How to write __getitem__ cleanly?

In Python, when implementing a sequence type, I often (relatively speaking) find myself writing code like this:

import collections.abc

class FooSequence(collections.abc.Sequence):
    # Snip other methods

    def __getitem__(self, key):
        if isinstance(key, int):
            ...  # Get a single item
        elif isinstance(key, slice):
            ...  # Get a whole slice
        else:
            raise TypeError('Index must be int, not {}'.format(type(key).__name__))

The code checks the type of its argument explicitly with isinstance(). This is regarded as an antipattern within the Python community. How do I avoid it?

  • I cannot use functools.singledispatch, because that's quite deliberately incompatible with methods (it will attempt to dispatch on self, which is entirely useless since we're already dispatching on self via OOP polymorphism). It works with @staticmethod, but what if I need to get stuff out of self?
  • Casting to int() and then catching the TypeError, checking for a slice, and possibly re-raising is still ugly, though perhaps slightly less so.
  • It might be cleaner to convert integers into one-element slices and handle both situations with the same code, but that has its own problems (return 0 or [0]?).

Odd as it may seem, I suspect that the way you have it is the best way to go about things. Patterns generally exist to encompass common use cases, but that doesn't mean they should be taken as gospel when following them makes life more difficult. The main reason that PEP 443 gives for balking at explicit typechecking is that it is "brittle and closed to extension". However, that mainly applies to custom functions that may take a number of different types at any time. From the Python docs on __getitem__:

For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method. If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. For mapping types, if key is missing (not in the container), KeyError should be raised.

The Python documentation explicitly states the two types that should be accepted, and what to do if an item that is not of those two types is provided. Given that the types are specified by the documentation itself, they are unlikely to change (doing so would break far more implementations than just yours), so it's likely not worth going out of your way to guard against Python itself changing.
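To make the quoted requirements concrete, here is a minimal sketch. It assumes a hypothetical Countdown sequence (the class name and its contents are illustrative, not from the question) and shows one way to interpret negative indexes, raise IndexError for out-of-range integers, handle slices via slice.indices(), and raise TypeError for anything else.

```python
import collections.abc


class Countdown(collections.abc.Sequence):
    """Hypothetical sequence holding n-1, n-2, ..., 0 (illustration only)."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, key):
        if isinstance(key, int):
            if key < 0:
                key += self._n  # negative indexes are ours to interpret
            if not 0 <= key < self._n:
                raise IndexError('index out of range')
            return self._n - 1 - key
        elif isinstance(key, slice):
            # slice.indices() normalizes start/stop/step for this length
            return [self._n - 1 - i for i in range(*key.indices(self._n))]
        else:
            raise TypeError(
                'Index must be int or slice, not {}'.format(type(key).__name__))


c = Countdown(5)
print(c[0], c[-1], c[1:3])
```

Returning a plain list from a slice is a simplification; a real sequence type might prefer to return an instance of its own class.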

If you're set on avoiding explicit typechecking, I would point you toward this SO answer. It contains a concise implementation of a @methdispatch decorator (not my name, but I'll roll with it) that lets @singledispatch work with methods by forcing it to check args[1] (arg) rather than args[0] (self). Using that should allow you to use custom single dispatch with your __getitem__ method.
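On Python 3.8 and later, the standard library also provides functools.singledispatchmethod, which dispatches on the first argument after self, so a hand-rolled decorator may not be needed there. A minimal sketch, assuming a hypothetical Squares sequence (the class and its contents are illustrative only):

```python
import collections.abc
from functools import singledispatchmethod


class Squares(collections.abc.Sequence):
    """Hypothetical sequence of the first n square numbers."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    @singledispatchmethod
    def __getitem__(self, key):
        # Fallback for any key type without a registered handler
        raise TypeError(
            'Index must be int or slice, not {}'.format(type(key).__name__))

    @__getitem__.register
    def _(self, key: int):
        if key < 0:
            key += self._n
        if not 0 <= key < self._n:
            raise IndexError('index out of range')
        return key * key

    @__getitem__.register
    def _(self, key: slice):
        return [i * i for i in range(*key.indices(self._n))]


s = Squares(5)
print(s[2], s[-1], s[1:4])
```

The type checks have not disappeared; they have moved into the dispatcher, which looks up the handler from the type annotation on key.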

Whether or not you consider either of these "pythonic" is up to you, but remember that while The Zen of Python notes that "Special cases aren't special enough to break the rules", it then immediately notes that "practicality beats purity". In this case, just checking for the two types that the documentation explicitly states are the only things __getitem__ should support seems like the practical way to me.

I'm not aware of a way to avoid doing it once. That's just the tradeoff of using a dynamically-typed language in this way. However, that doesn't mean you have to do it over and over again. I would solve it once by creating an abstract class with split-out method names, then inheriting from that class instead of directly from Sequence, like:

import collections.abc

class UnannoyingSequence(collections.abc.Sequence):

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.getitem(key)
        elif isinstance(key, slice):
            return self.getslice(key)
        else:
            raise TypeError('Index must be int or slice, not {}'.format(type(key).__name__))

    # default implementation in terms of getitem
    def getslice(self, key):
        ...  # Get a whole slice

class FooSequence(UnannoyingSequence):
    def getitem(self, key):
        ...  # Get a single item

    # optional efficient, type-specific implementation not in terms of getitem
    def getslice(self, key):
        ...  # Get a whole slice

This cleans up FooSequence enough that I might even do it this way if I only had the one derived class. I'm sort of surprised the standard library doesn't already work that way.
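One way the skeleton above could be filled in is sketched here. Several details are assumptions added for illustration rather than part of the answer: the default getslice returns a plain list, getitem is marked with abc.abstractmethod, and the Evens subclass is hypothetical.

```python
import abc
import collections.abc


class UnannoyingSequence(collections.abc.Sequence):

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.getitem(key)
        elif isinstance(key, slice):
            return self.getslice(key)
        else:
            raise TypeError(
                'Index must be int or slice, not {}'.format(type(key).__name__))

    @abc.abstractmethod
    def getitem(self, key):
        """Return the single item at integer index key."""

    # Default implementation in terms of getitem (assumption: returns a list)
    def getslice(self, key):
        return [self.getitem(i) for i in range(*key.indices(len(self)))]


class Evens(UnannoyingSequence):
    """Hypothetical subclass: the first n even numbers."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def getitem(self, key):
        if key < 0:
            key += self._n
        if not 0 <= key < self._n:
            raise IndexError('index out of range')
        return 2 * key


e = Evens(4)
print(e[1], e[-1], e[1:3])
```

The subclass only ever sees integer keys in getitem, so the type check lives in exactly one place.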

The antipattern is for normal user code to do type checking, especially by using the type() function [1].

When mucking about with internals [2], type checks can be necessary, and isinstance() is the preferred way.

In other words, your code is perfectly Pythonic, and its only problem is the error message (it doesn't mention slices).

Disclosure: I am a Python core developer.

[1] When absolutely needed, isinstance() is the better choice.

[2] Especially methods such as __getitem__.
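A short illustration of one reason isinstance() is generally preferred over comparing type() directly: it also accepts subclasses. The Index class and the two helper functions below are hypothetical names used only for this example.

```python
class Index(int):
    """Hypothetical int subclass, e.g. a tagged index type."""


def is_int_by_type(key):
    return type(key) is int  # rejects subclasses of int


def is_int_by_isinstance(key):
    return isinstance(key, int)  # accepts int and its subclasses


key = Index(3)
print(is_int_by_type(key))        # False
print(is_int_by_isinstance(key))  # True
```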

To stay pythonic, you have to work with the semantics rather than the type of the objects. So if you have some parameter acting as an accessor to a sequence, just use it like that. Use the abstraction for a parameter as long as possible. If you expect a set of user identifiers, do not expect a set, but rather some data structure with an add method. If you expect some text, do not expect a unicode object, but rather some container for characters featuring encode and decode methods.

I assume in general you want to do something like "Use the behavior of the base implementation unless some special value is provided." If you want to implement __getitem__, you can use a case distinction where something different happens if one special value is provided. I'd use the following pattern:

class FooSequence(collections.abc.Sequence):
    # Snip other methods

    def __getitem__(self, key):
        try:
            if key == SPECIAL_VALUE:
                return SOMETHING_SPECIAL
            else:
                return self.our_baseclass_instance[key]
        except AttributeError:
            raise TypeError('Wrong type: {}'.format(type(key).__name__))

If you want to distinguish between a single value (in Perl terminology, a "scalar") and a sequence (in Java terminology, a "collection"), then it is pythonically fine to determine whether an iterator is implemented. You can either use a try/except pattern or hasattr, as I do here:

>>> a = 42
>>> b = [1, 3, 5, 7]
>>> c = slice(1, 42)
>>> hasattr(a, "__iter__")
False
>>> hasattr(b, "__iter__")
True
>>> hasattr(c, "__iter__")
False
>>>

Applied to our example:

class FooSequence(collections.abc.Sequence):
    # Snip other methods

    def __getitem__(self, key):
        try:
            if hasattr(key, "__iter__"):
                return map(lambda x: WHATEVER(x), key)
            else:
                return self.our_baseclass_instance[key]
        except AttributeError:
            raise TypeError('Wrong type: {}'.format(type(key).__name__))

Dynamic programming languages like Python and Ruby use duck typing. A duck is an animal that walks like a duck, swims like a duck and quacks like a duck, not one that somebody merely calls a "duck".
