
How to write __getitem__ cleanly?

In Python, when implementing a sequence type, I often (relatively speaking) find myself writing code like this:

import collections.abc

class FooSequence(collections.abc.Sequence):
    # Snip other methods

    def __getitem__(self, key):
        if isinstance(key, int):
            ...  # Get a single item
        elif isinstance(key, slice):
            ...  # Get a whole slice
        else:
            raise TypeError('Index must be int, not {}'.format(type(key).__name__))

The code checks the type of its argument explicitly with isinstance(). This is regarded as an antipattern within the Python community. How do I avoid it?

  • I cannot use functools.singledispatch, because that's quite deliberately incompatible with methods (it will attempt to dispatch on self, which is entirely useless since we're already dispatching on self via OOP polymorphism). It works with @staticmethod, but what if I need to get stuff out of self?
  • Casting to int() and then catching the TypeError, checking for a slice, and possibly re-raising is still ugly, though perhaps slightly less so.
  • It might be cleaner to convert integers into one-element slices and handle both situations with the same code, but that has its own problems (return 0 or [0]?).
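
For reference, the second bullet could be sketched roughly as follows. This is only an illustration: classify_key is a hypothetical helper name, and operator.index() is swapped in for int() because int() also accepts strings and floats, which a sequence index should not.

```python
import operator


def classify_key(key):
    """Hypothetical helper: return ('index', i) or ('slice', s) for a key."""
    try:
        # operator.index() accepts int and any object defining __index__,
        # and raises TypeError for everything else (including str and float).
        return ('index', operator.index(key))
    except TypeError:
        if isinstance(key, slice):
            return ('slice', key)
        # Re-raise with a message that names both accepted kinds of key.
        raise TypeError(
            'Index must be int or slice, not {}'.format(type(key).__name__)
        ) from None
```

The explicit isinstance() check for slice is still there, which is why this variant is only slightly less ugly.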

As odd as it may seem, I suspect that the way you have it is the best way to go about things. Patterns generally exist to encompass common use cases, but that doesn't mean that they should be taken as gospel when following them makes life more difficult. The main reason that PEP 443 gives for balking at explicit typechecking is that it is "brittle and closed to extension". However, that mainly applies to custom functions that may have to accept an open-ended set of types. From the Python docs on __getitem__:

For sequence types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a sequence type) is up to the __getitem__() method. If key is of an inappropriate type, TypeError may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), IndexError should be raised. For mapping types, if key is missing (not in the container), KeyError should be raised.

The Python documentation explicitly states the two types that should be accepted, and what to do if a key that is not of those two types is provided. Given that the types are named by the documentation itself, they are unlikely to change (changing them would break far more implementations than just yours), so it's likely not worth going out of your way to code against Python itself potentially changing.
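
As an illustration of that documented contract, here is a minimal sketch. The Squares class is hypothetical (it is not from the question); it accepts ints and slices, interprets negative indexes itself, and raises IndexError or TypeError as the docs describe.

```python
import collections.abc


class Squares(collections.abc.Sequence):
    """Hypothetical sequence of the first n square numbers, computed on demand."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, key):
        if isinstance(key, slice):
            # slice.indices() clamps and normalises start/stop/step for us.
            start, stop, step = key.indices(len(self))
            return [i * i for i in range(start, stop, step)]
        if isinstance(key, int):
            if key < 0:
                # Negative indexes are the method's own responsibility.
                key += len(self)
            if not 0 <= key < len(self):
                raise IndexError('index out of range')
            return key * key
        raise TypeError(
            'Index must be int or slice, not {}'.format(type(key).__name__)
        )
```

Returning a plain list for slices is a simplification made for this sketch; a real sequence type might prefer to return an instance of its own class.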

If you're set on avoiding explicit typechecking, I would point you toward this SO answer. It contains a concise implementation of a @methdispatch decorator (not my name, but I'll roll with it) that lets @singledispatch work with methods by forcing it to check args[1] (arg) rather than args[0] (self). Using that should allow you to use custom single dispatch with your __getitem__ method.
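
A minimal sketch of such a decorator, written from the description above rather than copied from the linked answer. The list-backed FooSequence and its _items attribute are assumptions made purely for illustration.

```python
import collections.abc
from functools import singledispatch, update_wrapper


def methdispatch(func):
    """Like singledispatch, but dispatch on the first argument after self."""
    dispatcher = singledispatch(func)

    def wrapper(*args, **kwargs):
        # args[0] is self, so pick the implementation from the type of args[1].
        return dispatcher.dispatch(type(args[1]))(*args, **kwargs)

    wrapper.register = dispatcher.register
    update_wrapper(wrapper, func)
    return wrapper


class FooSequence(collections.abc.Sequence):
    def __init__(self, items):
        self._items = list(items)  # assumed backing store

    def __len__(self):
        return len(self._items)

    @methdispatch
    def __getitem__(self, key):
        # Fallback for every type that has no registered implementation.
        raise TypeError(
            'Index must be int or slice, not {}'.format(type(key).__name__)
        )

    @__getitem__.register(int)
    def _(self, key):
        return self._items[key]

    @__getitem__.register(slice)
    def _(self, key):
        return FooSequence(self._items[key])
```

On Python 3.8 and later, the standard library's functools.singledispatchmethod is intended for this same use case, so a hand-rolled decorator may not be needed.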

Whether or not you consider either of these "pythonic" is up to you, but remember that while The Zen of Python notes that "Special cases aren't special enough to break the rules", it then immediately notes that "practicality beats purity". In this case, just checking for the two types that the documentation explicitly states are the only things __getitem__ should support seems like the practical way to me.

I'm not aware of a way to avoid doing it once. That's just the tradeoff of using a dynamically-typed language in this way. However, that doesn't mean you have to do it over and over again. I would solve it once by creating an abstract class with split-out method names, then inherit from that class instead of directly from Sequence, like:

import collections.abc

class UnannoyingSequence(collections.abc.Sequence):

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.getitem(key)
        elif isinstance(key, slice):
            return self.getslice(key)
        else:
            raise TypeError('Index must be int or slice, not {}'.format(type(key).__name__))

    # default implementation in terms of getitem
    def getslice(self, key):
        ...  # Get a whole slice

class FooSequence(UnannoyingSequence):
    def getitem(self, key):
        ...  # Get a single item

    # optional efficient, type-specific implementation not in terms of getitem
    def getslice(self, key):
        ...  # Get a whole slice

This cleans up FooSequence enough that I might even do it this way if I only had the one derived class. I'm sort of surprised the standard library doesn't already work that way.
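
A filled-in sketch of the same idea, under some assumptions: the names SplitSequence and Countdown are hypothetical, and the default getslice returns a plain list built from getitem, which is just one possible choice.

```python
import collections.abc
from abc import abstractmethod


class SplitSequence(collections.abc.Sequence):
    """Does the type check once and forwards to getitem() or getslice()."""

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.getitem(key)
        if isinstance(key, slice):
            return self.getslice(key)
        raise TypeError(
            'Index must be int or slice, not {}'.format(type(key).__name__)
        )

    @abstractmethod
    def getitem(self, index):
        """Return the item at an integer index."""

    def getslice(self, key):
        # Default implementation in terms of getitem (returns a list here).
        return [self.getitem(i) for i in range(*key.indices(len(self)))]


class Countdown(SplitSequence):
    """Hypothetical subclass: the numbers start, start - 1, ..., 1."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def getitem(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError('index out of range')
        return self._start - index
```

The subclass only implements __len__ and getitem; slicing, iteration and the TypeError for bad keys all come from the base class.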

The antipattern is for normal user code to do type checking, especially by using the type() function [1].

When mucking about with internals [2], type checks can be necessary, and isinstance() is the preferred way.

In other words, your code is perfectly Pythonic, and its only problem is the error message (it doesn't mention slices).

Disclosure: I am a Python core developer.

[1] When absolutely needed, isinstance() is the better choice.

[2] Especially methods such as __getitem__.

To stay Pythonic, you have to work with the semantics rather than the type of the objects. So if you have some parameter that acts as an accessor to a sequence, just use it like that. Keep treating a parameter abstractly for as long as possible. If you expect a set of user identifiers, do not require a set, but rather some data structure with an add method. If you expect some text, do not require a unicode object, but rather some container for characters featuring encode and decode methods.

I assume that in general you want to do something like "Use the behavior of the base implementation unless some special value is provided." If you want to implement __getitem__, you can use a case distinction where something different happens if one special value is provided. I'd use the following pattern:

class FooSequence(collections.abc.Sequence):
    # Snip other methods

    def __getitem__(self, key):
        try:
            if key == SPECIAL_VALUE:
                return SOMETHING_SPECIAL
            else:
                return self.our_baseclass_instance[key]
        except AttributeError:
            raise TypeError('Wrong type: {}'.format(type(key).__name__))

If you want to distinguish between a single value (in Perl terminology, a "scalar") and a sequence (in Java terminology, a "collection"), then it is Pythonically fine to check whether the object implements iteration. You can either use a try/except pattern or hasattr, as I do here:

>>> a = 42
>>> b = [1, 3, 5, 7]
>>> c = slice(1, 42)
>>> hasattr(a, "__iter__")
False
>>> hasattr(b, "__iter__")
True
>>> hasattr(c, "__iter__")
False

Applied to our example:

class FooSequence(collections.abc.Sequence):
    # Snip other methods

    def __getitem__(self, key):
        try:
            if hasattr(key, "__iter__"):
                return map(lambda x: WHATEVER(x), key)
            else:
                return self.our_baseclass_instance[key]
        except AttributeError:
            raise TypeError('Wrong type: {}'.format(type(key).__name__))
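
A runnable sketch of the same hasattr-based approach, under some assumptions: the sequence is backed by a plain list (the _items attribute is hypothetical), and the WHATEVER placeholder is taken to mean "look up each key in turn".

```python
import collections.abc


class FooSequence(collections.abc.Sequence):
    def __init__(self, items):
        self._items = list(items)  # assumed backing store

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        if hasattr(key, '__iter__'):
            # An iterable of keys: look each one up in turn.
            return [self._items[k] for k in key]
        # Ints and slices are not iterable, so the underlying list handles
        # them and raises TypeError or IndexError itself for bad keys.
        return self._items[key]
```

One caveat of this approach is that strings are iterable too, so a str key takes the first branch and only fails once its characters are used as indexes.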

Dynamic programming languages like Python and Ruby use duck typing. A duck is an animal that walks like a duck, swims like a duck and quacks like a duck, not one that somebody merely calls a "duck".
