
What is the relationship between the Python data model and built-in functions?

As I read Python answers on Stack Overflow, I keep seeing people tell users to use the data model's special methods or attributes directly.

I then see contradicting advice (sometimes from myself) saying not to do that, and instead to use the builtin functions and the operators directly.

Why is that? What is the relationship between the special "dunder" methods and attributes of the Python data model and the builtin functions?

When am I supposed to use the special names?

What is the relationship between the Python datamodel and builtin functions?

  • The builtins and operators use the underlying datamodel methods or attributes.
  • The builtins and operators have more elegant behavior and are in general more forward compatible.
  • The special methods of the datamodel are semantically non-public interfaces.
  • The builtins and language operators are specifically intended to be the user interface for behavior implemented by special methods.

Thus, you should prefer the builtin functions and operators, where possible, over the special methods and attributes of the datamodel.

The semantically internal APIs are more likely to change than the public interfaces. While Python doesn't actually consider anything "private" and exposes the internals, that doesn't mean it's a good idea to abuse that access. Doing so has the following risks:

  • You may find you have more breaking changes when upgrading your Python executable or switching to other implementations of Python (like PyPy, IronPython, Jython, or some other unforeseen implementation).
  • Your colleagues will likely think poorly of your language skills and conscientiousness, and consider it a code smell, bringing you and the rest of your code under greater scrutiny.
  • It is easy to intercept the behavior of the builtin functions. Using special methods directly limits what you can do with Python for introspection and debugging.

In depth

The builtin functions and operators invoke the special methods and use the special attributes in the Python datamodel. They are the readable and maintainable veneer that hides the internals of objects. In general, users should use the builtins and operators given in the language as opposed to calling the special methods or using the special attributes directly.

The builtin functions and operators can also have fallback or more elegant behavior than the more primitive datamodel special methods. For example:

  • next(obj, default) allows you to provide a default instead of raising StopIteration when an iterator runs out, while obj.__next__() does not.
  • str(obj) falls back to obj.__repr__() when the class defines no __str__ of its own. In Python 3 this fallback is implemented by the inherited object.__str__, so obj.__str__() still works there, but on a Python 2 old-style class calling obj.__str__() directly raises an AttributeError.
  • obj != other falls back to not obj == other in Python 3 when there is no __ne__. Calling obj.__ne__(other) directly can hand you NotImplemented instead of a bool, because it skips the operator's handling of the reflected operand.
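The three fallbacks listed above can be observed with a small class. This is only a sketch assuming Python 3; the class name Plain is made up for illustration.

```python
class Plain:
    """Defines __repr__ and __eq__ only; no __str__ or __ne__ of its own."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "Plain({!r})".format(self.value)

    def __eq__(self, other):
        if not isinstance(other, Plain):
            return NotImplemented
        return self.value == other.value


# next() with a default returns it instead of raising StopIteration.
empty = iter([])
fallback_value = next(empty, "done")

# str() ends up using __repr__ because Plain defines no __str__.
text = str(Plain(1))

# != is derived from __eq__ in Python 3 because Plain defines no __ne__.
differs = Plain(1) != Plain(2)
```

In each case the caller writes the builtin or operator and never needs to know which special method ended up doing the work.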

(Builtin functions can also be easily shadowed, if necessary or desirable, in a module's global scope or in the builtins module, to further customize behavior.)
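As a rough illustration of that shadowing, the sketch below replaces len in the current module's globals with a wrapper that records its calls. It assumes Python 3 (for the builtins module name), and the calls list is a hypothetical logging hook, not anything from the standard library.

```python
import builtins

calls = []  # records the type name of every object passed to the shadowing len()


def len(obj):  # deliberately shadows the builtin in this module's globals
    calls.append(type(obj).__name__)
    return builtins.len(obj)


size = len([1, 2, 3])         # goes through the wrapper
direct = [1, 2, 3].__len__()  # bypasses it, so nothing is recorded
```

Code that calls the special method directly is invisible to this kind of interception, which is the debugging cost mentioned earlier.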

Mapping the builtins and operators to the datamodel

Here is a mapping, with notes, of the builtin functions and operators to the respective special methods and attributes that they use or return. The usual rule is that a builtin function maps to a special method of the same name, but there are enough exceptions to that rule to warrant giving the map below:

builtins/     special methods/
operators  -> datamodel               NOTES (fb == fallback)

repr(obj)     obj.__repr__()          provides fb behavior for str
str(obj)      obj.__str__()           fb to __repr__ if no __str__
bytes(obj)    obj.__bytes__()         Python 3 only
unicode(obj)  obj.__unicode__()       Python 2 only
format(obj)   obj.__format__()        format spec optional.
hash(obj)     obj.__hash__()
bool(obj)     obj.__bool__()          Python 3, fb to __len__
bool(obj)     obj.__nonzero__()       Python 2, fb to __len__
dir(obj)      obj.__dir__()
vars(obj)     obj.__dict__            does not include __slots__
type(obj)     obj.__class__           type actually bypasses __class__ -
                                      overriding __class__ will not affect type
help(obj)     obj.__doc__             help uses more than just __doc__
len(obj)      obj.__len__()           provides fb behavior for bool
iter(obj)     obj.__iter__()          fb to __getitem__ w/ indexes from 0 on
next(obj)     obj.__next__()          Python 3
next(obj)     obj.next()              Python 2
reversed(obj) obj.__reversed__()      fb to __len__ and __getitem__
other in obj  obj.__contains__(other) fb to __iter__ then __getitem__
obj == other  obj.__eq__(other)
obj != other  obj.__ne__(other)       fb to not obj.__eq__(other) in Python 3
obj < other   obj.__lt__(other)       get >, >=, <= with @functools.total_ordering
complex(obj)  obj.__complex__()
int(obj)      obj.__int__()
float(obj)    obj.__float__()
round(obj)    obj.__round__()
abs(obj)      obj.__abs__()
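Several of the "fb" notes in the table can be checked with one class that implements only __len__ and __getitem__. A minimal sketch, assuming Python 3; the Deck class is invented for the example.

```python
class Deck:
    """Implements only __len__ and __getitem__."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        return self._cards[index]


deck = Deck(["A", "K", "Q"])

truthy = bool(deck)               # no __bool__, so __len__ decides
empty_is_falsy = not Deck([])     # a length of 0 is falsy
as_list = list(iter(deck))        # no __iter__, so __getitem__ is called with 0, 1, 2, ...
has_king = "K" in deck            # no __contains__, so the iteration fallback is used
backwards = list(reversed(deck))  # no __reversed__, so __len__ and __getitem__ are used
```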

The operator module has length_hint, which has a fallback implemented by a respective special method if __len__ is not implemented:

length_hint(obj)  obj.__length_hint__() 
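A short sketch of how operator.length_hint chooses between __len__, __length_hint__ and its default argument, assuming Python 3.4 or later. The Countdown iterator is a made-up example.

```python
from operator import length_hint


class Countdown:
    """An iterator with no __len__, but with a __length_hint__ estimate."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        return self.remaining


hint = length_hint(Countdown(3))     # no __len__, so __length_hint__ is used
exact = length_hint([1, 2, 3])       # __len__ exists, so it is used
unknown = length_hint(object(), 10)  # neither exists, so the default is returned
```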

Dotted Lookups

Dotted lookups are contextual. When no special method intervenes, Python first looks in the class hierarchy for data descriptors (like properties and slots), then in the instance __dict__ (for instance variables), then in the class hierarchy for non-data descriptors (like methods). Special methods implement the following behaviors:

obj.attr      obj.__getattr__('attr')       provides fb if dotted lookup fails
obj.attr      obj.__getattribute__('attr')  preempts dotted lookup
obj.attr = _  obj.__setattr__('attr', _)    preempts dotted lookup
del obj.attr  obj.__delattr__('attr')       preempts dotted lookup

Descriptors

Descriptors are a bit advanced, so feel free to skip these entries and come back later. Recall that the descriptor instance is in the class hierarchy (like methods, slots, and properties). A data descriptor implements either __set__ or __delete__:

obj.attr        descriptor.__get__(obj, type(obj)) 
obj.attr = val  descriptor.__set__(obj, val)
del obj.attr    descriptor.__delete__(obj)
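The lookup order described under Dotted Lookups (data descriptors, then the instance __dict__, then non-data descriptors) can be sketched as follows, assuming Python 3. All class names here are invented for the illustration.

```python
class DataDescriptor:
    """Defines __get__ and __set__, so it takes priority over the instance __dict__."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDescriptor:
    """Defines only __get__, so the instance __dict__ takes priority over it."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return "from non-data descriptor"


class Widget:
    data = DataDescriptor()
    non_data = NonDataDescriptor()


w = Widget()
# Write straight into the instance __dict__ to set up the name clash.
vars(w)["data"] = "from instance"
vars(w)["non_data"] = "from instance"

data_result = w.data          # the data descriptor wins
non_data_result = w.non_data  # the instance __dict__ wins
```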

When the class is created (that is, when its class statement is executed), the descriptor method __set_name__ is called on every descriptor that defines it, to inform the descriptor of its attribute name. (This is new in Python 3.6.) cls is the same as type(obj) above, and 'attr' stands in for the attribute name:

class cls:
    @descriptor_type
    def attr(self): pass # -> descriptor.__set_name__(cls, 'attr') 
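A slightly fuller sketch of __set_name__ in use, assuming Python 3.6 or later. Typed and Point are hypothetical names; the descriptor stores values in the instance __dict__ under the name it was told.

```python
class Typed:
    """A hypothetical descriptor that learns its attribute name via __set_name__."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return vars(obj)[self.name]

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                "{} must be {}".format(self.name, self.expected_type.__name__)
            )
        vars(obj)[self.name] = value


class Point:
    x = Typed(int)

    def __init__(self, x):
        self.x = x  # goes through Typed.__set__


p = Point(3)
learned_name = vars(Point)["x"].name  # the name the descriptor was given
```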

Items (subscript notation)

The subscript notation is also contextual:

obj[name]         -> obj.__getitem__(name)
obj[name] = item  -> obj.__setitem__(name, item)
del obj[name]     -> obj.__delitem__(name)

As a special case for subclasses of dict, __missing__ is called if __getitem__ doesn't find the key:

obj[name]         -> obj.__missing__(name)  
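A minimal sketch of the __missing__ hook on a dict subclass. ZeroDict is a made-up name, and returning 0 is just one possible policy.

```python
class ZeroDict(dict):
    """A dict subclass whose missing keys read as 0 (hypothetical example)."""

    def __missing__(self, key):
        return 0


counts = ZeroDict(a=2)
present = counts["a"]         # found by dict.__getitem__
absent = counts["b"]          # not found, so __missing__("b") supplies 0
via_get = counts.get("b")     # .get() does not consult __missing__
still_absent = "b" in counts  # this __missing__ did not insert the key
```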

Operators

There are also special methods for the +, -, *, @, /, //, %, divmod(), pow(), **, <<, >>, &, ^, | operators, for example:

obj + other   ->  obj.__add__(other), fallback to other.__radd__(obj)
obj | other   ->  obj.__or__(other), fallback to other.__ror__(obj)

and in-place operators for augmented assignment, +=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=, for example:

obj += other  ->  obj.__iadd__(other)
obj |= other  ->  obj.__ior__(other)

and unary operations:

+obj          ->  obj.__pos__()
-obj          ->  obj.__neg__()
~obj          ->  obj.__invert__()
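The binary, reflected, in-place and unary hooks can be exercised together in one sketch (Python 3 assumed). Money is a hypothetical value type, not a library class.

```python
class Money:
    """Hypothetical value type showing __add__, __radd__, __iadd__ and __neg__."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        if isinstance(other, int):
            return Money(self.cents + other)
        return NotImplemented

    def __radd__(self, other):
        # Reached for `other + self` after other's __add__ declines.
        return self + other

    def __iadd__(self, other):
        # In-place variant: mutate and return self.
        self.cents = (self + other).cents
        return self

    def __neg__(self):
        return Money(-self.cents)


a = Money(100)
total = a + Money(50)               # Money.__add__
reflected = 25 + a                  # int.__add__ declines, then Money.__radd__
alias = a
a += 5                              # Money.__iadd__ mutates in place
negated = -a                        # Money.__neg__
summed = sum([Money(1), Money(2)])  # sum() starts from 0, so it relies on __radd__
```

Inside the class, the implementation still uses the + operator rather than calling self.__add__(other), in line with the advice in this answer.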

Context Managers

A context manager defines __enter__, which is called on entering the code block (its return value, usually self, is bound to the name given after as), and __exit__, which is guaranteed to be called on leaving the code block and receives the exception information.

with obj as cm:     ->  cm = obj.__enter__()
    raise Exception('message')
->  obj.__exit__(Exception, Exception('message'), traceback_object)

If __exit__ receives an exception and then returns a false value, the exception is re-raised once __exit__ returns.

If there is no exception, __exit__ receives None for those three arguments instead, and its return value is ignored:

with obj:           ->  obj.__enter__()
    pass
->  obj.__exit__(None, None, None)
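The three situations (no exception, a suppressed exception, a propagated exception) can be put side by side in a sketch. Recorder is a hypothetical context manager that just logs what it receives.

```python
class Recorder:
    """Hypothetical context manager that records the calls it receives."""

    def __init__(self, suppress=False):
        self.suppress = suppress
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append(("exit", exc_type))
        return self.suppress  # a true value swallows the exception


with Recorder() as quiet:
    pass

with Recorder(suppress=True) as swallowing:
    raise ValueError("message")  # suppressed because __exit__ returns True

propagated = False
try:
    with Recorder() as loud:
        raise ValueError("message")
except ValueError:
    propagated = True  # __exit__ returned False, so the exception continued
```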

Some Metaclass Special Methods

Similarly, classes can have special methods (from their metaclasses) that support abstract base classes:

isinstance(obj, cls) -> cls.__instancecheck__(obj)
issubclass(sub, cls) -> cls.__subclasscheck__(sub)
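A sketch of these two hooks, assuming Python 3 metaclass syntax. DuckMeta, Duck and Robot are invented names, and the "has a callable quack" rule is only an illustration of what such a check might do.

```python
class DuckMeta(type):
    """Hypothetical metaclass: anything with a callable quack counts as a Duck."""

    def __instancecheck__(cls, obj):
        return callable(getattr(obj, "quack", None))

    def __subclasscheck__(cls, sub):
        return callable(getattr(sub, "quack", None))


class Duck(metaclass=DuckMeta):
    pass


class Robot:
    def quack(self):
        return "beep"


robot_is_duck = isinstance(Robot(), Duck)  # DuckMeta.__instancecheck__
int_is_duck = isinstance(1, Duck)          # int has no quack attribute
robot_subclass = issubclass(Robot, Duck)   # DuckMeta.__subclasscheck__
```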

An important takeaway is that while builtins like next and bool do not change between Python 2 and 3, the names of the underlying implementation methods do change.

Thus using the builtins also offers more forward compatibility.

When am I supposed to use the special names?

In Python, names that begin with underscores are semantically non-public names for users. The underscore is the creator's way of saying, "hands-off, don't touch."

This is not just cultural, but it is also in Python's treatment of APIs. When a package's __init__.py uses import * to provide an API from a subpackage, if the subpackage does not provide an __all__, it excludes names that start with underscores. The subpackage's __name__ would also be excluded.
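The import * behavior can be checked without creating files on disk by registering a throwaway module object. This is a sketch only: the module name hypothetical_demo_module and its attributes are made up, and the same rule applies to a real subpackage.

```python
import sys
import types

# Build a throwaway module in memory; the name is hypothetical.
demo = types.ModuleType("hypothetical_demo_module")
demo.public_name = 1
demo._private_name = 2
sys.modules[demo.__name__] = demo

# Run the star import inside a fresh namespace so we can inspect the result.
namespace = {}
exec("from hypothetical_demo_module import *", namespace)

# exec() adds __builtins__ to the namespace itself, so leave it out.
imported = sorted(name for name in namespace if name != "__builtins__")

# Remove the throwaway module again.
del sys.modules[demo.__name__]
```

Only public_name is copied over; _private_name and dunder names such as __name__ are skipped because the module defines no __all__.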

IDE autocompletion tools are mixed in whether they treat names that start with underscores as non-public. However, I greatly appreciate not seeing __init__, __new__, __repr__, __str__, __eq__, etc. (nor any of the user-created non-public interfaces) when I type the name of an object and a period.

Thus I assert:

The special "dunder" methods are not a part of the public interface. Avoid using them directly.

So when to use them?

The main use-case is when implementing your own custom object or subclass of a builtin object.

Try to only use them when absolutely necessary. Here are some examples:

Use the __name__ special attribute on functions or classes

When we decorate a function, we typically get a wrapper function in return that hides helpful information about the function. We would use the @wraps(fn) decorator to make sure we don't lose that information, but if we need the name of the function, we need to use the __name__ attribute directly:

from functools import wraps

def decorate(fn): 
    @wraps(fn)
    def decorated(*args, **kwargs):
        print('calling fn,', fn.__name__) # exception to the rule
        return fn(*args, **kwargs)
    return decorated

Similarly, I do the following when I need the name of the object's class in a method (used in, for example, a __repr__):

def get_class_name(self):
    return type(self).__name__
          # ^          # ^- must use __name__, no builtin e.g. name()
          # use type, not .__class__
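For instance, a __repr__ built on type(self).__name__ keeps working for subclasses without being overridden. A minimal sketch with made-up class names:

```python
class Base:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # type(self) rather than a hard-coded name, so subclasses report correctly.
        return "{}({!r})".format(type(self).__name__, self.value)


class Child(Base):
    pass


base_text = repr(Base(1))
child_text = repr(Child("x"))
```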

Using special attributes to write custom classes or subclassed builtins

When we want to define custom behavior, we must use the data-model names.

This makes sense: since we are the implementors, these attributes aren't private to us.

class Foo(object):
    # required here to implement == for instances:
    def __eq__(self, other):
        # but we still use == for the values:
        return self.value == other.value
    # required here to implement != for instances:
    def __ne__(self, other): # docs recommend for Python 2.
        # use the higher level of abstraction here:
        return not self == other

However, even in this case, we don't use self.value.__eq__(other.value) or not self.__eq__(other) (see my answer here for proof that the latter can lead to unexpected behavior). Instead, we should use the higher level of abstraction.
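One way the direct call can surprise you, which may or may not be the case covered in the linked answer, is that __eq__ is allowed to return NotImplemented, whereas the == operator always resolves that into a real result. A sketch with a hypothetical Meters class, assuming Python 3:

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented  # let Python try the other operand
        return self.value == other.value


a = Meters(1)
direct = a.__eq__(1)     # the raw protocol value: NotImplemented
via_operator = a == 1    # the operator resolves it to a real bool
same = a == Meters(1)    # ordinary comparison between two Meters
```

Negating the raw NotImplemented value would not give the answer that != gives, which is why the code above negates the operator expression instead.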

Another point at which we'd need to use the special method names is when we are in a child class's implementation and want to delegate to the parent. For example:

class NoisyFoo(Foo):
    def __eq__(self, other):
        print('checking for equality')
        # required here to call the parent's method
        return super(NoisyFoo, self).__eq__(other) 

Conclusion

The special methods allow users to implement the interface for object internals.

Use the builtin functions and operators wherever you can. Only use the special methods where there is no documented public API.

I'll show some usage that you apparently didn't think of, comment on the examples you showed, and argue against the privacy claim from your own answer.


I agree with your own answer that, for example, len(a) should be used, not a.__len__(). I'd put it like this: len exists so we can use it, and __len__ exists so len can use it. Or however that really works internally, since len(a) can actually be much faster, at least for example for lists and strings:

>>> from timeit import timeit
>>> timeit('len(a)', 'a = [1,2,3]', number=10**8)
4.22549770486512
>>> timeit('a.__len__()', 'a = [1,2,3]', number=10**8)
7.957335462257106

>>> timeit('len(s)', 's = "abc"', number=10**8)
4.1480574509332655
>>> timeit('s.__len__()', 's = "abc"', number=10**8)
8.01780160432645

But besides defining these methods in my own classes for usage by builtin functions and operators, I occasionally also use them as follows:

Let's say I need to give a filter function to some function and I want to use a set s as the filter. I'm not going to create an extra function lambda x: x in s or def f(x): return x in s. No. I already have a perfectly fine function that I can use: the set's __contains__ method. It's simpler and more direct. And even faster, as shown here (ignore that I save it as f here, that's just for this timing demo):

>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = s.__contains__', number=10**8)
6.473739433621368
>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = lambda x: x in s', number=10**8)
19.940786514456924
>>> timeit('f(2); f(4)', 's = {1, 2, 3}\ndef f(x): return x in s', number=10**8)
20.445680107760325

So while I don't directly call magic methods like s.__contains__(x), I do occasionally pass them somewhere like some_function_needing_a_filter(s.__contains__). And I think that's perfectly fine, and better than the lambda/def alternative.
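As a concrete stand-in for some_function_needing_a_filter, the builtin filter accepts the bound method just as it accepts a lambda. A small sketch with made-up data:

```python
allowed = {1, 2, 3}
candidates = [0, 1, 2, 5, 3, 8]

# Pass the bound special method itself as the predicate...
with_bound_method = list(filter(allowed.__contains__, candidates))

# ...or wrap the `in` operator in a lambda; the result is the same.
with_lambda = list(filter(lambda x: x in allowed, candidates))
```

Both spellings select the same items; the difference is only the extra function call layer that the lambda adds.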


My thoughts on the examples you showed:

  • Example 1: Asked how to get the size of a list, he answered items.__len__(). Even without any reasoning. My verdict: That's just wrong. Should be len(items).
  • Example 2: Does mention d[key] = value first! And then adds d.__setitem__(key, value) with the reasoning "if your keyboard is missing the square bracket keys", which rarely applies and which I doubt was serious. I think it was just the foot in the door for the last point, mentioning that that's how we can support the square bracket syntax in our own classes. Which turns it back to a suggestion to use square brackets.
  • Example 3: Suggests obj.__dict__. Bad, like the __len__ example. But I suspect he just didn't know vars(obj), and I can understand it, as vars is less common/known and the name does differ from the "dict" in __dict__.
  • Example 4: Suggests __class__. Should be type(obj). I suspect it's similar to the __dict__ story, although I think type is more well-known.

About privacy: In your own answer you say these methods are "semantically private". I strongly disagree. Single and double leading underscores are for that, but not the data model's special "dunder/magic" methods with double leading+trailing underscores.

  • The two things you use as arguments are importing behaviour and IDE autocompletion. But importing and these special methods are different areas, and the one IDE I tried (the popular PyCharm) disagrees with you. I created a class/object with methods _foo and __bar__, and then autocompletion didn't offer _foo but did offer __bar__. And when I used both methods anyway, PyCharm only warned me about _foo (calling it a "protected member"), not about __bar__.
  • PEP 8 says 'weak "internal use" indicator' explicitly for a single leading underscore, and explicitly for double leading underscores it mentions the name mangling and later explains that it's for "attributes that you do not want subclasses to use". But the comment about double leading+trailing underscores doesn't say anything like that.
  • The data model page you yourself link to says that these special method names are "Python's approach to operator overloading". Nothing about privacy there. The words private/privacy/protected don't even appear anywhere on that page.

    I also recommend reading this article by Andrew Montalenti about these methods, emphasizing that "The dunder convention is a namespace reserved for the core Python team" and "Never, ever, invent your own dunders" because "The core Python team reserved a somewhat ugly namespace for themselves". Which all matches PEP 8's instruction "Never invent [dunder/magic] names; only use them as documented". I think Andrew is spot on - it's just an ugly namespace of the core team. And it's for the purpose of operator overloading, not about privacy (not Andrew's point but mine and the data model page's).

Besides Andrew's article I also checked several more about these "magic"/"dunder" methods, and I found none of them talking about privacy at all. That's just not what this is about.

Again, we should use len(a), not a.__len__(). But not because of privacy.
