What is the relationship between the Python data model and built-in functions?
As I read Python answers on Stack Overflow, I continue to see some people telling users to use the data model's special methods or attributes directly.
I then see contradicting advice (sometimes from myself) saying not to do that, and instead to use builtin functions and the operators directly.
Why is that? What is the relationship between the special "dunder" methods and attributes of the Python data model and builtin functions?
When am I supposed to use the special names?
Thus, you should prefer to use the builtin functions and operators where possible over the special methods and attributes of the data model.
The semantically internal APIs are more likely to change than the public interfaces. While Python doesn't actually consider anything "private" and exposes the internals, that doesn't mean it's a good idea to abuse that access. Doing so has the following risks:
The builtin functions and operators invoke the special methods and use the special attributes in the Python data model. They are the readable and maintainable veneer that hides the internals of objects. In general, users should use the builtins and operators given in the language as opposed to calling the special methods or using the special attributes directly.
The builtin functions and operators can also have fallback or more elegant behavior than the more primitive data model special methods. For example:
- next(obj, default) allows you to provide a default instead of raising StopIteration when an iterator runs out, while obj.__next__() does not.
- str(obj) falls back to obj.__repr__() when obj.__str__() isn't available, whereas calling obj.__str__() directly would raise an AttributeError if no __str__ can be found (on Python 3, object.__str__ itself provides this fallback).
- obj != other falls back to not obj == other in Python 3 when there is no __ne__, and ultimately to an identity comparison. Calling obj.__ne__(other) directly skips the reflected other.__ne__(obj) and can return NotImplemented instead of a bool.
(Builtin functions can also be easily overshadowed, if necessary or desirable, on a module's global scope or the builtins module, to further customize behavior.)
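The three fallbacks above can be checked directly. This is a minimal sketch that assumes Python 3. Point and Plain are hypothetical classes made up for this illustration.

```python
class Point:
    """Defines __repr__ but no __str__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Point({self.x}, {self.y})"


class Plain:
    """Defines no comparison methods at all."""


# next() with a default absorbs exhaustion; calling __next__() does not.
it = iter([])
exhausted = next(it, "done")
try:
    it.__next__()
    raised_stop = False
except StopIteration:
    raised_stop = True

# str() falls back to __repr__ when no __str__ is defined.
text = str(Point(1, 2))

# Calling __ne__ directly can hand back NotImplemented instead of a bool,
# while the != operator goes on to try the reflected method and identity.
direct = Plain().__ne__(1)
via_operator = Plain() != 1
```

Here `exhausted` is `"done"`, `text` is `"Point(1, 2)"`, `direct` is `NotImplemented` and `via_operator` is `True`.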
Here is a mapping, with notes, of the builtin functions and operators to the respective special methods and attributes that they use or return. The usual rule is that a builtin function maps to a special method of the same name, but this is not consistent enough to do without the map below:
builtins/      special methods/
operators  ->  datamodel                NOTES (fb == fallback)
repr(obj)      obj.__repr__()           provides fb behavior for str
str(obj)       obj.__str__()            fb to __repr__ if no __str__
bytes(obj)     obj.__bytes__()          Python 3 only
unicode(obj)   obj.__unicode__()        Python 2 only
format(obj)    obj.__format__()         format spec optional
hash(obj)      obj.__hash__()
bool(obj)      obj.__bool__()           Python 3, fb to __len__
bool(obj)      obj.__nonzero__()        Python 2, fb to __len__
dir(obj)       obj.__dir__()
vars(obj)      obj.__dict__             does not include __slots__
type(obj)      obj.__class__            type actually bypasses __class__;
                                        overriding __class__ will not affect type
help(obj)      obj.__doc__              help uses more than just __doc__
len(obj)       obj.__len__()            provides fb behavior for bool
iter(obj)      obj.__iter__()           fb to __getitem__ w/ indexes from 0 on
next(obj)      obj.__next__()           Python 3
next(obj)      obj.next()               Python 2
reversed(obj)  obj.__reversed__()       fb to __len__ and __getitem__
other in obj   obj.__contains__(other)  fb to __iter__ then __getitem__
obj == other   obj.__eq__(other)
obj != other   obj.__ne__(other)        fb to not obj.__eq__(other) in Python 3
obj < other    obj.__lt__(other)        get >, >=, <= with @functools.total_ordering
complex(obj)   obj.__complex__()
int(obj)       obj.__int__()
float(obj)     obj.__float__()
round(obj)     obj.__round__()
abs(obj)       obj.__abs__()
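Several fallbacks in this table can be observed with a class that implements only the old sequence protocol (`__len__` and `__getitem__`). Countdown is a hypothetical example, and the sketch assumes Python 3.

```python
class Countdown:
    """Sequence-like class: only __len__ and __getitem__ are defined."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        # Raising IndexError is what ends the legacy iteration protocol.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return self.n - index


c = Countdown(3)
as_list = list(c)                # iter() falls back to __getitem__(0), (1), ...
truthy = bool(c)                 # bool() falls back to __len__
empty_truthy = bool(Countdown(0))
contains = 2 in c                # `in` falls back to iterating via __getitem__
backwards = list(reversed(c))    # reversed() uses __len__ and __getitem__
```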
The operator module has length_hint, which has a fallback implemented by a respective special method if __len__ is not implemented:
length_hint(obj)  obj.__length_hint__()
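The following minimal sketch uses `operator.length_hint`, which has been available since Python 3.4. It tries `__len__` first, then `__length_hint__`, and finally returns the supplied default. Stream is a hypothetical iterator that only knows an estimate of its remaining length.

```python
from operator import length_hint


class Stream:
    """Hypothetical iterator with no __len__, only a length estimate."""

    def __init__(self, remaining):
        self.remaining = remaining

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining

    def __length_hint__(self):
        return self.remaining


hint_sized = length_hint([1, 2, 3])        # via list.__len__
hint_stream = length_hint(Stream(5))       # via Stream.__length_hint__
hint_default = length_hint(object(), 10)   # neither method exists
```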
Dotted lookups are contextual. Without a special method implementation, Python first looks in the class hierarchy for data descriptors (like properties and slots), then in the instance __dict__ (for instance variables), then in the class hierarchy for non-data descriptors (like methods). Special methods implement the following behaviors:
obj.attr         obj.__getattr__('attr')       provides fb if dotted lookup fails
obj.attr         obj.__getattribute__('attr')  preempts dotted lookup
obj.attr = _     obj.__setattr__('attr', _)    preempts dotted lookup
del obj.attr     obj.__delattr__('attr')       preempts dotted lookup
Descriptors are a bit advanced; feel free to skip these entries and come back later. Recall that the descriptor instance is in the class hierarchy (like methods, slots, and properties). A data descriptor implements either __set__ or __delete__:
obj.attr         descriptor.__get__(obj, type(obj))
obj.attr = val   descriptor.__set__(obj, val)
del obj.attr     descriptor.__delete__(obj)
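The following minimal data-descriptor sketch shows that Positive takes precedence over an entry of the same name in the instance `__dict__`, because it defines `__set__`. Positive, Order and the hard-coded `"_amount"` storage key are assumptions made for illustration.

```python
class Positive:
    """Data descriptor: defines __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        if obj is None:  # looked up on the class, e.g. Order.amount
            return self
        return obj.__dict__["_amount"]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError("amount must be positive")
        obj.__dict__["_amount"] = value


class Order:
    amount = Positive()  # the descriptor lives on the class

    def __init__(self, amount):
        self.amount = amount  # -> Positive.__set__(self, amount)


order = Order(3)
# Plant a conflicting entry directly in the instance __dict__ ...
order.__dict__["amount"] = -1
# ... the data descriptor still wins the dotted lookup.
seen = order.amount

try:
    Order(0)
    rejected = False
except ValueError:
    rejected = True
```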
When the class is created (defined), the descriptor method __set_name__ is called on every descriptor in the class body that has it, to inform the descriptor of its attribute name. (This is new in Python 3.6.) cls is the same as type(obj) above, and 'attr' stands in for the attribute name:
class cls:
    @descriptor_type
    def attr(self): pass  # -> descriptor.__set_name__(cls, 'attr')
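The following sketch (Python 3.6+) shows a descriptor that uses `__set_name__` to derive its storage key from the attribute name it was assigned to, so the key no longer has to be hard-coded. Field and User are hypothetical names.

```python
class Field:
    """Descriptor that learns its attribute name via __set_name__."""

    def __set_name__(self, owner, name):
        # Called once per descriptor when the owning class object is created.
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        setattr(obj, self.private_name, value)


class User:
    name = Field()   # -> Field.__set_name__(User, "name")
    email = Field()  # -> Field.__set_name__(User, "email")


user = User()
user.email = "ada@example.com"
```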
The subscript notation is also contextual:
obj[name] -> obj.__getitem__(name)
obj[name] = item -> obj.__setitem__(name, item)
del obj[name] -> obj.__delitem__(name)
A special case: for subclasses of dict, __missing__ is called if __getitem__ doesn't find the key:
obj[name] -> obj.__missing__(name)
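The following sketch shows `__missing__` on a hypothetical dict subclass. Only subscript lookup goes through `__missing__`; `dict.get()` does not call it, and the missing key is not inserted.

```python
class DefaultingDict(dict):
    """dict subclass: failed subscript lookups are routed to __missing__."""

    def __missing__(self, key):
        return f"<no {key}>"


d = DefaultingDict(a=1)
present = d["a"]      # normal lookup
absent = d["b"]       # -> d.__missing__("b")
via_get = d.get("b")  # dict.get ignores __missing__
```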
There are also special methods for the +, -, *, @, /, //, %, divmod(), pow(), **, <<, >>, &, ^, | operators, for example:
obj + other -> obj.__add__(other), fallback to other.__radd__(obj)
obj | other -> obj.__or__(other), fallback to other.__ror__(obj)
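The following sketch shows the reflected fallback. `__add__` returns NotImplemented for types it does not understand, which tells Python to try `other.__radd__`. Meters is a hypothetical class. The same mechanism lets `sum()` start from `0`.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented  # let Python try other.__radd__(self)

    def __radd__(self, other):
        # Reached for `number + Meters` once int.__add__ returns NotImplemented.
        return self + other


left = (Meters(2) + 3).value
right = (3 + Meters(2)).value
total = sum([Meters(1), Meters(2)]).value  # sum() starts from 0 + Meters(1)

try:
    Meters(1) + "x"  # neither side knows how: TypeError
    type_error = False
except TypeError:
    type_error = True
```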
and in-place operators for augmented assignment, +=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=, for example:
obj += other -> obj.__iadd__(other)
obj |= other -> obj.__ior__(other)
and unary operations:
+obj -> obj.__pos__()
-obj -> obj.__neg__()
~obj -> obj.__invert__()
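The following sketch uses hypothetical Vector classes to show three behaviors. When `__iadd__` is absent, `+=` falls back to `__add__` and rebinds the name to a new object. When `__iadd__` is defined, `+=` mutates in place. Unary minus goes through `__neg__`.

```python
class Vector:
    def __init__(self, *parts):
        self.parts = list(parts)

    def __add__(self, other):
        return Vector(*(a + b for a, b in zip(self.parts, other.parts)))

    def __neg__(self):
        return Vector(*(-a for a in self.parts))


class MutableVector(Vector):
    def __iadd__(self, other):
        # Update in place, then return self so the name stays bound to it.
        for i, b in enumerate(other.parts):
            self.parts[i] += b
        return self


v = Vector(1, 2)
v_before = v
v += Vector(10, 10)      # no __iadd__: falls back to v = v.__add__(...)

m = MutableVector(1, 2)
m_before = m
m += Vector(10, 10)      # -> m.__iadd__(...), same object

negated = -Vector(1, 2)  # -> Vector(1, 2).__neg__()
```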
A context manager defines __enter__, which is called on entering the code block (its return value, usually self, is bound to the name after as), and __exit__, which is guaranteed to be called on leaving the code block, with exception information.
with obj as cm:              -> cm = obj.__enter__()
    raise Exception('message')
                             -> obj.__exit__(Exception, Exception('message'), traceback_object)
If __exit__ gets an exception and then returns a false value, the exception is re-raised on leaving the method.
If there is no exception, __exit__ gets None for those three arguments instead, and the return value is meaningless:
with obj:                    -> obj.__enter__()
    pass
                             -> obj.__exit__(None, None, None)
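The following minimal sketch shows the protocol above. Suppress is a hypothetical class, and the standard library already provides the same behavior as `contextlib.suppress`. The `calls` list only records what `__enter__` and `__exit__` received.

```python
class Suppress:
    """Hypothetical context manager that swallows one exception type."""

    def __init__(self, exc_type):
        self.exc_type = exc_type
        self.calls = []

    def __enter__(self):
        self.calls.append("enter")
        return self  # this is what `as` binds

    def __exit__(self, exc_type, exc, tb):
        self.calls.append(("exit", exc_type))
        # A true return value suppresses the exception; a false one re-raises it.
        return exc_type is not None and issubclass(exc_type, self.exc_type)


with Suppress(KeyError) as swallowed:
    {}["missing"]  # KeyError is raised here, then suppressed by __exit__

with Suppress(KeyError) as clean:
    pass           # __exit__ receives (None, None, None)

try:
    with Suppress(KeyError):
        raise ValueError("not suppressed")
    escaped = False
except ValueError:
    escaped = True
```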
Similarly, classes can have special methods (from their metaclasses) that support abstract base classes:
isinstance(obj, cls) -> cls.__instancecheck__(obj)
issubclass(sub, cls) -> cls.__subclasscheck__(sub)
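The following sketch shows the metaclass hooks. `isinstance` and `issubclass` look these methods up on the metaclass, so DuckMeta (a hypothetical name) can decide what counts as a Duck. The standard library's `abc.ABCMeta` uses the same hooks, together with `__subclasshook__` and `register()`.

```python
class DuckMeta(type):
    """isinstance()/issubclass() against Duck consult these metaclass methods."""

    def __instancecheck__(cls, obj):
        return callable(getattr(obj, "quack", None))

    def __subclasscheck__(cls, sub):
        return callable(getattr(sub, "quack", None))


class Duck(metaclass=DuckMeta):
    pass


class Robot:  # does not inherit from Duck
    def quack(self):
        return "beep"


robot_is_duck = isinstance(Robot(), Duck)   # -> DuckMeta.__instancecheck__(Duck, robot)
robot_subclasses = issubclass(Robot, Duck)  # -> DuckMeta.__subclasscheck__(Duck, Robot)
int_is_duck = isinstance(1, Duck)
```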
An important takeaway is that while builtins like next and bool do not change between Python 2 and 3, the underlying implementation names do change.
Thus, using the builtins also offers more forward compatibility.
In Python, names that begin with underscores are semantically non-public names for users. The underscore is the creator's way of saying, "hands-off, don't touch."
This is not just cultural, but it is also in Python's treatment of APIs. When a package's __init__.py uses import * to provide an API from a subpackage, if the subpackage does not provide an __all__, it excludes names that start with underscores. The subpackage's __name__ would also be excluded.
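The `import *` rule can be checked without creating files by registering an in-memory module in `sys.modules`. In the following sketch, `demo_subpackage` and its attributes are hypothetical stand-ins for a real subpackage.

```python
import sys
import types

# An in-memory stand-in for a subpackage with no __all__.
mod = types.ModuleType("demo_subpackage")
mod.public_func = lambda: "public"
mod._private_func = lambda: "private"
sys.modules["demo_subpackage"] = mod


def star_import_names():
    """Return the names that `from demo_subpackage import *` binds."""
    namespace = {}
    exec("from demo_subpackage import *", namespace)
    return sorted(name for name in namespace if name != "__builtins__")


without_all = star_import_names()  # underscore names, incl. __name__, are skipped
mod.__all__ = ["public_func", "_private_func"]
with_all = star_import_names()     # an explicit __all__ overrides the underscore rule
```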
IDE autocompletion tools vary in whether they treat names that start with underscores as non-public. However, I greatly appreciate not seeing __init__, __new__, __repr__, __str__, __eq__, etc. (nor any user-created non-public interfaces) when I type the name of an object and a period.
Thus I assert:
The special "dunder" methods are not a part of the public interface. Avoid using them directly.
So when to use them?
The main use case is when implementing your own custom object or subclass of a builtin object. Try to only use them when absolutely necessary. Here are some examples:
__name__ special attribute on functions or classes
When we decorate a function, we typically get a wrapper function in return that hides helpful information about the function. We would use the @wraps(fn) decorator to make sure we don't lose that information, but if we need the name of the function, we need to use the __name__ attribute directly:
from functools import wraps

def decorate(fn):
    @wraps(fn)
    def decorated(*args, **kwargs):
        print('calling fn,', fn.__name__)  # exception to the rule
        return fn(*args, **kwargs)
    return decorated
Similarly, I do the following when I need the name of the object's class in a method (used in, for example, a __repr__):
def get_class_name(self):
    return type(self).__name__
         # ^        # ^- must use __name__, no builtin e.g. name()
         # use type, not .__class__
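The following short sketch shows why `type(self).__name__` is useful in a `__repr__`. The method is written once on a hypothetical Base class, and subclasses still report their own names.

```python
class Base:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # type(self) is the runtime class, so subclasses report their own name.
        return f"{type(self).__name__}({self.value!r})"


class Child(Base):
    pass


base_repr = repr(Base(1))
child_repr = repr(Child("x"))
```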
When we want to define custom behavior, we must use the data model names.
This makes sense: since we are the implementors, these attributes aren't private to us.
class Foo(object):
    # required here to implement == for instances:
    def __eq__(self, other):
        # but we still use == for the values:
        return self.value == other.value
    # required here to implement != for instances:
    def __ne__(self, other):  # docs recommend for Python 2.
        # use the higher level of abstraction here:
        return not self == other
However, even in this case, we don't use self.value.__eq__(other.value) or not self.__eq__(other) (see my answer here for proof that the latter can lead to unexpected behavior). Instead, we should use the higher level of abstraction.
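The following sketch shows the pitfall. A well-behaved `__eq__` returns NotImplemented for types it cannot compare, so calling it directly yields NotImplemented rather than a bool. Applying `not` to that value treats NotImplemented as true, and doing so has been deprecated since Python 3.9. The `==` and `!=` operators handle NotImplemented correctly. Money is a hypothetical class.

```python
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # "can't compare", not "unequal"
        return self.cents == other.cents


m = Money(100)
raw = m.__eq__("100")   # NotImplemented, not a bool
equal = m == "100"      # False: Python tried str's side, then identity
not_equal = m != "100"  # True, via the default __ne__
same = Money(100) == m  # True
```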
Another point at which we'd need to use the special method names is when we are in a child's implementation and want to delegate to the parent. For example:
class NoisyFoo(Foo):
    def __eq__(self, other):
        print('checking for equality')
        # required here to call the parent's method
        return super(NoisyFoo, self).__eq__(other)
The special methods allow users to implement the interface for object internals.
Use the builtin functions and operators wherever you can. Only use the special methods where there is no documented public API.
I'll show some usage that you apparently didn't think of, comment on the examples you showed, and argue against the privacy claim from your own answer.
I agree with your own answer that, for example, len(a) should be used, not a.__len__(). I'd put it like this: len exists so we can use it, and __len__ exists so len can use it. Or however that really works internally, since len(a) can actually be much faster, at least for lists and strings, for example:
>>> timeit('len(a)', 'a = [1,2,3]', number=10**8)
4.22549770486512
>>> timeit('a.__len__()', 'a = [1,2,3]', number=10**8)
7.957335462257106
>>> timeit('len(s)', 's = "abc"', number=10**8)
4.1480574509332655
>>> timeit('s.__len__()', 's = "abc"', number=10**8)
8.01780160432645
But besides defining these methods in my own classes for use by builtin functions and operators, I occasionally also use them as follows:
Let's say I need to give a filter function to some function and I want to use a set s as the filter. I'm not going to create an extra function lambda x: x in s or def f(x): return x in s. No. I already have a perfectly fine function that I can use: the set's __contains__ method. It's simpler and more direct. And even faster, as shown here (ignore that I save it as f here; that's just for this timing demo):
>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = s.__contains__', number=10**8)
6.473739433621368
>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = lambda x: x in s', number=10**8)
19.940786514456924
>>> timeit('f(2); f(4)', 's = {1, 2, 3}\ndef f(x): return x in s', number=10**8)
20.445680107760325
So while I don't directly call magic methods like s.__contains__(x), I do occasionally pass them somewhere like some_function_needing_a_filter(s.__contains__). And I think that's perfectly fine, and better than the lambda/def alternative.
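The following sketch passes a set's bound `__contains__` method as a predicate. The data is made up for illustration. Both `filter` and `itertools.filterfalse` accept any one-argument callable.

```python
import itertools

allowed = {"red", "green"}
colors = ["red", "blue", "green", "red", "black"]

# The bound method allowed.__contains__ is already a one-argument predicate,
# equivalent here to `lambda c: c in allowed`.
kept = list(filter(allowed.__contains__, colors))
dropped = list(itertools.filterfalse(allowed.__contains__, colors))
```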
My thoughts on the examples you showed:
- items.__len__(), even without any reasoning. It should be len(items).
- It does mention d[key] = value first! And then adds d.__setitem__(key, value) with the reasoning "if your keyboard is missing the square bracket keys", which rarely applies and which I doubt was serious. I think it was just the foot in the door for the last point, mentioning that that's how we can support the square bracket syntax in our own classes.
- obj.__dict__. Bad, like the __len__ example. But I suspect he just didn't know vars(obj), and I can understand it, as vars is less common/known and the name does differ from the "dict" in __dict__.
- __class__. Should be type(obj). I suspect it's similar to the __dict__ story, although I think type is better known.
更为人所知。 About privacy: In your own answer you say these methods are "semantically private". 关于隐私:在您自己的回答中,您说这些方法是“语义上的私密”。 I strongly disagree.
我非常不同意。 Single and double leading underscores are for that, but not the data model's special "dunder/magic" methods with double leading+trailing underscores.
单,双下划线开头是这一点,但没有数据模型的特殊的“dunder /神奇”的方法与双带+尾随下划线。
I created a class/object with methods _foo and __bar__, and then autocompletion didn't offer _foo but did offer __bar__. And when I used both methods anyway, PyCharm only warned me about _foo (calling it a "protected member"), not about __bar__.
Besides Andrew's article, I also checked several more about these "magic"/"dunder" methods, and I found none of them talking about privacy at all. That's just not what this is about.
Again, we should use len(a), not a.__len__(). But not because of privacy.