
Evaluation order of augmented operators (delimiters) in python

If I evaluate the following minimal example in Python

a = [1, 2, 3]
a[-1] += a.pop()

I get

[1, 6]

So it seems that this is evaluated as

a[-1] = a[-1] + a.pop()

where each expression/operand would be evaluated in the order

third = first + second

so that on the left-hand side a[-1] is the 2nd element, while on the right-hand side it is the 3rd:

a[1] = a[2] + a.pop()

Can someone explain to me how one could infer this from the docs? Apparently '+=' is lexically a delimiter that also performs an operation (see here). What does that imply for its evaluation order?

EDIT:

I tried to clarify my question in a comment. I'll include it here for reference.

I want to understand if augmented operators have to be treated in a special way (ie by expanding them) during lexical analysis, because you kind of have to duplicate an expression and evaluate it twice. This is not clear in the docs and I want to know where this behaviour is specified. Other lexical delimiters (eg '}') behave differently.

Let me start with the question you asked at the end:

I want to understand if augmented operators have to be treated in a special way (ie by expanding them) during lexical analysis,

That one is simple; the answer is "no". A token is just a token, and the lexical analyser just divides the input into tokens. As far as the lexical analyser is concerned, += is just a token, and that's what it returns for it.
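
For example, a quick check with the standard library's tokenize module (a pure-Python reimplementation of the real tokeniser, as noted further below) shows += coming back as a single OP token rather than as '+' followed by '='; this snippet is my own illustration, not part of the original answer:

import io
import tokenize

# Tokenize the statement from the question; '+=' is reported as one OP token.
source = 'a[-1] += a.pop()'
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# ...the output includes a line like:  OP '+='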

By the way, the Python docs make a distinction between "operators" and "punctuation", but that's not really a significant difference for the current lexical analyser. It might have made sense in some previous incarnation of the parser based on operator-precedence parsing, in which an "operator" is a lexeme with associated precedence and associativity. But I don't know if Python ever used that particular parsing algorithm; in the current parser, both "operators" and "punctuation" are literal lexemes which appear as such in syntax rules. As you might expect, the lexical analyser is more concerned with the length of the tokens (<= and += are both two-character tokens) than with the eventual use inside the parser.

"Desugaring" -- the technical term for source tranformations which convert some language construct into a simpler construct -- is not usually performed either in the lexer or in the parser, although the internal workings of compilers are not subject to a Code of Conduct. “脱糖”——将某种语言结构转换为更简单结构的源转换的技术术语——通常不在词法分析器或解析器中执行,尽管编译器的内部工作不受行为准则的约束。 Whether a language even has a desugaring component is generally considered an implementation detail, and may not be particularly visible;一种语言是否有脱糖组件通常被认为是一个实现细节,并且可能不是特别明显; that's certainly true of Python. Python 确实如此。 Python doesn't expose an interface to its tokeniser, either; Python 也不向其标记器公开接口。 the tokenizer module is a reimplementation in pure Python which does not produce exactly the same behaviour (although it's close enough to be a useful exploratory tool). tokenizer模块是纯 Python 中的重新实现,它不会产生完全相同的行为(尽管它足够接近成为有用的探索工具)。 But the parser is exposed in the ast module, which provides direct access to Python's own parser (at least in the CPython implementation), and that let's us see that no desugaring is done up to the point that the AST is constructed (note: requires Python3.9 for the indent option):但是解析器暴露在ast模块中,它提供了对 Python 自己的解析器的直接访问(至少在 CPython 实现中),并且让我们看到在构造 AST 之前没有进行脱糖(注意:需要Python3.9 用于indent选项):

>>> import ast
>>> def showast(code):
...    print(ast.dump(ast.parse(code), indent=2))
...
>>> showast('a[-1] += a.pop()')
Module(
  body=[
    AugAssign(
      target=Subscript(
        value=Name(id='a', ctx=Load()),
        slice=UnaryOp(
          op=USub(),
          operand=Constant(value=1)),
        ctx=Store()),
      op=Add(),
      value=Call(
        func=Attribute(
          value=Name(id='a', ctx=Load()),
          attr='pop',
          ctx=Load()),
        args=[],
        keywords=[]))],
  type_ignores=[])

This produces exactly the syntax tree you would expect from the grammar, in which "augmented assignment" statements are represented as a specific production within assignment:

assignment:
    | single_target augassign ~ (yield_expr | star_expressions)

single_target is a single assignable expression (such as a variable or, as in this case, a subscripted array); augassign is one of the augmented assignment operators, and the rest are alternatives for the right-hand side of the assignment. (You can ignore the "fence" grammar operator ~.) The parse tree produced by ast.dump is pretty close to the grammar, and shows no desugaring at all:

                  AugAssign
         --------------------------
         |         |              |
     Subscript    Add            Call
     ---------           -----------------
      |     |            |        |      |
      a    -1        Attribute   [ ]    [ ]
                      ---------
                       |     |
                       a   'pop'

The magic happens afterwards, which we can also see because the Python standard library also includes a disassembler:

>>> import dis
>>> dis.dis(compile('a[-1] += a.pop()', '--', 'exec'))
  1           0 LOAD_NAME                0 (a)
              2 LOAD_CONST               0 (-1)
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_NAME                0 (a)
             10 LOAD_METHOD              1 (pop)
             12 CALL_METHOD              0
             14 INPLACE_ADD
             16 ROT_THREE
             18 STORE_SUBSCR
             20 LOAD_CONST               1 (None)
             22 RETURN_VALUE

As can be seen, trying to summarize the evaluation order of augmented assignment as "left-to-right" is just an approximation. Here's what actually happens, as revealed in the virtual machine code above (a small instrumented sketch after this list shows the same order from ordinary Python code):

1. The target aggregate and its index are "computed" (lines 0 and 2), and then these two values are duplicated (line 4). (The duplication means that neither the target nor its subscript is evaluated twice.)

2. Then the duplicated values are used to look up the value of the element (line 6). So it's at this point that the value of a[-1] is evaluated.

3. The right-hand side expression (a.pop()) is then evaluated (lines 8 through 12).

4. These two values (both 3, in this case) are combined with INPLACE_ADD because this is an ADD augmented assignment. In the case of integers, there's no difference between INPLACE_ADD and ADD, because integers are immutable values. But the compiler doesn't know that the first operand is an integer; a[-1] could be anything, including another list. So it emits an opcode which will trigger the use of the __iadd__ method instead of __add__, in case there is a difference.

5. The original target and subscript, which have been patiently waiting on the stack since step 1, are then used to perform a subscripted store (lines 16 and 18). The subscript is still the subscript computed at line 2, -1, but at this point a[-1] refers to a different element of a. The rotate is needed to get the arguments for STORE_SUBSCR into the correct order: because the normal order of evaluation for assignment is to evaluate the right-hand side first, the virtual machine assumes that the new value will be at the bottom of the stack, followed by the object and its subscript.

6. Finally, None is returned as the value of the statement.
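
The same ordering can be observed without reading bytecode. Here is a small sketch (my own addition, using a hypothetical NoisyList class) that logs every item read, item write, and pop() on the list from the question:

class NoisyList(list):
    """A list that reports its item accesses, making the evaluation order visible."""

    def __getitem__(self, index):
        print(f'read  [{index}]')
        return super().__getitem__(index)

    def __setitem__(self, index, value):
        print(f'write [{index}] = {value}')
        super().__setitem__(index, value)

    def pop(self, index=-1):
        print('pop()')
        return super().pop(index)

a = NoisyList([1, 2, 3])
a[-1] += a.pop()
print(a)
# Output:
#   read  [-1]        <- step 2: the old value (3) is fetched first
#   pop()             <- step 3: the right-hand side runs, shrinking the list
#   write [-1] = 6    <- step 5: the store reuses index -1, now the second slot
#   [1, 6]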

The precise workings of assignment and augmented assignment statements are documented in the Python reference manual. Another important source of information is the description of the __iadd__ special method. Evaluation (and evaluation order) for augmented assignment operations is sufficiently confusing that there is a Programming FAQ dedicated to it, which is worth reading carefully if you want to understand the exact mechanism.
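
As a side note on step 4 above, here is a short sketch (my own addition) of why the compiler prefers the in-place opcode: for a mutable object such as a list, __iadd__ mutates the existing object, while an int has no __iadd__ at all and falls back to __add__:

inner = [1, 2]
outer = [inner]
outer[0] += [3]     # list.__iadd__ extends the existing list in place
print(inner)        # [1, 2, 3] -- the original object was modified

n = 5
m = n
m += 1              # int defines no __iadd__, so += falls back to __add__
print(n, m)         # 5 6 -- m is bound to a new int object; n is untouched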

Interesting though that information is, it's worth adding that writing programs which depend on details of the evaluation order inside an augmented assignment is not conducive to producing readable code. In almost all cases, augmented assignment which relies on non-obvious details of the procedure should be avoided, including statements such as the one that is the target of this question.

rici did a great job showing what's happening under the hood in the CPython reference interpreter, but there's a much simpler "source of truth" here in the language spec, which guarantees this behavior for any Python interpreter (not just CPython, but PyPy, Jython, IronPython, Cython, etc.). In the language spec, under Chapter 6: Expressions, section 6.16, Evaluation Order, it specifies:

Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side.

That second sentence sounds like an exception to the general rule, but it isn't; assignment with = (including augmented assignment with += or the like) is not an expression in Python (the walrus operator introduced in 3.8 is an expression, but it can only assign to bare names, so there is never anything to "evaluate" on the left side; it's purely storing there, never reading from it), it's a statement, and the assignment statement has its own rules for order of evaluation. Those rules for assignment specify:

An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right.

This confirms the second sentence from the Expression Evaluation Order documentation; the expression list (the thing to be assigned) is evaluated first, then assignments to the targets proceed from there. So by the language spec itself, a[-1] += a.pop() must completely evaluate a.pop() first (the "expression list"), then perform the assignment.
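
That rule is easy to probe for a plain (non-augmented) assignment. A minimal sketch (my own illustration; the print calls are just probes) showing that the expression list on the right runs before the target expression on the left is touched:

def target_index():
    print('evaluating target index')
    return 0

def right_hand_side():
    print('evaluating right-hand side')
    return 42

a = [0]
a[target_index()] = right_hand_side()
# Output:
#   evaluating right-hand side
#   evaluating target index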

This behavior is required by the language spec, and has been for some time, so it can be relied on no matter what Python interpreter you're using.

That said, I'd recommend against code that relies on these guarantees from Python. For one, when you switch to other languages, the rules differ (and in some cases, e.g. many similar cases in C and C++, varying by version of the standard, there are no "rules", and trying to mutate the same object in multiple parts of an expression produces undefined behavior), so growing to rely on Python's behavior will hamper your ability to use other languages. Beyond that, it's still going to be confusing as hell, and just slight changes will avoid the confusion, for example, in your case, changing:

a[-1] += a.pop()

to just:

x = a.pop()
a[-1] += x

which, while admittedly a two-liner and therefore inferior!!!, achieves the same result, with meaningless overhead, and greater clarity.

TL;DR: The Python language spec guarantees that the right-hand side of += is fully evaluated before the augmented operation is performed and its result is stored back to the left-hand side target. But for code clarity, any code that relies on that guarantee should probably be refactored to avoid said reliance.
