Python parser/compiler vs. interpreter, and string concatenation compile-time vs. run-time?

At this spot in this article by one of the major Python people, the author notes that automatic string concatenation is a feature of the parser/compiler as opposed to the interpreter, which is why you must use + to concatenate strings at runtime.

I don't understand anything about that. I know you can concatenate with +, I know two string literals side by side are auto-concatenated, and I know you of course can't do that with variables containing strings. But I have no idea what the difference is between a parser/compiler and an interpreter (for Python, or in general), and I have no idea how it ties in to this whole string concatenation thing.

Explanation???

Python is an interpreted language (as opposed to languages like C++ that are compiled to machine code before execution).

Now there is an intermediate step: the source (text) files are compiled to bytecode, and that bytecode is then run by the Python interpreter.
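The two steps can be tried by hand with the built-ins compile() and exec(). This is only a minimal Python 3 sketch; the source string and the names `source`, `code`, and `namespace` are made up for illustration.

```python
# Python 3 sketch: the compile step and the execution step, done separately.
source = 'greeting = "hello, " "world"'

# Step 1: the bytecode compiler turns source text into a code object.
code = compile(source, "<example>", "exec")
print(type(code).__name__)          # the compiled result is a code object
print(type(code.co_code).__name__)  # its raw bytecode is a bytes string

# Step 2: the interpreter executes that code object.
namespace = {}
exec(code, namespace)
print(namespace["greeting"])
```

Normally Python does both steps for you when a file is run or imported; splitting them here just makes the boundary visible.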

Verbatim string concatenation (as in "a" "b" becoming "ab") is already done by the bytecode compiler. The same goes for "a" + "b" because the compiler can already figure out the literal values:

>>> import dis
>>> def s(): print "a" "b"
...
>>> dis.dis(s)
  1           0 LOAD_CONST               1 ('ab')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> def s(): print "ab"
...
>>> dis.dis(s)
  1           0 LOAD_CONST               1 ('ab')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> def s(): print "a"+"b"
...
>>> dis.dis(s)
  1           0 LOAD_CONST               3 ('ab')
              3 PRINT_ITEM
              4 PRINT_NEWLINE
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE
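The session above is Python 2. As a rough Python 3 equivalent, the constants stored in a compiled code object can be inspected directly through `co_consts`. The helper name `constants_of` is made up for this sketch, and the folding of "a" + "b" is a CPython optimization, so its output is printed rather than assumed.

```python
def constants_of(expression):
    """Return the constants stored in the code object compiled from `expression`."""
    return compile(expression, "<example>", "eval").co_consts

# Adjacent literals are merged before any bytecode exists,
# so only the joined string shows up as a constant.
adjacent = constants_of('"a" "b"')
print(adjacent)

# With +, whether the compiler folds the result is an implementation detail;
# print it to see what your interpreter does.
added = constants_of('"a" + "b"')
print(added)
```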

But for values that can't trivially be inferred at compile time, it's the interpreter's job to do the concatenation:

>>> def s(): print "a" + chr(98)
...
>>> dis.dis(s)
  1           0 LOAD_CONST               1 ('a')
              3 LOAD_GLOBAL              0 (chr)
              6 LOAD_CONST               2 (98)
              9 CALL_FUNCTION            1
             12 BINARY_ADD
             13 PRINT_ITEM
             14 PRINT_NEWLINE
             15 LOAD_CONST               0 (None)
             18 RETURN_VALUE
>>> s()
ab
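A similar check can be sketched in Python 3 by listing opcode names with dis.get_instructions(). The helper names `opnames` and `has_runtime_binary_op` are hypothetical. The opcode is BINARY_ADD in the output above and BINARY_OP in newer CPython 3 releases, so the sketch only looks for the shared "BINARY_" prefix.

```python
import dis

def opnames(expression):
    """List the opcode names the compiler emitted for `expression`."""
    code = compile(expression, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

def has_runtime_binary_op(expression):
    """True if the bytecode still contains a binary operation to run later."""
    return any(name.startswith("BINARY_") for name in opnames(expression))

# Adjacent literals: nothing is left for the interpreter to concatenate.
print(has_runtime_binary_op('"a" "b"'))

# chr(98) is only known at run time, so the addition stays in the bytecode.
print(has_runtime_binary_op('"a" + chr(98)'))
```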

When Python code is translated into bytecode, side-by-side string literals are merged. This is done only once: as long as the precompiled .pyc file is not deleted, the concatenation result is already there every time you run the script. Even without the precompiled file, the concatenation result is placed in the bytecode, so each time this code (e.g. a function) runs there is no need to compute the concatenation again.
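One way to see that the merged string travels with the compiled code is to serialize a code object with marshal, the format used for the code object inside a .pyc file, and load it back. This is a Python 3 sketch with made-up names (`blob`, `restored`); it does not write a real .pyc file.

```python
import marshal

# Compile once; the adjacent literals are merged during this step.
code = compile('s = "a" "b"', "<example>", "exec")

# Serialize and restore the code object, roughly what caching in a .pyc does.
blob = marshal.dumps(code)
restored = marshal.loads(blob)

# The merged constant is part of the restored code; no source text is needed.
print("ab" in restored.co_consts)

namespace = {}
exec(restored, namespace)
print(namespace["s"])
```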

If you use +, on the other hand, the bytecode will contain both strings, and the expression will be evaluated every time this code is run. EDIT: not always, as noted by Tim Pietzcker in his answer. However, in that case it is a matter of compiler optimization, not behaviour that the language semantics guarantee.

Note that because syntax is part of the language definition, the distinction between compiler and interpreter is irrelevant here.
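The syntax point can be illustrated with the ast module, which shows what the parser produces before any bytecode is generated. This is a sketch assuming Python 3.8 or later (where string literals parse to `ast.Constant`); the helper name `top_node` is made up.

```python
import ast

def top_node(expression):
    """Name of the top-level AST node the parser builds for `expression`."""
    return type(ast.parse(expression, mode="eval").body).__name__

# Adjacent literals are a single literal as far as the grammar is concerned.
print(top_node('"a" "b"'))

# + is an operator applied to two separate literals.
print(top_node('"a" + "b"'))

# Adjacent names are not valid syntax at all, which is why variables need +.
try:
    ast.parse("x y", mode="eval")
    adjacent_names_ok = True
except SyntaxError:
    adjacent_names_ok = False
print(adjacent_names_ok)
```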

Reference: lexical analysis in Python

A compiled language (e.g. C, C++) translates human-readable source code into machine code.

An interpreted language (e.g. old Microsoft BASIC on 6502s) recomputes what a step needs to do each time that step is executed.

A middle ground exists. Languages like Python and Java compile, but they don't compile to machine code; instead they compile to byte code for an idealised, software-only machine. This gives great portability and decent speed, especially if combined with a JIT (Java, PyPy, and CPython 2.[56] with psyco all JIT-compile byte code).

Confusingly, Java people often say their language is compiled and that Python is not. There was also some discussion a while back of implementing a Java Runtime Environment in hardware, though I'm not sure it ever materialized.

Also, gcj compiles Java source code to machine-code executables, and Cython does something similar for Python (by way of C), among others. But Java and Python are both mostly byte-code interpreted.
