global vs. local namespace performance difference
Why is it that executing a set of commands in a function:

def main():
    [do stuff]
    return something
print(main())

will tend to run 1.5x to 3x faster in Python than executing the commands at the top level:

[do stuff]
print(something)
The difference does indeed greatly depend on what "do stuff" actually does, and mainly on how many times it accesses names that are defined/used. Granted that the code is similar, there is a fundamental difference between these two cases: in a function, loading/storing names is done with the LOAD_FAST/STORE_FAST bytecodes, while at the top level of a module the same operations are performed with LOAD_NAME/STORE_NAME, which are more sluggish. This can be viewed in the following cases; I'll be using a for loop to make sure that the lookups for the variables defined are performed multiple times.
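(You can check this split for yourself with the dis module. A small sketch, comparing the same assignment compiled inside a function versus as module-level code; the exact opcode names vary a little between CPython versions, but the *_FAST vs *_NAME distinction holds:)

```python
import dis

def f():
    b = 2
    z = 10 * b
    return z

# Opcodes emitted for the function body: locals use the *_FAST family.
func_ops = {ins.opname for ins in dis.get_instructions(f)}

# The same statements compiled as top-level module code use *_NAME.
mod_ops = {ins.opname for ins in dis.get_instructions(
    compile("b = 2\nz = 10 * b", "<module>", "exec"))}

print(any(op.startswith("STORE_FAST") for op in func_ops))  # True
print(any(op.startswith("STORE_NAME") for op in mod_ops))   # True
```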
Function and LOAD_FAST/STORE_FAST:
We define a simple function that does some really silly things:

def main():
    b = 20
    for i in range(1000000): z = 10 * b
    return z
Output generated by dis.dis:

dis.dis(main)
# [/snipped output/]
18 GET_ITER
>> 19 FOR_ITER 16 (to 38)
22 STORE_FAST 1 (i)
25 LOAD_CONST 3 (10)
28 LOAD_FAST 0 (b)
31 BINARY_MULTIPLY
32 STORE_FAST 2 (z)
35 JUMP_ABSOLUTE 19
>> 38 POP_BLOCK
# [/snipped output/]
The thing to note here is the LOAD_FAST/STORE_FAST commands at offsets 28 and 32; these are used to access the b name used in the BINARY_MULTIPLY operation and to store the z name, respectively. As their bytecode names imply, they are the fast version of the LOAD_*/STORE_* family.
Modules and LOAD_NAME/STORE_NAME:
Now, let's look at the output of dis for our module version of the previous function:
# compile the module
m = compile(open('main.py', 'r').read(), "main", "exec")
dis.dis(m)
# [/snipped output/]
18 GET_ITER
>> 19 FOR_ITER 16 (to 38)
22 STORE_NAME 2 (i)
25 LOAD_CONST 3 (10)
28 LOAD_NAME 0 (b)
31 BINARY_MULTIPLY
32 STORE_NAME 3 (z)
35 JUMP_ABSOLUTE 19
>> 38 POP_BLOCK
# [/snipped output/]
Over here we have multiple calls to LOAD_NAME/STORE_NAME, which, as mentioned previously, are more sluggish commands to execute.
In this case, there is going to be a clear difference in execution time, mainly because Python must evaluate LOAD_NAME/STORE_NAME and LOAD_FAST/STORE_FAST multiple times (due to the for loop I added) and, as a result, the overhead introduced each time the code for each bytecode is executed accumulates.
Timing the execution 'as a module':

import time

start_time = time.time()
b = 20
for i in range(1000000): z = 10 * b
print(z)
print("Time: ", time.time() - start_time)

200
Time: 0.15162253379821777
Timing the execution as a function:
start_time = time.time()
print(main())
print("Time: ", time.time() - start_time)
200
Time: 0.08665871620178223
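(The two measurements can be reproduced in one self-contained script by compiling the loop in "exec" mode, so it keeps its LOAD_NAME/STORE_NAME bytecodes, and timing it against the function version. A sketch; the absolute numbers depend on your machine and CPython version, only the ratio is meaningful:)

```python
import time

# Top-level code compiled in "exec" mode uses LOAD_NAME/STORE_NAME.
mod_code = compile("b = 20\nfor i in range(1000000): z = 10 * b",
                   "<mod>", "exec")

def main():
    # Inside a function the same loop uses LOAD_FAST/STORE_FAST.
    b = 20
    for i in range(1000000): z = 10 * b
    return z

t0 = time.perf_counter()
exec(mod_code, {})          # run against a plain dict namespace
t_mod = time.perf_counter() - t0

t0 = time.perf_counter()
main()
t_func = time.perf_counter() - t0

print(t_func < t_mod)       # the function version is typically faster
```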
If you time loops over a smaller range (for example for i in range(1000)) you'll notice that the 'module' version is faster. This happens because the overhead introduced by needing to call the function main() is larger than that introduced by the *_FAST vs *_NAME differences. So it's largely relative to the amount of work that is done.
So, the real culprit here, and the reason why this difference is evident, is the for loop used. You generally have zero reason to ever put an intensive loop like that one at the top level of your script. Move it into a function and avoid using global variables; functions are designed to be more efficient.
You can take a look at the code executed for each of the bytecodes. I'll link the source for the 3.5 version of Python here, even though I'm pretty sure 2.7 doesn't differ much. Bytecode evaluation is done in Python/ceval.c, specifically in the function PyEval_EvalFrameEx:
LOAD_FAST source
- STORE_FAST source
LOAD_NAME source
- STORE_NAME source
As you'll see, the *_FAST bytecodes simply get the value stored/loaded using a fastlocals local symbol table contained inside frame objects.