global vs. local namespace performance difference
Why is it that executing a set of commands in a function:

def main():
    [do stuff]
    return something
print(main())

will tend to run 1.5x to 3x faster in Python than executing the commands at the top level:

[do stuff]
print(something)
The difference does indeed greatly depend on what "do stuff" actually does, and mainly on how many times it accesses names that are defined/used. Granted that the code is similar, there is a fundamental difference between these two cases: in a function, loading/storing names is done with the LOAD_FAST/STORE_FAST bytecodes, while at the top level of a module the same operations are performed with LOAD_NAME/STORE_NAME, which are more sluggish. This can be viewed in the following cases; I'll be using a for loop to make sure that the lookups for the variables defined are performed multiple times.
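(You can check this split for yourself with the dis module. A small sketch, comparing the same assignment compiled inside a function versus as module-level code; the exact opcode names vary a little between CPython versions, but the *_FAST vs *_NAME distinction holds:)

```python
import dis

def f():
    b = 2
    z = 10 * b
    return z

# Opcodes emitted for the function body: locals use the *_FAST family.
func_ops = {ins.opname for ins in dis.get_instructions(f)}

# The same statements compiled as top-level module code use *_NAME.
mod_ops = {ins.opname for ins in dis.get_instructions(
    compile("b = 2\nz = 10 * b", "<module>", "exec"))}

print(any(op.startswith("STORE_FAST") for op in func_ops))  # True
print(any(op.startswith("STORE_NAME") for op in mod_ops))   # True
```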
Function and LOAD_FAST/STORE_FAST:
We define a simple function that does some really silly things:

def main():
    b = 20
    for i in range(1000000): z = 10 * b
    return z
Output generated by dis.dis:

dis.dis(main)
# [/snipped output/]
18 GET_ITER
>> 19 FOR_ITER 16 (to 38)
22 STORE_FAST 1 (i)
25 LOAD_CONST 3 (10)
28 LOAD_FAST 0 (b)
31 BINARY_MULTIPLY
32 STORE_FAST 2 (z)
35 JUMP_ABSOLUTE 19
>> 38 POP_BLOCK
# [/snipped output/]
The thing to note here is the LOAD_FAST/STORE_FAST commands at offsets 28 and 32; these are used to access the b name used in the BINARY_MULTIPLY operation and to store the z name, respectively. As their bytecode names imply, they are the fast version of the LOAD_*/STORE_* family.
Modules and LOAD_NAME/STORE_NAME:
Now, let's look at the output of dis for our module version of the previous function:
# compile the module
m = compile(open('main.py', 'r').read(), "main", "exec")
dis.dis(m)
# [/snipped output/]
18 GET_ITER
>> 19 FOR_ITER 16 (to 38)
22 STORE_NAME 2 (i)
25 LOAD_CONST 3 (10)
28 LOAD_NAME 0 (b)
31 BINARY_MULTIPLY
32 STORE_NAME 3 (z)
35 JUMP_ABSOLUTE 19
>> 38 POP_BLOCK
# [/snipped output/]
Over here we have multiple calls to LOAD_NAME/STORE_NAME, which, as mentioned previously, are more sluggish commands to execute.
In this case, there is going to be a clear difference in execution time, mainly because Python must evaluate LOAD_NAME/STORE_NAME and LOAD_FAST/STORE_FAST multiple times (due to the for loop I added) and, as a result, the overhead introduced each time the code for each bytecode is executed accumulates.
Timing the execution 'as a module':

import time

start_time = time.time()
b = 20
for i in range(1000000): z = 10 * b
print(z)
print("Time: ", time.time() - start_time)

200
Time: 0.15162253379821777
Timing the execution as a function:
start_time = time.time()
print(main())
print("Time: ", time.time() - start_time)
200
Time: 0.08665871620178223
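(The two measurements can be reproduced in one self-contained script by compiling the loop in "exec" mode, so it keeps its LOAD_NAME/STORE_NAME bytecodes, and timing it against the function version. A sketch; the absolute numbers depend on your machine and CPython version, only the ratio is meaningful:)

```python
import time

# Top-level code compiled in "exec" mode uses LOAD_NAME/STORE_NAME.
mod_code = compile("b = 20\nfor i in range(1000000): z = 10 * b",
                   "<mod>", "exec")

def main():
    # Inside a function the same loop uses LOAD_FAST/STORE_FAST.
    b = 20
    for i in range(1000000): z = 10 * b
    return z

t0 = time.perf_counter()
exec(mod_code, {})          # run against a plain dict namespace
t_mod = time.perf_counter() - t0

t0 = time.perf_counter()
main()
t_func = time.perf_counter() - t0

print(t_func < t_mod)       # the function version is typically faster
```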
If you time loops over a smaller range (for example for i in range(1000)) you'll notice that the 'module' version is faster. This happens because the overhead introduced by needing to call the function main() is larger than that introduced by the *_FAST vs *_NAME differences. So it's largely relative to the amount of work that is done.
So, the real culprit here, and the reason why this difference is evident, is the for loop used. You generally have zero reason to ever put an intensive loop like that one at the top level of your script. Move it into a function and avoid using global variables; functions are designed to be more efficient.
You can take a look at the code executed for each of the bytecodes. I'll link the source for the 3.5 version of Python here, even though I'm pretty sure 2.7 doesn't differ much. Bytecode evaluation is done in Python/ceval.c, specifically in the function PyEval_EvalFrameEx:
LOAD_FAST source
- STORE_FAST source
LOAD_NAME source
- STORE_NAME source
As you'll see, the *_FAST bytecodes simply get the value stored/loaded using a fastlocals local symbol table contained inside frame objects.