
Python Structure and Execution Time

I am working on an application and have a choice between the following two basic structures. I ran a small test to determine the difference in execution time between the two: the second option is 758 times faster than the first. What can I do to retain the structure of the first option with the execution speed of the second? Coding option two would be very messy (my example is vastly scaled down). The application is not computationally intensive; it just involves a lot of data and small computations. I don't want to get into PyPy or Cython.

What is causing the first option to run slowly relative to the second? Is it the creation of the list c ten thousand times? I thought the Python interpreter was smart enough to anticipate that. Or is it the call to b()?

Option 1:

#! /usr/bin/env python

def a(b):

    c=[10 for i in range(10000)]
    return c[b]

def b():
    i=0
    while i < 10000:
        d = a(i)**a(i)^a(i)**a(i)
        i += 1

b()

Option 2:

#! /usr/bin/env python

a=[10 for i in range(10000)]

i=0
while i < 10000:
    d = a[i]**a[i]^a[i]**a[i]
    i += 1

This will give you the same result with a structure similar to option 1, while being far more efficient than option 1.

a = [10 for i in range(10000)]

def b():
    for c in a:
        d = c ** c ^ c ** c

b()

Option 1 is slower than option 2 because you repeatedly call a, and every call rebuilds the 10,000-element list before returning a single item from it.
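To separate the two costs suspected in the question (building the list versus the function call itself), a rough `timeit` sketch along the following lines can help. The helper names `build_list` and `noop` and the repetition count `N` are illustrative assumptions, not part of the original code, and the absolute numbers will vary by machine and Python version.

```python
import timeit

def build_list():
    # The same work option 1 performs on every call to a()
    return [10 for i in range(10000)]

def noop(x):
    # A call with no list construction, to estimate bare call overhead
    return x

N = 1000  # illustrative repetition count

t_build = timeit.timeit(build_list, number=N)
t_call = timeit.timeit(lambda: noop(0), number=N)

print("build 10,000-element list: %.6f s for %d runs" % (t_build, N))
print("plain function call:       %.6f s for %d runs" % (t_call, N))
```

On a typical CPython install the list construction should dominate by a wide margin, which suggests the rebuild, not the call, is what needs to go.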

You give the answer in your question. In option one you create the list c on each and every call to a. Python does not have the concept of static variables in functions, and the interpreter/JIT will most likely not optimize that away. You have several options for moving the list c out of the scope of a to prevent its re-creation.

Option 1 - Global variable

c=[10 for i in range(10000)]

def a(b):
    return c[b]

def b():
    i=0
    while i < 10000:
        d = a(i)**a(i)^a(i)**a(i)
        i += 1

b()

Global variables are generally not very nice, though. At this point the purpose of function a is also very limited, and you might as well remove it. Never expect a JIT to optimize as well as an AOT compiler, which would very likely inline a.

Option 2 - A function closure

def createA():
    c=[10 for i in range(10000)]
    def a(b):
        return c[b]
    return a

a = createA()

def b():
    i=0
    while i < 10000:
        d = a(i)**a(i)^a(i)**a(i)
        i += 1

b()

This is very JavaScript-esque and would be much nicer if Python had more extensive support for anonymous functions. It doesn't look very nice either, since you now pollute the global scope with another function.
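As a quick check that the closure really holds one list instead of rebuilding it, the captured value can be inspected through the function's `__closure__` attribute. This is only a small sketch that repeats the closure code above; the variable name `captured` is introduced here for illustration.

```python
def createA():
    c = [10 for i in range(10000)]  # built once, when createA() runs
    def a(b):
        return c[b]
    return a

a = createA()

# The closure cell holds a reference to the single list created above.
captured = a.__closure__[0].cell_contents
print(len(captured))           # 10000
print(a(0), a(9999))           # 10 10
print(a.__code__.co_freevars)  # ('c',)
```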

Option 3 - Functors

class A(object):
    def __init__(self):
        self._c = [10 for i in range(10000)]

    def __call__(self, b):
        return self._c[b]

a = A()

def b():
    i=0
    while i < 10000:
        d = a(i)**a(i)^a(i)**a(i)
        i += 1

b()

Note that you are adding the class A to the global namespace.

Option 4 - Set the "look up list" on the function

def a(b):
    return a._c[b]

a._c=[10 for i in range(10000)]

def b():
    i=0
    while i < 10000:
        d = a(i)**a(i)^a(i)**a(i)
        i += 1

b()

Here you are retaining the structure, not polluting the global namespace, and not recreating the list on each call. This is basically as close as you get to static function variables in Python. If a is very simple, you may still want to consider inlining the functionality for performance reasons.
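To compare the four options against the original, a small benchmark along these lines may be useful. It is a sketch under stated assumptions: the names `a_rebuild`, `a_global`, `a_closure`, `a_functor`, `a_attr`, `run`, and `DATA_SIZE` are made up for this example, and the loop is cut to 200 iterations so the slow variant finishes quickly. Treat the timings as relative indications only.

```python
import timeit

DATA_SIZE = 10000

# Original option 1 from the question: list rebuilt on every call
def a_rebuild(b):
    c = [10 for i in range(DATA_SIZE)]
    return c[b]

# Global variable
_c = [10 for i in range(DATA_SIZE)]
def a_global(b):
    return _c[b]

# Function closure
def create_a():
    c = [10 for i in range(DATA_SIZE)]
    def a(b):
        return c[b]
    return a
a_closure = create_a()

# Functor
class A(object):
    def __init__(self):
        self._c = [10 for i in range(DATA_SIZE)]
    def __call__(self, b):
        return self._c[b]
a_functor = A()

# Lookup list stored as a function attribute
def a_attr(b):
    return a_attr._c[b]
a_attr._c = [10 for i in range(DATA_SIZE)]

def run(a, n=200):
    # Same loop body as b() in the question, over n indices
    for i in range(n):
        d = a(i) ** a(i) ^ a(i) ** a(i)

variants = [("rebuild", a_rebuild), ("global", a_global),
            ("closure", a_closure), ("functor", a_functor),
            ("attribute", a_attr)]

timings = {}
for name, fn in variants:
    timings[name] = timeit.timeit(lambda: run(fn), number=1)
    print("%-10s %.4f s" % (name, timings[name]))
```

The four fixes should land close to each other, with small differences coming from how each one looks up the list; all of them should be far faster than the rebuild variant.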

As you mentioned, recreating the list c 10k times is the culprit. From the system's point of view, running this code under ltrace shows the following:

$ ltrace -fc  /usr/bin/python ./opt1.py 2>&1 |head -5
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 59.08    7.751628          71    108965 realloc
  8.33    1.093207          69     15651 memset
  7.35    0.964546          69     13825 memcpy

$ ltrace -fc  /usr/bin/python ./opt2.py 2>&1 |head -5
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 28.37    0.973063          70     13797 memcpy
 15.58    0.534149          70      7615 memset
 14.20    0.487080          69      6975 strlen

Python calls realloc repeatedly as the list grows, which causes the slowness observed. I ran the test with Python 2.7.9. Profiling within Python would be the best way to check further, but I just used a quick and dirty way to look.

PS: my test was reduced to 1k iterations only.
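For the in-Python profiling mentioned above, a minimal sketch using the standard-library `cProfile` and `pstats` modules could look like this. It assumes Python 3 (the `io.StringIO` capture would need adjusting on 2.7), and the loop bound `n=100` is an arbitrary reduction chosen for this example.

```python
import cProfile
import io
import pstats

def a(b):
    c = [10 for i in range(10000)]  # rebuilt on every call, as in option 1
    return c[b]

def b(n=100):
    i = 0
    while i < n:
        d = a(i) ** a(i) ^ a(i) ** a(i)
        i += 1

profiler = cProfile.Profile()
profiler.enable()
b()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and cumulative time per function, so the share of time spent inside a is visible without dropping down to ltrace.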


 