Rationale behind Python's preferred for syntax

What is the rationale behind the advocated use of the for i in xrange(...) style of looping construct in Python? For simple integer looping, the difference in overhead is substantial. I conducted a simple test using two pieces of code:

File idiomatic.py:

#!/usr/bin/env python

M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    for x in xrange(N):
        for y in xrange(M):
            pass

File cstyle.py:

#!/usr/bin/env python

M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    while x < N:
        while y < M:
            y += 1
        x += 1

Profiling results were as follows:

bash-3.1$ time python cstyle.py

real    0m0.109s
user    0m0.015s
sys     0m0.000s

bash-3.1$ time python idiomatic.py

real    0m4.492s
user    0m0.000s
sys     0m0.031s

I can understand why the Pythonic version is slower: I imagine it has a lot to do with calling xrange N times, and perhaps this could be eliminated if there were a way to rewind a generator. However, with this much difference in execution time, why would one prefer to use the Pythonic version?

Edit: I conducted the tests again using the code Mr. Martelli provided, and the results were indeed better now:

I thought I'd enumerate the conclusions from the thread here:

1) Lots of code at the module scope is a bad idea, even if the code is enclosed in an if __name__ == "__main__": block.

2) Curiously enough, modifying the code in thebadone to match my incorrect version (letting y grow without resetting it) produced little difference in performance, even for larger values of M and N.

Here's the proper comparison, e.g. in loop.py:

M = 10000
N = 10000

def thegoodone():
    for x in xrange(N):
        for y in xrange(M):
            pass

def thebadone():
    x = 0
    while x < N:
        y = 0
        while y < M:
            y += 1
        x += 1

All substantial code should always be in functions: putting a hundred million loops at a module's top level shows reckless disregard for performance and makes a mockery of any attempt at measuring said performance.

Once you've done that, you see:

$ python -mtimeit -s'import loop' 'loop.thegoodone()'
10 loops, best of 3: 3.45 sec per loop
$ python -mtimeit -s'import loop' 'loop.thebadone()'
10 loops, best of 3: 10.6 sec per loop

So, properly measured, the bad way that you advocate is about 3 times slower than the good way which Python promotes. I hope this makes you reconsider your erroneous advocacy.
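One way to see the effect of module scope in isolation is to run the same loop both ways and time it. This is a minimal sketch, not the measurement used in the thread: it assumes Python 3 (where range plays the role of Python 2's xrange), uses a much smaller loop count, and the names LOOP_SOURCE and FUNC_SOURCE are made up for illustration. On CPython, names assigned at module level live in a dictionary while function locals use faster slots, so the function version is typically quicker, though the size of the gap depends on the interpreter version.

```python
import timeit

# The loop as module-level code: x and y are global names.
LOOP_SOURCE = """
for x in range(200):
    for y in range(200):
        pass
"""

# The same loop inside a function: x and y are local variables.
FUNC_SOURCE = """
def run():
    for x in range(200):
        for y in range(200):
            pass
run()
"""

# timeit wraps a string statement in a function of its own, so exec() with a
# fresh globals dict is used here to get genuine module-level semantics.
module_code = compile(LOOP_SOURCE, "<module-level>", "exec")
func_code = compile(FUNC_SOURCE, "<function-level>", "exec")

module_time = min(timeit.repeat(lambda: exec(module_code, {}), number=20, repeat=3))
func_time = min(timeit.repeat(lambda: exec(func_code, {}), number=20, repeat=3))

print("module level: %.4f s" % module_time)
print("in function:  %.4f s" % func_time)
```

Taking the minimum of several repeats follows the same "best of 3" convention that python -mtimeit reports.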

You forgot to reset y to 0 after the inner loop.

#!/usr/bin/env python
M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    while x < N:
        while y < M:
            y += 1
        x += 1
        y = 0

Edit: 20.63s after the fix vs. 6.97s using xrange.
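To see why the unfixed version finished so quickly, it helps to count how often the inner loop body actually runs. The following is a small sketch with a hypothetical helper, count_inner_iterations, and deliberately small bounds; it is plain Python that runs unchanged on Python 2 or 3.

```python
def count_inner_iterations(n, m, reset_y):
    """Count how many times the inner while-loop body runs."""
    iterations = 0
    x, y = 0, 0
    while x < n:
        while y < m:
            y += 1
            iterations += 1
        x += 1
        if reset_y:
            y = 0  # the fix: restart the inner counter for the next outer pass
    return iterations

buggy = count_inner_iterations(100, 100, reset_y=False)
fixed = count_inner_iterations(100, 100, reset_y=True)
print(buggy, fixed)  # 100 10000
```

Without the reset, the inner loop does its m iterations only during the first outer pass, so the original cstyle.py performed roughly M + N steps instead of M * N.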

Good for iterating over data structures

The for i in ... syntax is great for iterating over data structures. In a lower-level language, you would generally iterate over an array indexed by an int, but with the Python syntax you can eliminate the indexing step.
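As a brief illustration of eliminating the indexing step, here is a sketch that walks a list both ways. The list names and its contents are invented for the example.

```python
names = ["alpha", "beta", "gamma"]

# Index-based traversal, as in a lower-level language.
by_index = []
i = 0
while i < len(names):
    by_index.append(names[i].upper())
    i += 1

# Direct iteration: no counter and no indexing step.
direct = []
for name in names:
    direct.append(name.upper())

# enumerate() supplies the position when it is genuinely needed.
numbered = ["%d:%s" % (pos, name) for pos, name in enumerate(names)]

print(by_index == direct)  # True
print(numbered)            # ['0:alpha', '1:beta', '2:gamma']
```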

This is not a direct answer to the question, but I want to open the dialog a bit more on xrange(). Two things:

A. There is something wrong with one of the OP's statements that no one has corrected yet (yes, in addition to the bug in the code of not resetting y):

"I imagine it has a lot to do with calling xrange N times...."

Unlike traditional counting for loops, Python's for is more like a shell's foreach: it loops over an iterable. Therefore, xrange() is called exactly once per loop, when the loop starts, not once per iteration. (In the nested example, that means the outer xrange(N) is evaluated once and the inner xrange(M) once per outer pass.)
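A quick way to check how often the iterable expression is evaluated is to wrap it in a function that records each call. This is a sketch assuming Python 3, with range standing in for xrange; counted_range is a hypothetical helper, not part of any library.

```python
calls = []

def counted_range(n):
    """Hypothetical wrapper around range() that records each call."""
    calls.append(n)
    return range(n)

body_runs = 0
for x in counted_range(3):
    for y in counted_range(4):
        body_runs += 1

print(calls)      # [3, 4, 4, 4]
print(body_runs)  # 12
```

Each for statement evaluates its iterable once on entry, so the loop body ran 12 times while the wrapper was called only 4 times: once for the outer loop and once per outer pass for the inner loop.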

B. xrange() is the name of this function in Python 2. In Python 3 it replaces the old range() and takes over that name, so keep this in mind when porting. If you didn't know already, xrange() returns a lazy, iterator-like object, while Python 2's range() returns a list. Since the latter is less memory-efficient, it was dropped in favor of xrange()'s behavior. In Python 3, anyone who needs an actual list can write list(range(N)).
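The memory difference can be observed directly in Python 3, where range is the lazy object and list(range(N)) is the explicit list. This is a minimal sketch; the exact byte counts vary by platform and interpreter, so none are quoted here.

```python
import sys

n = 1000000
lazy = range(n)         # Python 3 range: small object, values computed on demand
eager = list(range(n))  # explicit list, like the result of Python 2's range()

print(sys.getsizeof(lazy))   # size of the range object itself
print(sys.getsizeof(eager))  # size of the list's pointer array, much larger

# The lazy object still supports indexing, len() and membership tests.
print(lazy[10], len(lazy), 999999 in lazy)  # 10 1000000 True
```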

I've repeated the test from @Alex Martelli's answer. The idiomatic for loop is about 5 times faster than the while loop:

python -mtimeit -s'from while_vs_for import while_loop as loop' 'loop(10000)'
10 loops, best of 3: 9.6 sec per loop
python -mtimeit -s'from while_vs_for import for_loop as loop'   'loop(10000)'
10 loops, best of 3: 1.83 sec per loop

while_vs_for.py:

def while_loop(N):
    x = 0
    while x < N:
        y = 0
        while y < N:
            pass
            y += 1
        x += 1

def for_loop(N):
    for x in xrange(N):
        for y in xrange(N):
            pass

At module level:

$ time -p python for.py
real 4.38
user 4.37
sys 0.01
$ time -p python while.py
real 14.28
user 14.28
sys 0.01
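The function-level comparison can also be driven from within Python rather than from the shell. The sketch below assumes Python 3 (range instead of xrange) and a much smaller N so it finishes quickly; absolute times and the exact ratio will differ from the figures quoted in this thread.

```python
import timeit

def while_loop(n):
    x = 0
    while x < n:
        y = 0
        while y < n:
            y += 1
        x += 1

def for_loop(n):
    for x in range(n):
        for y in range(n):
            pass

N = 300  # far smaller than the 10000 used in the thread

# Best of 3 runs, each executing the function 5 times.
while_time = min(timeit.repeat(lambda: while_loop(N), number=5, repeat=3))
for_time = min(timeit.repeat(lambda: for_loop(N), number=5, repeat=3))

print("while: %.4f s" % while_time)
print("for:   %.4f s" % for_time)
```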
