Are list-comprehensions and functional functions faster than "for loops"?

In terms of performance in Python, is a list comprehension, or a function like map(), filter() and reduce(), faster than a for loop? Why, technically, would they run at C speed while the for loop runs at Python virtual machine speed?

Suppose that in a game I'm developing I need to draw complex and huge maps using for loops. This question would definitely be relevant: if a list comprehension, for example, is indeed faster, it would be a much better option for avoiding lag (despite the visual complexity of the code).

The following are rough guidelines and educated guesses based on experience. You should timeit or profile your concrete use case to get hard numbers, and those numbers may occasionally disagree with the below.
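
As a minimal sketch of what "timeit your concrete use case" can look like, the snippet below times a list-building for loop against the equivalent list comprehension. The function names, the problem size N, and the repeat counts are assumptions chosen for illustration; substitute your own workload.

```python
import timeit


def build_with_loop(n):
    """Build a list of squares with an explicit for loop."""
    result = []
    for x in range(n):
        result.append(x * x)
    return result


def build_with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [x * x for x in range(n)]


N = 1000  # hypothetical problem size

# timeit.repeat runs each candidate several times; taking the minimum
# reduces the influence of unrelated system noise.
loop_time = min(timeit.repeat(lambda: build_with_loop(N), number=200, repeat=3))
comp_time = min(timeit.repeat(lambda: build_with_comprehension(N), number=200, repeat=3))

print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```

The absolute numbers depend on the interpreter version and the machine, so only compare figures taken in the same run.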

A list comprehension is usually a tiny bit faster than the precisely equivalent for loop (one that actually builds a list), most likely because it doesn't have to look up the list and its append method on every iteration. However, a list comprehension still does a bytecode-level loop:

>>> dis.dis(<the code object for `[x for x in range(10)]`>)
 1           0 BUILD_LIST               0
             3 LOAD_FAST                0 (.0)
       >>    6 FOR_ITER                12 (to 21)
             9 STORE_FAST               1 (x)
            12 LOAD_FAST                1 (x)
            15 LIST_APPEND              2
            18 JUMP_ABSOLUTE            6
       >>   21 RETURN_VALUE
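
To reproduce a listing like this on your own interpreter, something along these lines should work. It is a sketch: the opnames helper is a hypothetical name introduced here, and offsets, opcode names, and whether the comprehension gets its own nested code object all differ between CPython versions.

```python
import dis

# Compile the expression on its own so the listing is not mixed with other code.
code = compile("[x for x in range(10)]", "<demo>", "eval")
dis.dis(code)


def opnames(code_obj):
    """Collect opcode names from a code object and any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code_obj)}
    for const in code_obj.co_consts:
        if hasattr(const, "co_code"):  # a nested code object, e.g. <listcomp>
            names |= opnames(const)
    return names


names = opnames(code)
# The loop (FOR_ITER) and the per-item append (LIST_APPEND) are both
# ordinary bytecode instructions executed by the interpreter.
print("FOR_ITER" in names, "LIST_APPEND" in names)
```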

Using a list comprehension in place of a loop that doesn't build a list, nonsensically accumulating a list of meaningless values and then throwing the list away, is often slower because of the overhead of creating and extending the list. List comprehensions aren't magic that is inherently faster than a good old loop.

As for functional list processing functions: while these are written in C and probably outperform equivalent functions written in Python, they are not necessarily the fastest option. Some speedup is expected if the function being applied is written in C too. But in most cases that use a lambda (or other Python function), the overhead of repeatedly setting up Python stack frames etc. eats up any savings. Simply doing the same work in-line, without function calls (e.g. a list comprehension instead of map or filter), is often slightly faster.
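
The three situations described above can be timed side by side. This is a sketch under assumptions: the data size, the choice of str as the C-implemented function, and the function names are all picked for illustration.

```python
import timeit

data = list(range(1000))  # hypothetical input


def map_builtin():
    # map() driving a function implemented in C (str)
    return list(map(str, data))


def map_lambda():
    # map() driving a Python-level function: one Python call per item
    return list(map(lambda x: str(x), data))


def comprehension():
    # the same work written in-line, without an extra Python function call
    return [str(x) for x in data]


for fn in (map_builtin, map_lambda, comprehension):
    t = min(timeit.repeat(fn, number=200, repeat=3))
    print(f"{fn.__name__:15s} {t:.4f} s")
```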

"Suppose that in a game I'm developing I need to draw complex and huge maps using for loops. This question would definitely be relevant: if a list comprehension, for example, is indeed faster, it would be a much better option for avoiding lag (despite the visual complexity of the code)."

Chances are, if code like this isn't already fast enough when written in good non-"optimized" Python, no amount of Python-level micro-optimization is going to make it fast enough, and you should start thinking about dropping to C. While extensive micro-optimizations can often speed up Python code considerably, there is a low (in absolute terms) limit to this. Moreover, even before you hit that ceiling, it becomes simply more cost efficient (15% speedup vs. 300% speedup with the same effort) to bite the bullet and write some C.

If you check the info on python.org, you can see this summary:

Version                    Time (seconds)
Basic loop                 3.47
Eliminate dots             2.45
Local variable & no dots   1.79
Using map function         0.54

But you really should read the above article in detail to understand the cause of the performance difference.
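
For readers without the article at hand, here is my reading of what the row labels refer to, as a hedged sketch rather than the article's exact benchmark. "Eliminate dots" means hoisting attribute lookups such as result.append out of the loop. The function names and the upper-casing workload are assumptions for illustration.

```python
def upper_basic(words):
    """Basic loop: looks up result.append and w.upper on every iteration."""
    result = []
    for w in words:
        result.append(w.upper())
    return result


def upper_no_dots(words):
    """No dots inside the loop: both lookups are done once, up front."""
    result = []
    append = result.append  # bound method fetched once
    upper = str.upper       # unbound method fetched once
    for w in words:
        append(upper(w))
    return result


def upper_map(words):
    """map() with a C-implemented function, so the loop itself runs in C."""
    return list(map(str.upper, words))


print(upper_basic(["a", "b"]), upper_no_dots(["a", "b"]), upper_map(["a", "b"]))
```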

I also strongly suggest you time your code by using timeit. At the end of the day, there can be situations where, for example, you need to break out of a for loop when a condition is met. That could potentially be faster than finding the result by calling map.
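
A small sketch of the early-exit case: the loop stops at the first hit, while the map-based version evaluates the predicate for every item before looking at the results. The helper names and the call counter are hypothetical, added only to make the difference visible.

```python
calls = 0  # counts how often the predicate runs


def is_big(x):
    global calls
    calls += 1
    return x > 10


def first_match_loop(items, predicate):
    for item in items:
        if predicate(item):
            return item  # stop as soon as the condition is met
    return None


def first_match_map(items, predicate):
    flags = list(map(predicate, items))  # predicate runs for every item
    return items[flags.index(True)] if True in flags else None


data = list(range(1000))

calls = 0
hit_loop = first_match_loop(data, is_big)
loop_calls = calls

calls = 0
hit_map = first_match_map(data, is_big)
map_calls = calls

print(hit_loop, loop_calls, hit_map, map_calls)
```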

You ask specifically about map(), filter() and reduce(), but I assume you want to know about functional programming in general. Having tested this myself on the problem of computing distances between all points within a set of points, functional programming (using the starmap function from the standard-library itertools module) turned out to be slightly slower than for loops (taking 1.25 times as long, in fact). Here is the sample code I used:

import itertools, time, math, random

class Point:
    def __init__(self,x,y):
        self.x, self.y = x, y

point_set = (Point(0, 0), Point(0, 1), Point(0, 2), Point(0, 3))
n_points = 100
pick_val = lambda : 10 * random.random() - 5
large_set = [Point(pick_val(), pick_val()) for _ in range(n_points)]
# the distance function
f_dist = lambda x0, x1, y0, y1: math.sqrt((x0 - x1) ** 2 + (y0 - y1) ** 2)
# unpack a pair of points into the argument order f_dist expects
f_pos = lambda p1, p2: (p1.x, p2.x, p1.y, p2.y)

# go through each point, get its distance from all remaining points
extract_dists = lambda x: itertools.starmap(f_dist, 
                          itertools.starmap(f_pos, 
                          itertools.combinations(x, 2)))

print('Distances:', list(extract_dists(point_set)))

t0_f = time.time()
list(extract_dists(large_set))
dt_f = time.time() - t0_f

Is the functional version faster than the procedural version?

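def extract_dists_procedural(pts):
    n_pts = len(pts)
    l = []
    for k_p1 in range(n_pts - 1):
        # start at k_p1 + 1 so a point is never paired with itself
        for k_p2 in range(k_p1 + 1, n_pts):
            # take the square root so the result matches f_dist
            l.append(math.sqrt((pts[k_p1].x - pts[k_p2].x) ** 2 +
                               (pts[k_p1].y - pts[k_p2].y) ** 2))
    return l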

t0_p = time.time()
# using list() on the assumption that
# it eats up as much time as in the functional version
list(extract_dists_procedural(large_set))

dt_p = time.time() - t0_p

f_vs_p = dt_p / dt_f
if f_vs_p >= 1.0:
    print('Time benefit of functional programming:', f_vs_p, 
          'times as fast for', n_points, 'points')
else:
    print('Time penalty of functional programming:', 1 / f_vs_p, 
          'times as slow for', n_points, 'points')
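
A third variant that was not part of the measurement above, shown only as a sketch: a list comprehension over itertools.combinations, using math.hypot for the distance. The function name extract_dists_comprehension is hypothetical, and no claim is made here about how it ranks against the other two; time it yourself.

```python
import itertools
import math


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


def extract_dists_comprehension(pts):
    """Distance between every unordered pair of points."""
    return [math.hypot(p1.x - p2.x, p1.y - p2.y)
            for p1, p2 in itertools.combinations(pts, 2)]


point_set = (Point(0, 0), Point(0, 1), Point(0, 2), Point(0, 3))
dists = extract_dists_comprehension(point_set)
print("Distances:", dists)  # [1.0, 2.0, 3.0, 1.0, 2.0, 1.0]
```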

I wrote a simple script that tests the speed, and this is what I found out. The for loop was actually fastest in my case. That really surprised me; see below (I was calculating the sum of squares).

from functools import reduce
import datetime


def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        i = i**2
        a += i
    return a

def square_sum3(numbers):
    sqrt = lambda x: x**2
    return sum(map(sqrt, numbers))

def square_sum4(numbers):
    return(sum([int(i)**2 for i in numbers]))


time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

0:00:00.302000 #Reduce
0:00:00.144000 #For loop
0:00:00.318000 #Map
0:00:00.390000 #List comprehension

I modified @Alisa's code and used cProfile to show why list comprehension is faster:

from functools import reduce
import datetime

def reduce_(numbers):
    return reduce(lambda sum, next: sum + next * next, numbers, 0)

def for_loop(numbers):
    a = []
    for i in numbers:
        a.append(i*i)  # square each value, as the other variants do
    a = sum(a)
    return a

def map_(numbers):
    sqrt = lambda x: x*x
    return sum(map(sqrt, numbers))

def list_comp(numbers):
    return(sum([i*i for i in numbers]))

funcs = [
        reduce_,
        for_loop,
        map_,
        list_comp
        ]

if __name__ == "__main__":
    # [1, 2, 5, 3, 1, 2, 5, 3]
    import cProfile
    for f in funcs:
        print('=' * 25)
        print("Profiling:", f.__name__)
        print('=' * 25)
        pr = cProfile.Profile()
        for i in range(10**6):
            pr.runcall(f, [1, 2, 5, 3, 1, 2, 5, 3])
        pr.create_stats()
        pr.print_stats()

Here are the results:

=========================
Profiling: reduce_
=========================
         11000000 function calls in 1.501 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.162    0.000    1.473    0.000 profiling.py:4(reduce_)
  8000000    0.461    0.000    0.461    0.000 profiling.py:5(<lambda>)
  1000000    0.850    0.000    1.311    0.000 {built-in method _functools.reduce}
  1000000    0.028    0.000    0.028    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: for_loop
=========================
         11000000 function calls in 1.372 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.879    0.000    1.344    0.000 profiling.py:7(for_loop)
  1000000    0.145    0.000    0.145    0.000 {built-in method builtins.sum}
  8000000    0.320    0.000    0.320    0.000 {method 'append' of 'list' objects}
  1000000    0.027    0.000    0.027    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: map_
=========================
         11000000 function calls in 1.470 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.264    0.000    1.442    0.000 profiling.py:14(map_)
  8000000    0.387    0.000    0.387    0.000 profiling.py:15(<lambda>)
  1000000    0.791    0.000    1.178    0.000 {built-in method builtins.sum}
  1000000    0.028    0.000    0.028    0.000 {method 'disable' of '_lsprof.Profiler' objects}


=========================
Profiling: list_comp
=========================
         4000000 function calls in 0.737 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1000000    0.318    0.000    0.709    0.000 profiling.py:18(list_comp)
  1000000    0.261    0.000    0.261    0.000 profiling.py:19(<listcomp>)
  1000000    0.131    0.000    0.131    0.000 {built-in method builtins.sum}
  1000000    0.027    0.000    0.027    0.000 {method 'disable' of '_lsprof.Profiler' objects}

IMHO:

  • reduce and map are in general pretty slow. Not only that, using sum on the iterator that map returns is slow compared to summing a list
  • for_loop uses append, which is of course slow to some extent
  • list comprehension not only spent the least time building the list, it also makes sum much quicker, in contrast to map
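
cProfile instruments every function call, so variants that make many Python-level calls (the lambda versions) carry extra measurement overhead. A sketch for cross-checking the same comparison with timeit instead follows; the function names and the generator-expression variant are additions for illustration, not part of the profile above.

```python
import timeit

numbers = [1, 2, 5, 3, 1, 2, 5, 3]


def sum_list_comp():
    # sum() over a fully built list
    return sum([i * i for i in numbers])


def sum_generator():
    # sum() pulling values one at a time from a generator expression
    return sum(i * i for i in numbers)


def sum_map():
    # sum() pulling values from a map iterator that calls a lambda per item
    return sum(map(lambda x: x * x, numbers))


for fn in (sum_list_comp, sum_generator, sum_map):
    t = min(timeit.repeat(fn, number=10_000, repeat=3))
    print(f"{fn.__name__:15s} {t:.4f} s")
```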

Adding a twist to Alphii's answer: actually, the for loop would be second best and about 6 times slower than map.

from functools import reduce
import datetime


def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        a += i**2
    return a

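def square_sum3(numbers):
    a = 0
    # Note: in Python 3, map() returns a lazy iterator. It is never consumed
    # here, so the lambda is never called, and a stays 0.
    map(lambda x: a+x**2, numbers)
    return a

def square_sum4(numbers):
    a = 0
    # Note: this returns a list of a+i**2 values, not their sum; a stays 0.
    return [a+i**2 for i in numbers]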

time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

Main changes have been to eliminate the slow sum calls, as well as the probably unnecessary int() in the last case. Putting the for loop and map on the same terms makes it quite fast, actually. Remember that lambdas are functional concepts and theoretically shouldn't have side effects, but, well, they can have side effects like adding to a. Results in this case with Python 3.6.1, Ubuntu 14.04, Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz:

0:00:00.257703 #Reduce
0:00:00.184898 #For loop
0:00:00.031718 #Map
0:00:00.212699 #List comprehension
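
One thing to keep in mind when reading the Map figure: in Python 3, map() returns a lazy iterator, so calling map(...) without consuming the result does not run the function at all. The sketch below shows this; the call counter and the square helper are hypothetical names added for illustration.

```python
calls = 0  # counts how often square() actually runs


def square(x):
    global calls
    calls += 1
    return x ** 2


numbers = [1, 2, 5, 3, 1, 2, 5, 3]

lazy = map(square, numbers)  # only builds an iterator; square has not run yet
calls_before = calls

total = sum(lazy)            # consuming the iterator is what runs square
calls_after = calls

print(calls_before, calls_after, total)  # 0 8 78
```

A timing of an unconsumed map() call therefore measures the cost of creating the iterator, not the cost of squaring the numbers.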

I managed to modify some of @alpiii's code and found that list comprehension is a little faster than the for loop. The difference might be caused by int(), which makes the comparison between list comprehension and for loop unfair.

from functools import reduce
import datetime

def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print (datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next*next, numbers, 0)

def square_sum2(numbers):
    a = []
    for i in numbers:
        a.append(i*i)  # square each value, as the other variants do
    a = sum(a)
    return a

def square_sum3(numbers):
    sqrt = lambda x: x*x
    return sum(map(sqrt, numbers))

def square_sum4(numbers):
    return(sum([i*i for i in numbers]))

time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

0:00:00.101122 #Reduce
0:00:00.089216 #For loop
0:00:00.101532 #Map
0:00:00.068916 #List comprehension

I was looking for some performance information regarding for loops and list comprehensions and stumbled upon this topic. It has been a few months since the Python 3.11 release (October 2022), and one of the main features of Python 3.11 was speed improvements. https://www.python.org/downloads/release/python-3110/

"The Faster CPython Project is already yielding some exciting results. Python 3.11 is up to 10-60% faster than Python 3.10. On average, we measured a 1.22x speedup on the standard benchmark suite. See Faster CPython for details."

I ran the same code originally posted by Alphi and then "twisted" by jjmerelo. Python 3.10 and Python 3.11 results below:

from functools import reduce
import datetime

def time_it(func, numbers, *args):
    start_t = datetime.datetime.now()
    for i in range(numbers):
        func(args[0])
    print(datetime.datetime.now()-start_t)

def square_sum1(numbers):
    return reduce(lambda sum, next: sum+next**2, numbers, 0)


def square_sum2(numbers):
    a = 0
    for i in numbers:
        a += i**2
    return a


def square_sum3(numbers):
    a = 0
    # Note: in Python 3, map() returns a lazy iterator. It is never consumed
    # here, so the lambda is never called, and a stays 0.
    map(lambda x: a+x**2, numbers)
    return a


def square_sum4(numbers):
    a = 0
    # Note: this returns a list of a+i**2 values, not their sum; a stays 0.
    return [a+i**2 for i in numbers]


time_it(square_sum1, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum2, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum3, 100000, [1, 2, 5, 3, 1, 2, 5, 3])
time_it(square_sum4, 100000, [1, 2, 5, 3, 1, 2, 5, 3])

I haven't calculated the exact percentage improvement, but it is clear that the performance gain, at least in this particular instance, is impressive (roughly 3 to 5 times faster), with the exception of map, which shows a negligible improvement.

#Python 3.10
0:00:00.221134  #Reduce
0:00:00.186307  #For
0:00:00.024311  #Map
0:00:00.206454  #List comprehension

#Python 3.11
0:00:00.072550  #Reduce
0:00:00.037168  #For
0:00:00.021702  #Map
0:00:00.058655  #List comprehension
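
The speedup per variant can be worked out directly from the two listings. A short sketch, with the timings copied from the results above and the dictionary names chosen for illustration:

```python
# Timings in seconds, copied from the Python 3.10 and 3.11 listings.
py310 = {"reduce": 0.221134, "for": 0.186307,
         "map": 0.024311, "list comprehension": 0.206454}
py311 = {"reduce": 0.072550, "for": 0.037168,
         "map": 0.021702, "list comprehension": 0.058655}

# Speedup factor: how many times faster 3.11 ran than 3.10.
speedup = {name: py310[name] / py311[name] for name in py310}

for name, ratio in speedup.items():
    saved = 100 * (1 - 1 / ratio)  # percentage of run time saved
    print(f"{name:20s} {ratio:.2f}x faster, {saved:.0f}% less time")
```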

Note: I ran this on a Kali Linux VM running under Windows 11 using WSL. I'm not sure whether this code would perform even better if run natively (bare metal) on a Linux instance.

My Kali Linux VM specs below:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   39 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          8
On-line CPU(s) list:             0-7
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Core(TM) i7-6700T CPU @ 2.80GHz
CPU family:                      6
Model:                           94
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
Stepping:                        3
BogoMIPS:                        5615.99
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi ept vpid ept_ad fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
Virtualization:                  VT-x
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       128 KiB (4 instances)
L1i cache:                       128 KiB (4 instances)
L2 cache:                        1 MiB (4 instances)
L3 cache:                        8 MiB (1 instance)
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Unknown: Dependent on hypervisor status
Vulnerability Tsx async abort:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
