
Python and performance of list comprehensions

Suppose you have a list comprehension in Python, like

Values = [f(x) for x in range(0, 1000)]

with f being just a function without side effects, so all the entries can be computed independently.

Is Python able to increase the performance of this list comprehension compared with the "obvious" implementation, e.g. by shared-memory parallelization on multicore CPUs?

No, Python will not magically parallelize this for you. In fact, it can't: it cannot prove the independence of the entries; that would require a great deal of program inspection/verification, which is impossible to get right in the general case.
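To see why such a proof is hard, consider a hypothetical stand-in for f that looks pure at the call site but quietly mutates shared state; parallelizing the comprehension would then change observable behavior (the `f` and `results` names here are illustrative, not from the question):

```python
results = []

def f(x):
    # Hypothetical stand-in for the question's f: it looks pure at the
    # call site, but it also appends to a shared list as it runs.
    results.append(x)
    return x * x

Values = [f(x) for x in range(5)]
# Serial evaluation guarantees results == [0, 1, 2, 3, 4]; a parallel
# run could interleave the appends in any order.
```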

If you want quick coarse-grained multicore parallelism, I recommend joblib instead:

from joblib import delayed, Parallel
values = Parallel(n_jobs=NUM_CPUS)(delayed(f)(x) for x in range(1000))

Not only have I witnessed near-linear speedups using this library, it also has the great feature of forwarding signals such as Ctrl-C to its worker processes, which cannot be said of all multiprocessing libraries.

Note that joblib doesn't really support shared-memory parallelism: it spawns worker processes, not threads, so it incurs some communication overhead from sending data to the workers and results back to the master process.
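That overhead is easy to see in isolation: process pools serialize arguments and results with pickle to move them between processes, so every object is copied rather than shared. A minimal sketch (the payload here is illustrative):

```python
import pickle

# A process pool cannot share Python objects directly: arguments and
# results are serialized with pickle and copied across the boundary.
payload = list(range(1000))       # data headed for a worker (illustrative)
wire = pickle.dumps(payload)      # bytes that actually cross processes
restored = pickle.loads(wire)     # the copy the worker receives

assert restored == payload        # equal in value...
assert restored is not payload    # ...but a copy, not shared memory
```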

Python 3.2 added concurrent.futures, a nice library for solving problems concurrently. Consider this example:

import math, time
from concurrent import futures

PRIMES = [
    112272535095293, 112582705942171, 112272535095293,
    115280095190773, 115797848077099, 1099726899285419,
    112272535095293, 112582705942171, 112272535095293,
    115280095190773, 115797848077099, 1099726899285419,
]

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def bench(f):
    start = time.time()
    f()
    elapsed = time.time() - start
    print("Completed in {} seconds".format(elapsed))

def concurrent():
    with futures.ProcessPoolExecutor() as executor:
        values = list(executor.map(is_prime, PRIMES))

def listcomp():
    values = [is_prime(x) for x in PRIMES]

Results on my quad-core machine:

>>> bench(listcomp)
Completed in 14.463825941085815 seconds
>>> bench(concurrent)
Completed in 3.818351984024048 seconds
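One detail worth knowing: executor.map preserves input order, so its output lines up element-for-element with the list comprehension's. A small sketch using a thread pool (threads avoid the pickling overhead, but for CPU-bound work like is_prime the GIL limits them, which is why the process pool above is what delivers the speedup; `f` here is an illustrative stand-in):

```python
from concurrent import futures

def f(x):
    # Illustrative stand-in for the question's f.
    return x * x

# executor.map yields results in input order, so it is a drop-in
# replacement for the comprehension's semantics.
with futures.ThreadPoolExecutor(max_workers=4) as executor:
    parallel = list(executor.map(f, range(10)))

serial = [f(x) for x in range(10)]
```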

Try whether the following is faster:

Values = map(f, range(0, 1000))

That's a functional manner of coding. Note that in Python 3, map returns a lazy iterator rather than a list; wrap it in list() if you need the full list.

Another idea is to replace all occurrences of Values in the code with a lazy iterator:

from itertools import imap
imap(f, range(0, 1000))  # Python 2

map(f, range(0, 1000))  # Python 3: map is already lazy
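The payoff of laziness is that no work happens until a value is actually requested, whereas the list comprehension computes everything up front. A quick sketch (the counting `f` is illustrative):

```python
calls = 0

def f(x):
    # Illustrative f that counts how often it is invoked.
    global calls
    calls += 1
    return x * x

lazy = (f(x) for x in range(1000))   # a generator expression: no work yet
assert calls == 0

first = next(lazy)                   # computes only f(0)
assert first == 0 and calls == 1

eager = [f(x) for x in range(1000)]  # the comprehension runs f 1000 times
```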
