
Concurrency/Parallelism on Windows with Python

I developed a simple program to solve the eight queens problem. Now I would like to do some more testing with different meta-parameters, so I would like to make it fast. I went through a few iterations of profiling and was able to cut the runtime significantly, but I have reached the point where I believe only performing parts of the computation concurrently could make it faster. I tried to use the multiprocessing and concurrent.futures modules, but they did not improve the runtime a lot and in some cases even slowed down execution. That is just to give some context.

I was able to come up with a similar code structure where the sequential version beats the concurrent one.

import numpy as np
import concurrent.futures
import math
import time
import multiprocessing

def is_prime(n):
    # Naive trial division; assumes n > 2 (which holds for the generated data).
    if n % 2 == 0:
        return False

    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

def generate_data(seed):
    np.random.seed(seed)
    numbers = []
    for _ in range(5000):
        nbr = np.random.randint(50000, 100000)
        numbers.append(nbr)
    return numbers

def run_test_concurrent(numbers):
    print("Concurrent test")
    start_tm = time.time()
    chunk = len(numbers)//3
    primes = None
    with concurrent.futures.ProcessPoolExecutor(max_workers=3) as pool:
        primes = list(pool.map(is_prime, numbers, chunksize=chunk))
    print("Time: {:.6f}".format(time.time() - start_tm))
    print("Number of primes: {}\n".format(np.sum(primes)))


def run_test_sequential(numbers):
    print("Sequential test")
    start_tm = time.time()
    primes = [is_prime(nbr) for nbr in numbers]
    print("Time: {:.6f}".format(time.time() - start_tm))
    print("Number of primes: {}\n".format(np.sum(primes)))


def run_test_multiprocessing(numbers):
    print("Multiprocessing test")
    start_tm = time.time()
    chunk = len(numbers)//3
    primes = None
    with multiprocessing.Pool(processes=3) as pool:
        primes = list(pool.map(is_prime, numbers, chunksize=chunk))
    print("Time: {:.6f}".format(time.time() - start_tm))
    print("Number of primes: {}\n".format(np.sum(primes)))


def main():
    nbr_trials = 5
    for trial in range(nbr_trials):
        numbers = generate_data(trial*10)
        run_test_concurrent(numbers)
        run_test_sequential(numbers)
        run_test_multiprocessing(numbers)
        print("--\n")


if __name__ == '__main__':
    main()

When I run it on my machine (Windows 7, Intel Core i5 with four cores), I get the following output:

Concurrent test
Time: 2.006006
Number of primes: 431

Sequential test
Time: 0.010000
Number of primes: 431

Multiprocessing test
Time: 1.412003
Number of primes: 431
--

Concurrent test
Time: 1.302003
Number of primes: 447

Sequential test
Time: 0.010000
Number of primes: 447

Multiprocessing test
Time: 1.252003
Number of primes: 447
--

Concurrent test
Time: 1.280002
Number of primes: 446

Sequential test
Time: 0.010000
Number of primes: 446

Multiprocessing test
Time: 1.250002
Number of primes: 446
--

Concurrent test
Time: 1.260002
Number of primes: 446

Sequential test
Time: 0.010000
Number of primes: 446

Multiprocessing test
Time: 1.250002
Number of primes: 446
--

Concurrent test
Time: 1.282003
Number of primes: 473

Sequential test
Time: 0.010000
Number of primes: 473

Multiprocessing test
Time: 1.260002
Number of primes: 473
--

The question that I have is whether I can somehow make it faster by running it concurrently on Windows with Python 3.6.4 |Anaconda, Inc.|. I read here on SO (Why is creating a new process more expensive on Windows than Linux?) that creating new processes on Windows is expensive. Is there anything that can be done to speed things up? Am I missing something obvious?

I also tried to create the Pool only once, but it did not seem to help a lot.


Edit:

The original code structure looks more or less like this:

class Foo(object):

    def g(self) -> int:
        # function performing simple calculations
        # single function call is fast (~500 ms)
        pass


def run():
    # get_initial_foos() and modify_foos() are placeholders in the original post
    nbr_processes = multiprocessing.cpu_count() - 1

    with multiprocessing.Pool(processes=nbr_processes) as pool:
        foos = get_initial_foos()

        solution_found = False
        while not solution_found:
            # one iteration
            chunk = len(foos) // nbr_processes
            vals = list(pool.map(Foo.g, foos, chunksize=chunk))

            foos = modify_foos()
with foos having 1000 elements. It is not possible to tell in advance how quickly the algorithm converges or how many iterations are executed, possibly thousands.

Processes are much more lightweight under UNIX variants. Windows processes are heavy and take much more time to start up. Threads are the recommended way of doing multiprocessing on Windows. You can also follow this thread: Why is creating a new process more expensive on Windows than Linux?
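
If you want to see that startup cost on your own machine, here is a minimal sketch that times spawning a single do-nothing process (the noop function is just a placeholder):

import time
import multiprocessing

def noop():
    pass

if __name__ == '__main__':
    start = time.time()
    p = multiprocessing.Process(target=noop)
    p.start()
    p.join()
    # Expect this to be noticeably slower on Windows ("spawn")
    # than on Linux ("fork").
    print("Spawn + join of one process: {:.3f}s".format(time.time() - start))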

Your setup is not really fair to multiprocessing. You even included the unnecessary primes = None assignments. ;)

Some points:


Data size

Your generated data set is far too small to allow the overhead of process creation to be earned back. Try range(1_000_000) instead of range(5000); a sketch of that change follows the timings below. On Linux, with the multiprocessing start method set to 'spawn' (the default on Windows), this paints a different picture:

Concurrent test
Time: 0.957883
Number of primes: 89479

Sequential test
Time: 1.235785
Number of primes: 89479

Multiprocessing test
Time: 0.714775
Number of primes: 89479
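
For reference, a sketch of that change to the question's generate_data (drawing all numbers in one vectorized call is also much faster than a Python loop):

def generate_data(seed):
    # Much larger workload, so the per-process startup cost can be amortized.
    np.random.seed(seed)
    return np.random.randint(50000, 100000, size=1_000_000).tolist()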

Reuse your pool

Don't leave the pool's with-block as long as there is still code in your program that you want to parallelize later. If you create the pool only once at the beginning, it doesn't make much sense to include pool creation in your benchmark at all.
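
A sketch of that pattern applied to the question's code, creating the pool once and timing only the map call (is_prime and generate_data as defined above):

def main():
    nbr_trials = 5
    # One pool for all trials; worker processes are started exactly once.
    with multiprocessing.Pool(processes=3) as pool:
        for trial in range(nbr_trials):
            numbers = generate_data(trial * 10)
            start_tm = time.time()
            chunk = len(numbers) // 3
            primes = pool.map(is_prime, numbers, chunksize=chunk)
            print("Time: {:.6f}".format(time.time() - start_tm))
            print("Number of primes: {}\n".format(np.sum(primes)))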


NumPy

NumPy is in parts able to release the global interpreter lock (GIL). This means you can benefit from multi-core parallelism without the overhead of process creation. If you're doing math anyway, try to utilize NumPy as much as possible. Try concurrent.futures.ThreadPoolExecutor and multiprocessing.dummy.Pool with code that uses NumPy.
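
As an illustration of that last point (this is not the answer's code), a NumPy-vectorized primality count combined with a thread pool might look like the sketch below. The remainder matrix trades memory for speed, so it is sized for lists like the question's 5000 numbers; the heavy array work runs in C, where NumPy can release the GIL, letting the threads overlap:

from concurrent.futures import ThreadPoolExecutor

def count_primes_vectorized(numbers):
    # Trial division for a whole chunk at once: build one (N x D) remainder
    # matrix instead of looping over candidates in Python.
    nums = np.asarray(numbers)
    divisors = np.arange(3, int(np.sqrt(nums.max())) + 1, 2)
    composite = (nums[:, None] % divisors == 0).any(axis=1)
    odd = nums % 2 != 0
    # Matches is_prime above for candidates larger than sqrt(max), which
    # holds for the 50000-100000 range used in the question.
    return int(np.count_nonzero(odd & ~composite))

def run_test_numpy_threads(numbers, workers=3):
    chunks = np.array_split(np.asarray(numbers), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(count_primes_vectorized, chunks)
    print("Number of primes: {}".format(sum(counts)))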
