
python concurrent.futures.ProcessPoolExecutor: Performance of .submit() vs .map()

I am using concurrent.futures.ProcessPoolExecutor to find the occurrences of a number in a number range. The intent is to investigate the amount of speed-up gained from concurrency. To benchmark performance, I have a control: a serial code that performs the same task (shown below). I have written 2 concurrent codes, one using concurrent.futures.ProcessPoolExecutor.submit() and the other using concurrent.futures.ProcessPoolExecutor.map(), to perform the same task. They are shown below. Advice on drafting the former and the latter can be seen here and here, respectively.

The task issued to all three codes was to find the number of occurrences of the number 5 in the number range 0 to 1E8. Both .submit() and .map() were assigned 6 workers, and .map() had a chunksize of 10,000. The manner of discretising the workload was identical in the two concurrent codes. However, the functions used to find the occurrences differed, because the way arguments are passed to a function called by .submit() and by .map() is different; a minimal sketch of this difference is shown below.
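
As an aside on the calling conventions (a minimal, hypothetical sketch that is not part of the codes in this post): .submit() takes the function and its positional arguments directly, whereas .map() applies the function element-wise across one or more iterables, so any constant argument must be supplied as a repeated iterable.

import concurrent.futures as cf
import itertools

def _count_matches(nmin, nmax, number):
    # Hypothetical helper: collect numbers in [nmin, nmax) containing `number`.
    return [n for n in range(nmin, nmax) if number in str(n)]

if __name__ == '__main__':
    with cf.ProcessPoolExecutor(max_workers=2) as executor:
        # .submit(): one call, with its arguments passed explicitly.
        future = executor.submit(_count_matches, 0, 100, '5')
        print(len(future.result()))
        # .map(): one call per element of the zipped iterables; the constant
        # argument '5' has to be supplied via itertools.repeat().
        results = executor.map(_count_matches, [0, 100], [100, 200],
                               itertools.repeat('5'))
        print(sum(len(r) for r in results))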

All 3 codes reported the same number of occurrences, i.e. 56,953,279 times. However, the times taken to complete the task were very different: .submit() performed 2 times faster than the control, while .map() took twice as long as the control to complete its task.

Questions:

  1. I would like to know whether the slow performance of .map() is an artifact of my coding or whether it is inherently slow. If the former, how can I improve it? I am just surprised that it performed slower than the control, as there would then be little incentive to use it.
  2. I would like to know if there is any way to make the .submit() code perform even faster. A condition I have is that the function _concurrent_submit() must return an iterable with the numbers/occurrences containing the number 5.

Benchmark Results

[Figure: benchmark timings of the serial control, the .submit() code, and the .map() code]

concurrent.futures.ProcessPoolExecutor.submit()

#!/usr/bin/python3.5
# -*- coding: utf-8 -*-

import concurrent.futures as cf
from time import time
from traceback import print_exc

def _findmatch(nmin, nmax, number):
    '''Function to find the occurrence of number in range nmin to nmax and return
       the found occurrences in a list.'''
    print('\n def _findmatch', nmin, nmax, number)
    start = time()
    match=[]
    for n in range(nmin, nmax):
        if number in str(n):
            match.append(n)
    end = time() - start
    print("found {0} in {1:.4f}sec".format(len(match),end))
    return match

def _concurrent_submit(nmax, number, workers):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.submit to
       find the occurences of a given number in a number range in a parallelised
       manner.'''
    # 1. Local variables
    start = time()
    chunk = nmax // workers
    futures = []
    found =[]
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        for i in range(workers):
            cstart = chunk * i
            cstop = chunk * (i + 1) if i != workers - 1 else nmax
            futures.append(executor.submit(_findmatch, cstart, cstop, number))
        # 2.2. Instruct workers to process results as they come, when all are
        #      completed or .....
        cf.as_completed(futures) # faster than cf.wait()
        # 2.3. Consolidate result as a list and return this list.
        for future in futures:
            for f in future.result():
                try:
                    found.append(f)
                except:
                    print_exc()
        foundsize = len(found)
        end = time() - start
        print('within statement of def _concurrent_submit():')
        print("found {0} in {1:.4f}sec".format(foundsize, end))
    return found

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.
    workers = 6     # Pool of workers

    start = time()
    a = _concurrent_submit(nmax, number, workers)
    end = time() - start
    print('\n main')
    print('workers = ', workers)
    print("found {0} in {1:.4f}sec".format(len(a),end))

concurrent.futures.ProcessPoolExecutor.map()

#!/usr/bin/python3.5
# -*- coding: utf-8 -*-

import concurrent.futures as cf
import itertools
from time import time
from traceback import print_exc

def _findmatch(listnumber, number):    
    '''Function to find the occurrence of number in another number and return
       a string value.'''
    #print('def _findmatch(listnumber, number):')
    #print('listnumber = {0} and ref = {1}'.format(listnumber, number))
    if number in str(listnumber):
        x = listnumber
        #print('x = {0}'.format(x))
        return x 

def _concurrent_map(nmax, number, workers):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
       find the occurrences of a given number in a number range in a parallelised
       manner.'''
    # 1. Local variables
    start = time()
    chunk = nmax // workers
    futures = []
    found =[]
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        for i in range(workers):
            cstart = chunk * i
            cstop = chunk * (i + 1) if i != workers - 1 else nmax
            numberlist = range(cstart, cstop)
            futures.append(executor.map(_findmatch, numberlist,
                                        itertools.repeat(number),
                                        chunksize=10000))
        # 2.3. Consolidate result as a list and return this list.
        for future in futures:
            for f in future:
                if f:
                    try:
                        found.append(f)
                    except:
                        print_exc()
        foundsize = len(found)
        end = time() - start
        print('within statement of def _concurrent(nmax, number):')
        print("found {0} in {1:.4f}sec".format(foundsize, end))
    return found

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.
    workers = 6     # Pool of workers

    start = time()
    a = _concurrent_map(nmax, number, workers)
    end = time() - start
    print('\n main')
    print('workers = ', workers)
    print("found {0} in {1:.4f}sec".format(len(a),end))

Serial Code:

#!/usr/bin/python3.5
# -*- coding: utf-8 -*-

from time import time

def _serial(nmax, number):    
    start = time()
    match=[]
    nlist = range(nmax)
    for n in nlist:
        if number in str(n):
            match.append(n)
    end=time()-start
    print("found {0} in {1:.4f}sec".format(len(match),end))
    return match

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.

    start = time()
    a = _serial(nmax, number)
    end = time() - start
    print('\n main')
    print("found {0} in {1:.4f}sec".format(len(a),end))

Update 13th Feb 2017:

In addition to @niemmi's answer, I provide an answer below, following some personal research, to show:

  1. how to further speed up @niemmi's .map() and .submit() solutions, and
  2. when ProcessPoolExecutor.map() can lead to more speed-up than ProcessPoolExecutor.submit().

Overview:

There are 2 parts to my answer:

  • Part 1 shows how to gain more speed-up from @niemmi's ProcessPoolExecutor.map() solution.
  • Part 2 shows when the ProcessPoolExecutor methods .submit() and .map() yield non-equivalent compute times.

=======================================================================

Part 1: More Speed-up for ProcessPoolExecutor.map()

Background: This section builds on @niemmi's .map() solution, which by itself is excellent. While doing some research on his discretization scheme to better understand how it interacts with the .map() chunksize argument, I found this interesting solution.

I regard @niemmi's definition of chunk = nmax // workers as a definition of chunksize, i.e. the smaller portion of the actual number range (the given task) to be tackled by each worker in the worker pool. Now, this definition is premised on the assumption that if a computer has x workers, dividing the task equally among them will result in optimum use of each worker, and hence the total task will be completed fastest. Therefore, the number of chunks that a given task is broken into should always equal the number of pool workers. A sketch of this partitioning scheme follows. However, is this assumption correct?
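
For concreteness, here is a small hypothetical illustration (values chosen for readability, not taken from the post) of how the chunk = nmax // workers scheme partitions the number range, with the last chunk absorbing any remainder:

# Hypothetical illustration of the chunk = nmax // workers partitioning.
nmax, workers = 100, 6
chunk = nmax // workers  # 16
bounds = [(chunk * i, chunk * (i + 1) if i != workers - 1 else nmax)
          for i in range(workers)]
print(bounds)
# [(0, 16), (16, 32), (32, 48), (48, 64), (64, 80), (80, 100)]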

Proposition: Here, I propose that the above assumption does not always lead to the fastest compute time when used with ProcessPoolExecutor.map(). Rather, discretising a task into an amount greater than the number of pool workers can lead to speed-up, i.e. faster completion of a given task.

Experiment: I have modified @niemmi's code to allow the number of discretized tasks to exceed the number of pool workers. This code is given below and was used to find the number of times the number 5 appears in the number range 0 to 1E8. I executed this code using 1, 2, 4, and 6 pool workers and for various ratios of the number of discretized tasks to the number of pool workers. For each scenario, 3 runs were made and the compute times were tabulated. "Speed-up" is defined here as the average compute time when using an equal number of chunks and pool workers, divided by the average compute time when the number of discretized tasks is greater than the number of pool workers; the sketch after this paragraph restates this metric.
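
In other words, the metric reduces to a ratio of two averages. A minimal sketch with hypothetical timing values (not the measured data):

# Hypothetical 3-run timings for one scenario (seconds).
baseline_runs = [12.4, 12.1, 12.6]  # number of chunks == number of workers
chunked_runs = [10.1, 9.8, 10.0]    # number of chunks > number of workers

def mean(xs):
    return sum(xs) / len(xs)

speedup = mean(baseline_runs) / mean(chunked_runs)
print('speed-up = {0:.2f}x'.format(speedup))  # the values above give ~1.24x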

Findings:

[Figure: compute time (left) and speed-up (right) versus the ratio of number of chunks to number of workers, for 1, 2, 4, and 6 pool workers]

  1. The figure on the left shows the compute time taken by all the scenarios mentioned in the experiment section. It shows that the compute time taken when number of chunks / number of workers = 1 is always greater than the compute time taken when number of chunks > number of workers. That is, the former case is always less efficient than the latter.

  2. The figure on the right shows that a speed-up of 1.2 times or more was gained when the number of chunks / number of workers reached a threshold value of 14 or more. It is interesting to observe that the speed-up trend also occurred when ProcessPoolExecutor.map() was executed with 1 worker.

Conclusion: When customizing the number of discrete tasks that ProcessPoolExecutor.map() should use to solve a given task, it is prudent to ensure that this number is greater than the number of pool workers, as this practice shortens compute time.

concurrent.futures.ProcessPoolExecutor.map() code (revised parts only):

def _concurrent_map(nmax, number, workers, num_of_chunks):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
       find the occurrences of a given number in a number range in a parallelised
       manner.'''
    # 1. Local variables
    start = time()
    chunksize = nmax // num_of_chunks
    found = []
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        cstart = (chunksize * i for i in range(num_of_chunks))
        cstop = (chunksize * i if i != num_of_chunks else nmax
                 for i in range(1, num_of_chunks + 1))
        futures = executor.map(_findmatch, cstart, cstop,
                               itertools.repeat(number))
        # 2.2. Consolidate result as a list and return this list.
        for future in futures:
            #print('type(future)=',type(future))
            for f in future:
                if f:
                    try:
                        found.append(f)
                    except:
                        print_exc()
        foundsize = len(found)
        end = time() - start
        print('\n within statement of def _concurrent(nmax, number):')
        print("found {0} in {1:.4f}sec".format(foundsize, end))
    return found

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.
    workers = 4     # Pool of workers
    chunks_vs_workers = 14 # A factor of >= 14 can provide optimum performance
    num_of_chunks = chunks_vs_workers * workers

    start = time()
    a = _concurrent_map(nmax, number, workers, num_of_chunks)
    end = time() - start
    print('\n main')
    print('nmax={}, workers={}, num_of_chunks={}'.format(
          nmax, workers, num_of_chunks))
    print('workers = ', workers)
    print("found {0} in {1:.4f}sec".format(len(a),end))

=======================================================================

Part 2: The total compute times from using the ProcessPoolExecutor methods .submit() and .map() can be dissimilar when returning a sorted/ordered result list.

Background: I have amended both the .submit() and .map() codes to allow an "apples-to-apples" comparison of their compute times, and the ability to visualize the compute time of the main code, the compute time of the _concurrent method called by the main code to perform the concurrent operations, and the compute time of each discretized task/worker called by the _concurrent method. Furthermore, the concurrent method in these codes was structured to return an unordered and an ordered list of the results directly from the future objects of .submit() and the iterator of .map(). Source code is provided below (hope it helps you).

Experiments: These two newly improved codes were used to perform the same experiment described in Part 1, save that only 6 pool workers were considered, and the Python built-in list and sorted methods were used to return an unordered and an ordered list of the results to the main section of the code, respectively.

Findings:

[Figure: .submit() vs .map() compute times, with list vs sorted result consolidation]

  1. From the _concurrent method's results, we can see that the compute times of the _concurrent method, used to create all the Future objects of ProcessPoolExecutor.submit() and to create the iterator of ProcessPoolExecutor.map(), as a function of the number of discretized tasks over the number of pool workers, are equivalent. This result simply means that the ProcessPoolExecutor methods .submit() and .map() are equally efficient/fast.
  2. Comparing the compute times of main and its _concurrent method, we can see that main ran longer than its _concurrent method. This is to be expected, as their time difference reflects the compute times of the list and sorted methods (and those of the other methods encased within them). Clearly, the list method took less compute time to return a result list than the sorted method did. The average compute times of the list method for the .submit() and .map() codes were similar, at ~0.47 sec. The average compute times of the sorted method for the .submit() and .map() codes were 1.23 sec and 1.01 sec, respectively. In other words, the list method performed 2.62 times and 2.15 times faster than the sorted method for the .submit() and .map() codes, respectively.
  3. It is not clear why the sorted method generated an ordered list faster from .map() than from .submit() as the number of discretized tasks increased beyond the number of pool workers, save when the number of discretized tasks equaled the number of pool workers. That said, these findings show that the decision to use the equally fast .submit() or .map() methods can be encumbered by the sorted method. For example, if the intent is to generate an ordered list in the shortest time possible, the use of ProcessPoolExecutor.map() should be preferred over ProcessPoolExecutor.submit(), as .map() allows the shortest total compute time.
  4. The discretization scheme mentioned in Part 1 of my answer is shown here to speed up the performance of both the .submit() and .map() methods. The amount of speed-up can be as much as 20% over the case where the number of discretized tasks equals the number of pool workers.

Improved .map() code

#!/usr/bin/python3.5
# -*- coding: utf-8 -*-

import concurrent.futures as cf
from time import time
from itertools import repeat, chain 


def _findmatch(nmin, nmax, number):
    '''Function to find the occurrence of number in range nmin to nmax and return
       the found occurrences in a list.'''
    start = time()
    match=[]
    for n in range(nmin, nmax):
        if number in str(n):
            match.append(n)
    end = time() - start
    #print("\n def _findmatch {0:<10} {1:<10} {2:<3} found {3:8} in {4:.4f}sec".
    #      format(nmin, nmax, number, len(match),end))
    return match

def _concurrent(nmax, number, workers, num_of_chunks):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
       find the occurrences of a given number in a number range in a concurrent
       manner.'''
    # 1. Local variables
    start = time()
    chunksize = nmax // num_of_chunks
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        cstart = (chunksize * i for i in range(num_of_chunks))
        cstop = (chunksize * i if i != num_of_chunks else nmax
                 for i in range(1, num_of_chunks + 1))
        futures = executor.map(_findmatch, cstart, cstop, repeat(number))
    end = time() - start
    print('\n within statement of def _concurrent_map(nmax, number, workers, num_of_chunks):')
    print("found in {0:.4f}sec".format(end))
    return list(chain.from_iterable(futures)) #Return an unordered result list
    #return sorted(chain.from_iterable(futures)) #Return an ordered result list

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.
    workers = 6     # Pool of workers
    chunks_vs_workers = 30 # A factor of >= 14 can provide optimum performance
    num_of_chunks = chunks_vs_workers * workers

    start = time()
    found = _concurrent(nmax, number, workers, num_of_chunks)
    end = time() - start
    print('\n main')
    print('nmax={}, workers={}, num_of_chunks={}'.format(
          nmax, workers, num_of_chunks))
    #print('found = ', found)
    print("found {0} in {1:.4f}sec".format(len(found),end))    

Improved .submit() code.
This code is the same as the .map() code above, except that the _concurrent method is replaced with the following:

def _concurrent(nmax, number, workers, num_of_chunks):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.submit to
       find the occurrences of a given number in a number range in a concurrent
       manner.'''
    # 1. Local variables
    start = time()
    chunksize = nmax // num_of_chunks
    futures = []
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        for i in range(num_of_chunks):
            cstart = chunksize * i
            cstop = chunksize * (i + 1) if i != num_of_chunks - 1 else nmax
            futures.append(executor.submit(_findmatch, cstart, cstop, number))
    end = time() - start
    print('\n within statement of def _concurrent_submit(nmax, number, workers, num_of_chunks):')
    print("found in {0:.4f}sec".format(end))
    return list(chain.from_iterable(f.result() for f in cf.as_completed(
        futures))) #Return an unordered result list
    #return sorted(chain.from_iterable(f.result() for f in cf.as_completed(
    #    futures))) #Return an ordered result list

=======================================================================

You're comparing apples to oranges here. When using map you produce all the 1E8 numbers and transfer them to the worker processes. This takes a lot of time compared to the actual execution. When using submit you just create 6 sets of parameters that get transferred; a sketch quantifying this difference follows.
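
To make the overhead concrete, here is an illustrative sketch (not part of the original answer; the byte counts are whatever pickle produces on your system) comparing how much data must be serialized and shipped to the workers in each approach:

# Illustrative sketch: comparing what crosses the process boundary.
import pickle

nmax, workers = int(1E8), 6

# The question's .map() code iterates range(nmax), so all 1E8 integers must be
# serialized and sent to the workers (in chunks of 10,000 items each):
one_chunk = pickle.dumps(list(range(10000)))
print('map: {} chunks of ~{} bytes each'.format(nmax // 10000, len(one_chunk)))

# The .submit() code sends only 6 small argument tuples in total:
chunk = nmax // workers
payloads = [pickle.dumps((chunk * i, chunk * (i + 1), '5')) for i in range(workers)]
print('submit: {} payloads, {} bytes in total'.format(
      len(payloads), sum(len(p) for p in payloads)))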

If you change map to operate on the same principle, you'll get numbers that are close to each other:

import concurrent.futures as cf
import itertools
from time import time
from traceback import print_exc

def _findmatch(nmin, nmax, number):
    '''Function to find the occurrence of number in range nmin to nmax and return
       the found occurrences in a list.'''
    print('\n def _findmatch', nmin, nmax, number)
    start = time()
    match=[]
    for n in range(nmin, nmax):
        if number in str(n):
            match.append(n)
    end = time() - start
    print("found {0} in {1:.4f}sec".format(len(match),end))
    return match

def _concurrent_map(nmax, number, workers):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
       find the occurrences of a given number in a number range in a parallelised
       manner.'''
    # 1. Local variables
    start = time()
    chunk = nmax // workers
    futures = []
    found =[]
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        cstart = (chunk * i for i in range(workers))
        cstop = (chunk * i if i != workers else nmax for i in range(1, workers + 1))
        futures = executor.map(_findmatch, cstart, cstop, itertools.repeat(number))

        # 2.3. Consolidate result as a list and return this list.
        for future in futures:
            for f in future:
                try:
                    found.append(f)
                except:
                    print_exc()
        foundsize = len(found)
        end = time() - start
        print('within statement of def _concurrent(nmax, number):')
        print("found {0} in {1:.4f}sec".format(foundsize, end))
    return found

You could improve the performance of submit by using as_completed correctly. For a given iterable of futures it will return an iterator that yields the futures in the order they complete.
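
For context, a minimal sketch that is not part of the original answer: the question's code calls cf.as_completed(futures) without consuming the iterator it returns, which has no effect; the benefit only appears when the iterator is looped over.

import concurrent.futures as cf

def _square(n):
    return n * n

if __name__ == '__main__':
    with cf.ProcessPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(_square, n) for n in range(4)]

        # No-op: builds the completion-order iterator but never consumes it
        # (this is effectively what the question's .submit() code does).
        cf.as_completed(futures)

        # Correct: loop over the iterator; each future is yielded as soon as
        # it completes, so its result can be processed in completion order.
        for future in cf.as_completed(futures):
            print(future.result())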

You could also skip copying the data into another list and instead use itertools.chain.from_iterable to combine the results from the futures into a single iterable:

import concurrent.futures as cf
from time import time
from itertools import chain

def _findmatch(nmin, nmax, number):
    '''Function to find the occurrence of number in range nmin to nmax and return
       the found occurrences in a list.'''
    print('\n def _findmatch', nmin, nmax, number)
    start = time()
    match=[]
    for n in range(nmin, nmax):
        if number in str(n):
            match.append(n)
    end = time() - start
    print("found {0} in {1:.4f}sec".format(len(match),end))
    return match

def _concurrent_map(nmax, number, workers):
    '''Function that utilises concurrent.futures.ProcessPoolExecutor.map to
       find the occurrences of a given number in a number range in a parallelised
       manner.'''
    # 1. Local variables
    chunk = nmax // workers
    futures = []
    #2. Parallelization
    with cf.ProcessPoolExecutor(max_workers=workers) as executor:
        # 2.1. Discretise workload and submit to worker pool
        for i in range(workers):
            cstart = chunk * i
            cstop = chunk * (i + 1) if i != workers - 1 else nmax
            futures.append(executor.submit(_findmatch, cstart, cstop, number))

    return chain.from_iterable(f.result() for f in cf.as_completed(futures))

if __name__ == '__main__':
    nmax = int(1E8) # Number range maximum.
    number = str(5) # Number to be found in number range.
    workers = 6     # Pool of workers

    start = time()
    a = _concurrent_map(nmax, number, workers)
    end = time() - start
    print('\n main')
    print('workers = ', workers)
    print("found {0} in {1:.4f}sec".format(sum(1 for x in a),end))
