
Why is multiprocessing taking more time than sequential processing?

The code below takes around 15 seconds to produce the result, but when I run it sequentially it only takes around 11 seconds. What could be the reason for this?

import multiprocessing
import os
import time

def square(x):
    # print(os.getpid())
    return x * x

if __name__ == '__main__':
    start_time = time.time()
    p = multiprocessing.Pool()
    r = range(100000000)
    p1 = p.map(square, r)
    end_time = time.time()
    print('time_taken::', end_time - start_time)
    

Sequential code

start_time = time.time()
d = list(map(square, range(100000000)))
end_time = time.time()
print('time_taken::', end_time - start_time)

Regarding your code example, there are two important factors that influence the runtime performance gains achievable through parallelization:

First, you have to take the administrative overhead into account: spawning new processes is rather expensive compared to simple arithmetic operations. You therefore only gain performance when the computation's complexity exceeds a certain threshold, which was not the case in your example above.
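To make this concrete, here is a minimal micro-benchmark sketch (not part of the original answer; the helpers `cheap`, `heavy`, and `timed` are hypothetical names) that contrasts a trivial per-item operation with an artificially heavier one. With the cheap function, process-spawn and inter-process communication costs tend to dominate; with the heavy one, the parallel version has a chance to win:

```python
import time
from multiprocessing import Pool


def cheap(x):
    # Trivial work: overhead per item dwarfs the computation itself.
    return x * x


def heavy(x):
    # Artificially heavier per-item computation.
    s = 0
    for i in range(1000):
        s += (x + i) * (x + i)
    return s


def timed(fn, data, parallel):
    """Return wall-clock seconds for mapping fn over data."""
    start = time.time()
    if parallel:
        with Pool(4) as p:
            p.map(fn, data, chunksize=len(data) // 4)
    else:
        list(map(fn, data))
    return time.time() - start


if __name__ == '__main__':
    data = range(20_000)
    for fn in (cheap, heavy):
        seq = timed(fn, data, parallel=False)
        par = timed(fn, data, parallel=True)
        print(f"{fn.__name__}: sequential {seq:.3f}s, parallel {par:.3f}s")
```

Exact timings depend on the machine and the process start method, so no fixed numbers are claimed here; the point is only the relative pattern.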

Secondly, you have to think of a "clever" way of splitting the computation into parts that can be executed independently. In the given code example, you can optimize the chunks you pass to the worker processes created by multiprocessing.Pool, so that each process has a self-contained package of computations to perform.
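As a side note before the manual approach below: `Pool.map` also accepts an optional `chunksize` argument, so a simpler variant of this idea is to let the pool do the chunking itself. A minimal sketch:

```python
from multiprocessing import Pool


def square(x):
    return x * x


if __name__ == '__main__':
    n = 1_000_000
    with Pool(4) as p:
        # Larger chunks mean fewer inter-process round trips per item.
        result = p.map(square, range(n), chunksize=n // 4)
    print(result[1000])  # → 1000000
```

This delegates the splitting to the pool, at the cost of less control over how the chunks are formed.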

E.g., this could be accomplished with the following modifications of your code:

import math
from multiprocessing import Pool


def square(x):
    return x ** 2


def square_chunk(i, j):
    return list(map(square, range(i, j)))


def calculate_in_parallel(n, c=4):
    """Calculates a list of squares in a parallelized manner"""
    result = []
    step = math.ceil(n / c)

    with Pool(c) as p:
        partial_results = p.starmap(
            square_chunk, [(i, min(i + step, n)) for i in range(0, n, step)]
        )

        for res in partial_results:
            result += res

    return result
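The helper can be exercised like this (a self-contained sketch that repeats the definitions so it runs on its own; a small `n` is used here purely for illustration):

```python
import math
from multiprocessing import Pool


def square(x):
    return x ** 2


def square_chunk(i, j):
    return list(map(square, range(i, j)))


def calculate_in_parallel(n, c=4):
    """Calculates a list of squares in a parallelized manner"""
    result = []
    step = math.ceil(n / c)
    with Pool(c) as p:
        partial_results = p.starmap(
            square_chunk, [(i, min(i + step, n)) for i in range(0, n, step)]
        )
        for res in partial_results:
            result += res
    return result


if __name__ == '__main__':
    out = calculate_in_parallel(10, c=2)
    print(out)  # → [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    # The parallel result matches the sequential one.
    assert out == [x ** 2 for x in range(10)]
```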

Please note that I used the operation x ** 2 (instead of the heavily optimized x * x) to increase the per-item load and make the resulting runtime differences more pronounced.

Here, the Pool's starmap() function is used, which unpacks the arguments of the passed tuples. Using it, we can effectively pass more than one argument to the mapped function. Furthermore, the workload is distributed evenly across the available cores: each worker computes the range of numbers between (i, min(i + step, n)), where step denotes the chunk size, calculated as the maximum number divided by the CPU count.
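The unpacking behaviour of starmap() in isolation (a tiny sketch; the `power` function is a hypothetical example, not from the answer above):

```python
from multiprocessing import Pool


def power(base, exp):
    return base ** exp


if __name__ == '__main__':
    with Pool(2) as p:
        # Each tuple is unpacked into the positional arguments of power().
        results = p.starmap(power, [(2, 3), (3, 2), (10, 0)])
    print(results)  # → [8, 9, 1]
```

By contrast, plain `Pool.map` passes each iterable element as a single argument, so multi-argument functions need starmap() (or a wrapper).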

Running the code with different parametrizations clearly shows that the performance gain grows as the maximum number (denoted n) increases. As expected, using more cores in parallel also reduces the runtime.

[Figure: runtime performance comparison]

