Why is a single process pool faster than the serial implementation in this Python code?

Question

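I'm experimenting with multiprocessing in Python. I know it can be slower than serial computation; that is not the point of my post.

I'm just wondering why a single process pool is faster than the serial computation for my basic problem. Shouldn't these times be the same?

Here is the code: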

import time
import multiprocessing as mp
import matplotlib.pyplot as plt


def func(x):
    return x*x*x


def multi_proc(nb_procs):
    tic = time.time()
    pool = mp.Pool(processes=nb_procs)
    pool.map_async(func, range(1, 10000000))
    toc = time.time()
    return toc-tic


def single_core():
    tic = time.time()
    [func(x) for x in range(1, 10000000)]
    toc = time.time()
    return toc-tic

if __name__ == '__main__':
    sc_times = [0]
    mc_times = [0]
    print('single core computation')
    sc_constant_time = single_core()
    print('{} secs'.format(sc_constant_time))
    for nb_procs in range(1, 12):
        print('computing for {} processes...'.format(nb_procs))
        time_elapsed = (multi_proc(nb_procs))
        print('{} secs'.format(time_elapsed))
        mc_times.append(time_elapsed)
    sc_times = [sc_constant_time for _ in mc_times]
    plt.plot(sc_times, 'r--')
    plt.plot(mc_times, 'b--')
    plt.xlabel('nb procs')
    plt.ylabel('time (s)')
    plt.show()

And here is the plot of time per number of processes (red = serial computation, blue = multiprocessing):

Edit 1: As Sidhnarth Gupta pointed out, I modified my code; here is the new version. I changed the function for no particular reason.

import time
import multiprocessing as mp
import matplotlib.pyplot as plt
import random


def func(x):
    return random.choice(['a', 'b', 'c', 'd', 'e', 'f', 'g'])


def multi_proc(nb_procs, nb_iter):
    tic = time.time()
    pool = mp.Pool(processes=nb_procs)
    pool.map_async(func, range(1, nb_iter)).get()
    toc = time.time()
    return toc-tic


def single_core(nb_iter):
    tic = time.time()
    [func(x) for x in range(1, nb_iter)]
    toc = time.time()
    return toc-tic

if __name__ == '__main__':
    # configure
    nb_iter = 100000
    max_procs = 16
    sc_times = [0]
    mc_times = [0]

    # multi proc calls
    for nb_procs in range(1, max_procs):
        print('computing for {} processes...'.format(nb_procs))
        time_elapsed = (multi_proc(nb_procs, nb_iter))
        print('{} secs'.format(time_elapsed))
        mc_times.append(time_elapsed)

    # single proc call
    print('single core computation')
    for nb in range(1, len(mc_times)):
        print('{}...'.format(nb))
        sc_times.append(single_core(nb_iter))
    # average time
    average_time = sum(sc_times)/len(sc_times)
    print('average time on single core: {} secs'.format(average_time))

    # plot
    plt.plot(sc_times, 'r--')
    plt.plot(mc_times, 'b--')
    plt.xlabel('nb procs')
    plt.ylabel('time (s)')
    plt.show()

Here is my new plot:

I think I can now say that I sped up my program by using multiprocessing.

Answer 1

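Your current code for timing the multiprocessing version actually measures only the time needed to submit the tasks to the pool. The processing itself happens asynchronously, without blocking the calling thread, so the timer stops before the results are ready.

I tried your program with the following changes: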
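The difference between submitting and waiting can be sketched as follows. This is a minimal illustration, not the asker's benchmark: the helper name time_submit_and_wait and its parameters are hypothetical, and time.sleep stands in for real work so that the gap between the two measurements is easy to see. Unlike the code in the question, pool creation is kept outside the timed region.

```python
import multiprocessing as mp
import time


def time_submit_and_wait(nb_procs=2, nb_tasks=4, delay=0.2):
    """Return (seconds spent submitting, seconds until all results are ready).

    Hypothetical helper for illustration; time.sleep stands in for real work.
    """
    with mp.Pool(processes=nb_procs) as pool:
        tic = time.perf_counter()
        # map_async hands the tasks to the pool and returns an AsyncResult
        # right away; the workers run in the background.
        async_result = pool.map_async(time.sleep, [delay] * nb_tasks)
        submit_time = time.perf_counter() - tic
        # get() blocks until every task has finished.
        async_result.get()
        total_time = time.perf_counter() - tic
    return submit_time, total_time


if __name__ == '__main__':
    submit_time, total_time = time_submit_and_wait()
    print('submit only:   {:.4f} secs'.format(submit_time))
    print('submit + wait: {:.4f} secs'.format(total_time))
```

The first number corresponds to what the original multi_proc() measured; the second is what it measures once .get() is added.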

def multi_proc(nb_procs):
    tic = time.time()
    pool = mp.Pool(processes=nb_procs)
    pool.map_async(func, range(1, 10000000)).get()
    toc = time.time()
    return toc-tic

and

def multi_proc(nb_procs):
    tic = time.time()
    pool = mp.Pool(processes=nb_procs)
    pool.map(func, range(1, 10000000))
    toc = time.time()
    return toc-tic

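Both versions took noticeably more time than the serial computation.

Also, when creating a plot like this, you should consider calling the single_core() function once for every value you plot, instead of plotting the same value multiple times. You will see significant variance in the time taken by the same call.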
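As a rough sketch of that suggestion, the serial version can be timed several times and the spread inspected. The helper repeated_timings and the run counts are assumptions made for illustration; func and single_core mirror the functions from the question, with time.perf_counter swapped in for time.time because it is better suited to measuring short intervals.

```python
import statistics
import time


def func(x):
    return x * x * x


def single_core(nb_iter):
    """Time one serial pass over range(1, nb_iter)."""
    tic = time.perf_counter()
    [func(x) for x in range(1, nb_iter)]
    return time.perf_counter() - tic


def repeated_timings(nb_runs, nb_iter):
    """Hypothetical helper: one fresh measurement per run."""
    return [single_core(nb_iter) for _ in range(nb_runs)]


if __name__ == '__main__':
    timings = repeated_timings(nb_runs=10, nb_iter=100000)
    print('min:   {:.4f} secs'.format(min(timings)))
    print('max:   {:.4f} secs'.format(max(timings)))
    print('mean:  {:.4f} secs'.format(statistics.mean(timings)))
    print('stdev: {:.4f} secs'.format(statistics.stdev(timings)))
```

Plotting the individual entries of timings, rather than one value repeated, makes the run-to-run variation visible next to the multiprocessing curve.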

Answer 1
2, accepted, 2016-07-22 04:42:13
