
Unexpected results of measures with multiprocessing in Python

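I'm trying to speed up some calculations of the Rosenbrock function by using a concurrent genetic algorithm instead of a linear one. I started experimenting with the threading and multiprocessing Python libraries and found a way to do it, but (there is always a "but") I ran into completely unexpected behavior in the evaluation step.

I measured the computation of the 2D Rosenbrock function (or any larger dimension) for population sizes in the range [5 - 500000], with 10 tests per population size. What is wrong?

A single process is much faster than the iterative algorithm, spending as much as 50% less time on the computation, which seems completely wrong.

Do you have any idea why I see such a large gain? One process should finish in about the same time as the iterative algorithm (or even do worse, since starting a process costs resources, right?).

You can see the full results at the link ('n' is the dimension of the Rosenbrock function).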

#!/usr/bin/python
import scipy
import multiprocessing
from timeit import default_timer as timer
import math

def rosenbrock(x_1, x_2):
    return 100*(x_2-x_1**2)**2 + (1-x_1)**2

def n_rosenbrock(X):
    sum_r = 0
    for i in range(len(X)-1):
        sum_r += rosenbrock(X[i], X[i+1])
    return sum_r

def evaluation(shared_population, shared_fitnesses, nr_of_genes, x_1, x_2):
    # Evaluate every individual whose genes lie in [x_1, x_2) of the flat population array.
    for i in range(x_1, x_2, nr_of_genes):
        result = n_rosenbrock(shared_population[i:i+nr_of_genes])
        shared_fitnesses[i // nr_of_genes] = result

if __name__ == '__main__':
    min_x = -5
    max_x = 5
    cores = 1

    POP_SIZES = [5, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, 20000, 25000, 50000, 100000, 150000, 200000, 250000, 300000, 350000, 400000, 450000, 500000]
    iters_time = []
    proc_eval_time = []


    for idp, pop_size in enumerate(POP_SIZES):
        for nr_of_genes in range(2, 3):
            population = scipy.random.uniform(min_x, max_x, (pop_size * nr_of_genes))
            shared_population = multiprocessing.Array('f', scipy.reshape(population, pop_size*nr_of_genes), lock=False)
            shared_fitnesses = multiprocessing.Array('f', pop_size, lock=False)

            indexes = [int(pop_size/cores)] * cores
            for x in range(int(pop_size%cores)):
                indexes[x] += 1
            test_c = 10
            process_eval_time = 0
            process_sel_time = 0

            iter_time = 0

            print("Iter", idp)
            iter_population = scipy.reshape(population, pop_size*nr_of_genes)
            iter_fitnesses = scipy.zeros(pop_size)
            for _ in range(test_c):
                iter_timer_start = timer()
                for i in range(0,len(iter_population),nr_of_genes):
                    result = n_rosenbrock(iter_population[i:i+nr_of_genes])
                    iter_fitnesses[int(i/nr_of_genes)] = result
                iter_timer_stop = timer()
                iter_time += (iter_timer_stop-iter_timer_start)
            iters_time.append(iter_time/test_c)

            print("Process", idp)
            for _ in range(test_c):
                processes = scipy.empty(cores, dtype=multiprocessing.Process)
                for idx in range(cores):
                    x_1 = sum(indexes[:idx]) * nr_of_genes
                    x_2 = x_1 + indexes[idx] * nr_of_genes
                    args = (shared_population, shared_fitnesses, nr_of_genes, x_1, x_2)
                    process = multiprocessing.Process(target=evaluation, args=args)
                    processes[idx] = process
                process_eval_start = timer()
                for p in processes:
                    p.start()
                for p in processes:
                    p.join()
                process_eval_stop = timer()
                process_eval_time += (process_eval_stop-process_eval_start)
            proc_eval_time.append(process_eval_time/test_c)

    print("iters_time", iters_time)
    print("process_eval_time", proc_eval_time)

It looks like your comparison may not be valid. I'd suggest organizing the code like this:

def do_iter(x, y, z):
    ...

def do_multiproc(x, y, z):
    ...

for x in population_sizes:
    timeit.timeit('do_iter(x, y, z)')
    timeit.timeit('do_multiproc(x, y, z)')

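This code obviously won't run. The point is that all of the setup and processing involved in each method should be fully encapsulated in that method's do_x function. The do_x functions should take the same arguments, or otherwise be as similar as possible.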
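One possible way to flesh out that structure is sketched below. It is a minimal illustration rather than the asker's program: it assumes the 2D Rosenbrock function from the question, a single worker process, a plain Python list as the common input, and the 'd' (double) typecode for the shared arrays. The helper name evaluate and the single population parameter of do_iter and do_multiproc are choices made for this example.

```python
import multiprocessing
import random
import timeit


def rosenbrock(x_1, x_2):
    return 100 * (x_2 - x_1**2) ** 2 + (1 - x_1) ** 2


def evaluate(genes, fitnesses, start, stop):
    # Score each 2-gene individual stored in genes[start:stop].
    for i in range(start, stop, 2):
        fitnesses[i // 2] = rosenbrock(genes[i], genes[i + 1])


def do_iter(population):
    # Plain loop in the current process; allocates its own output buffer.
    fitnesses = [0.0] * (len(population) // 2)
    evaluate(population, fitnesses, 0, len(population))
    return fitnesses


def do_multiproc(population):
    # Same input as do_iter; shared-array setup and process start/join
    # happen inside the function, so they are part of what gets timed.
    genes = multiprocessing.Array('d', population, lock=False)
    fitnesses = multiprocessing.Array('d', len(population) // 2, lock=False)
    proc = multiprocessing.Process(
        target=evaluate, args=(genes, fitnesses, 0, len(population)))
    proc.start()
    proc.join()
    return list(fitnesses)


if __name__ == '__main__':
    runs = 5  # assumed repeat count, kept small for illustration
    for pop_size in (5, 500, 5000):
        population = [random.uniform(-5, 5) for _ in range(pop_size * 2)]
        t_iter = timeit.timeit(lambda: do_iter(population), number=runs)
        t_proc = timeit.timeit(lambda: do_multiproc(population), number=runs)
        print(pop_size, t_iter / runs, t_proc / runs)
```

Because both functions receive the same list and return a list, any difference in the timings comes from the work done inside each function and not from how the inputs were prepared beforehand.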

Also, it looks like you are testing each combination of args only 10 times, which may not be enough to get accurate timings. timeit.timeit() defaults to 1,000,000 iterations.
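As a rough sketch of how the iteration count could be controlled explicitly, timeit.repeat() takes both a number (calls per trial) and a repeat (number of trials). The helper name summarize and the default values below are assumptions for illustration. For a function that starts a process, a number far below the 1,000,000 default is likely to be the practical choice.

```python
import timeit


def summarize(func, number=100, repeat=5):
    # Run `func` `number` times per trial, for `repeat` independent trials.
    trials = timeit.repeat(func, number=number, repeat=repeat)
    # Report the per-call time of the fastest trial; the minimum is the
    # trial least disturbed by other activity on the machine.
    return min(trials) / number


if __name__ == '__main__':
    data = list(range(1000))
    print(summarize(lambda: sum(data)))
```

Taking the minimum over several trials, instead of averaging a handful of runs, tends to give a more stable number when comparing two implementations.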
