Why is a Python for loop consistently faster than multiprocessing?

Question

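I am trying to learn the multiprocessing library in Python 3.9. One thing I compared was the performance of a repeated computation on a dataset in which each item consists of 220500 samples. I did this first with the multiprocessing library and then with a for loop.

Throughout my testing, I consistently get better performance with the for loop. Here is the code for the test I am running. I compute the FFT of a signal with 220500 samples. My experiment runs this computation a certain number of times in each test. I test this by setting the number of processes to 10, 100, and 1000 respectively.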

    import time
    import multiprocessing

    import numpy as np
    from scipy.signal import get_window
    from scipy.fftpack import fft


    def make_signal():
        # moved this code into a function to make threading portion of code clearer
        DUR = 5
        FREQ_HZ = 10
        Fs = 44100
        # precompute the size
        N = DUR * Fs
        # get a windowing function ('hann' is the canonical SciPy name for the Hann window)
        w = get_window('hann', N)
        t = np.linspace(0, DUR, N)
        x = np.zeros_like(t)
        b = 2*np.pi*FREQ_HZ*t
        for i in range(50):
            x += np.sin(b*i)
        return x*w, Fs


    def fft_(x, Fs):
        yfft = fft(x)[:x.size//2]
        xfft = np.linspace(0, Fs//2, yfft.size)
        return 2/yfft.size * np.abs(yfft), xfft


    if __name__ == "__main__":
        # grab the raw sample data which will be computed by the fft function
        x = make_signal()  # (signal, Fs) tuple; the signal has 220500 samples
        # create 3 different tests, each with the amount of processes below
        # array([ 10, 100, 1000])
        tests_sweep = np.logspace(1, 3, 3, dtype=int)
        # sweep through the processes
        for iteration, test_num in enumerate(tests_sweep):
            # create a list of the amount of processes to give for each iteration
            fft_processes = []
            for i in range(test_num):
                fft_processes.append(x)
            start = time.time()
            # repeat the process for test_num amount of times (eg 10, 100, 1000)
            with multiprocessing.Pool() as pool:
                results = pool.starmap(fft_, fft_processes)
            end = time.time()
            print(f'{iteration}: Multiprocessing method with {test_num} processes took: {end - start:.2f} sec')
            start = time.time()
            # loop variable renamed so it no longer shadows the fft_processes list
            for args in fft_processes:
                # repeat the process the same amount of time as the multiprocessing method using for loops
                fft_(*args)
            end = time.time()
            print(f'{iteration}: For-loop method with {test_num} processes took: {end - start:.2f} sec')
            print('----------')

Here are my test results.

    0: Multiprocessing method with 10 processes took: 0.84 sec
    0: For-loop method with 10 processes took: 0.05 sec
    ----------
    1: Multiprocessing method with 100 processes took: 1.46 sec
    1: For-loop method with 100 processes took: 0.45 sec
    ----------
    2: Multiprocessing method with 1000 processes took: 6.70 sec
    2: For-loop method with 1000 processes took: 4.21 sec
    ----------

Why is the for-loop method so much faster? Am I using the multiprocessing library correctly? Thanks.

Answer 1

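Starting a new process carries a significant overhead. In addition, the data has to be copied from one process to another (which also has some overhead compared with an ordinary memory copy).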
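To get a rough feel for the copying part of that cost, here is a minimal sketch that times a pickle round trip of one payload the size of the question's signal. It only approximates what a pool does when it ships arguments to a worker: the real path also involves a pipe and a second process. The helper name `roundtrip_seconds` and the use of a stdlib `array` in place of the NumPy signal are assumptions made for illustration.

```python
import pickle
import time
from array import array

N = 220_500  # same sample count as the signal in the question


def roundtrip_seconds(payload, repeats=20):
    """Return (average seconds per pickle round trip, last restored copy)."""
    restored = None
    start = time.perf_counter()
    for _ in range(repeats):
        # Serialize and deserialize, roughly what happens per task argument
        data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
        restored = pickle.loads(data)
    elapsed = time.perf_counter() - start
    return elapsed / repeats, restored


# Stand-in for the NumPy signal: N double-precision floats (hypothetical data)
signal = array("d", (i * 0.5 for i in range(N)))

per_task, restored = roundtrip_seconds(signal)
payload_bytes = len(pickle.dumps(signal, protocol=pickle.HIGHEST_PROTOCOL))

print(f"payload size: {payload_bytes / 1e6:.2f} MB")
print(f"pickle round trip per task: {per_task * 1e3:.3f} ms")
```

Multiplying the per-task figure by the number of tasks (and adding the same again for the returned results) gives a lower bound on what the pool pays that the for loop does not.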

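Another aspect is that you should limit the number of processes to the number of cores you have. Going beyond that also makes you incur process-switching costs.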
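A minimal sketch of that sizing advice follows. Note that `multiprocessing.Pool()` called without arguments already uses `os.cpu_count()` workers, so the sketch mainly makes the choice explicit and batches the tasks so each worker receives one large chunk instead of many small messages. The helper `pick_pool_settings` and the stand-in task `work` are hypothetical names, not part of the question's code.

```python
import multiprocessing
import os


def pick_pool_settings(num_tasks, cores=None):
    """Return (workers, chunksize): never more workers than cores or tasks."""
    cores = cores or os.cpu_count() or 1
    workers = max(1, min(cores, num_tasks))
    # Ceiling division: each worker gets roughly one batch of tasks
    chunksize = max(1, -(-num_tasks // workers))
    return workers, chunksize


def work(n):
    """Stand-in CPU-bound task (hypothetical; replaces fft_ from the question)."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    tasks = [20_000] * 100
    workers, chunksize = pick_pool_settings(len(tasks))
    # Create the pool once and reuse it for all tasks
    with multiprocessing.Pool(processes=workers) as pool:
        results = pool.map(work, tasks, chunksize=chunksize)
    print(f"{workers} workers, chunksize {chunksize}, {len(results)} results")
```

In a benchmark like the one in the question, creating the pool outside the timed region would also separate the process start-up cost from the cost of the work itself.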

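This, combined with the small amount of computation per process, makes the switch not worthwhile.

I think that if you make the signal significantly longer (10x or 100x), you should start to see some benefit from using multiple cores.

Also check whether the operations you are running already use some form of parallelism. They may be implemented with threads, which are much cheaper than processes (but have historically not worked well in Python, due to the GIL).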
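One rough way to check for hidden parallelism is to compare CPU time with wall-clock time for a single call. `time.process_time()` counts CPU time across all threads of the current process, so a ratio well above 1 suggests the call is already using several cores internally. This is only a heuristic sketch; `cpu_to_wall_ratio` and `busy` are hypothetical names, and `busy` stands in for the operation you actually want to inspect.

```python
import time


def cpu_to_wall_ratio(func, *args):
    """Run func(*args) once and return CPU seconds divided by wall seconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def busy(n):
    """Single-threaded pure-Python loop (hypothetical stand-in workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


ratio = cpu_to_wall_ratio(busy, 2_000_000)
print(f"CPU/wall ratio: {ratio:.2f}")
```

A ratio near 1 means one core was busy for the whole call; a ratio near 0 means the call mostly waited (for example on I/O or sleep).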

Answer 1
0 2020-11-22 08:25:57