Handling multiple results in Python multiprocessing

Question

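I am writing Python code that uses multiprocessing to parse a large number of ASCII files. For each file, I have to perform the operations in this function: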

import re

# re_path_include is a compiled regular expression defined elsewhere (not shown)
def parse_file(file_name):
    record = False
    path_include = []
    buffer_include = []
    include_file_filters = {}
    include_keylines = {}
    grids_lines = []
    mat_name_lines = []
    pids_name_lines = []
    pids_shell_lines= []
    pids_weld_lines = []
    shells_lines = []
    welds_lines = []
    with open(file_name, 'rb') as in_file:
        for lineID, line in enumerate(in_file):
            if record:
                path_include += line
            if record and re.search(r'[\'|\"]$', line.strip()):
                buffer_include.append(re_path_include.search(
                    path_include).group(1).replace('\n', ''))
                record = False
            if 'INCLUDE' in line and '$' not in line:
                if re_path_include.search(line):
                    buffer_include.append(
                        re_path_include.search(line).group(1))
                else:
                    path_include = line
                    record = True
            if line.startswith('GRID'):
                grids_lines += [lineID]
            if line.startswith('$HMNAME MAT'):
                mat_name_lines += [lineID]
            if line.startswith('$HMNAME PROP'):
                pids_name_lines += [lineID]
            if line.startswith('PSHELL'):
                pids_shell_lines += [lineID]
            if line.startswith('PWELD'):
                pids_weld_lines += [lineID]
            if line.startswith(('CTRIA3', 'CQUAD4')):
                shells_lines += [lineID]
            if line.startswith('CWELD'):
                welds_lines += [lineID]
    include_keylines = {
        'grid': grids_lines,
        'mat_name': mat_name_lines,
        'pid_name': pids_name_lines,
        'pid_shell': pids_shell_lines,
        'pid_weld': pids_weld_lines,
        'shell': shells_lines,
        'weld': welds_lines,
    }
    include_file_filters = {file_name: include_keylines}
    return buffer_include, include_file_filters

The function is used like this to loop over a list of files (each process on the CPU parses one whole file):

import multiprocessing as mp
p = mp.Pool(mp.cpu_count())
buffer_include = []
include_file_filters = {}
for include in grouper([list_of_file_path]):
    current = mp.current_process()
    print 'Running: ', current.name, current._identity
    results = p.map(parse_file, include) 
    buffer_include += results[0]
    include_file_filters.update(results[1])
p.close()

The grouper function used above is defined as:

def grouper(iterable, padvalue=None):
    return itertools.izip_longest(*[iter(iterable)]*mp.cpu_count(), fillvalue=padvalue)
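
For reference, here is a small sketch of what this helper yields. It is written for Python 3, where izip_longest is named zip_longest, and the explicit group size n and the file names are assumptions made for illustration. Note that the last group is padded with None, and that wrapping the list in another list produces a single group whose first element is the whole list.

```python
import itertools


def grouper(iterable, n, padvalue=None):
    # Same idiom as the helper above, but with the group size passed in
    # explicitly instead of taken from mp.cpu_count().
    return itertools.zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)


files = ['a.dat', 'b.dat', 'c.dat', 'd.dat', 'e.dat']  # hypothetical names

# Five files in groups of four: the second group is padded with None.
groups = list(grouper(files, 4))

# Passing [files] groups a one-element list, so the only group holds
# the entire list as its first item, followed by padding.
wrapped = list(grouper([files], 4))

print(groups)
print(wrapped)
```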

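I am using Python 2.7.15 on a CPU with 4 cores (Intel Core i3-6006U).

When I run the code, I see all the CPUs engaged at 100%, and the output in the Python console is Running: MainProcess (), but nothing else happens. It seems that my code is blocked at the instruction results = p.map(parse_file, include) and cannot go ahead (the code works well when I parse the files one at a time without parallelization).

What is wrong?
How can I handle the results returned by the parse_file function during parallel execution? Is my approach correct or not?

Thanks in advance for your support.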

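Edit

Thanks darc for your reply. I have tried your suggestion, but the issue is the same. The problem seems to be overcome if I put the code under an if statement like: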

if __name__ == '__main__':

Maybe this is due to the way Python IDLE handles processes. I am using the IDLE environment for development and debugging reasons.
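
The guard matters on platforms where multiprocessing starts each worker in a fresh interpreter that re-imports the main module, which is the default on Windows. Without it, every worker re-runs the pool-creating code, which may be what happened here. A minimal sketch of the layout, in which parse_file_stub and the file names are hypothetical stand-ins for the real parse_file and inputs:

```python
import multiprocessing as mp


def parse_file_stub(file_name):
    # Hypothetical stand-in for parse_file: returns the same
    # (buffer_include, include_file_filters) shape for one file.
    return [file_name + '.inc'], {file_name: {'grid': []}}


def main(file_paths):
    pool = mp.Pool(2)
    try:
        # One result tuple per input file, in input order.
        return pool.map(parse_file_stub, file_paths)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # Only the process that runs this file as a script gets here. A worker
    # that re-imports the module sees a different __name__ and skips it,
    # so it never tries to create a pool of its own.
    print(main(['a.dat', 'b.dat']))
```

The worker function is defined at module level, outside the guard, so that the workers can find it when they import the module.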

Answer 1

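According to the Python docs:

map(func, iterable[, chunksize]): A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.

This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.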
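
To make the chunking concrete, here is a rough sketch of splitting an iterable into chunks of a given size. The helper name chunks and the file names are made up for illustration; this only shows the idea and is not presented as Pool's actual implementation.

```python
import itertools


def chunks(iterable, chunksize):
    # Yield successive lists of at most `chunksize` items.
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, chunksize))
        if not chunk:
            return
        yield chunk


files = ['a.dat', 'b.dat', 'c.dat', 'd.dat', 'e.dat']  # hypothetical names

# With a chunk size of 2, five files become three tasks.
tasks = list(chunks(files, 2))

# With a chunk size of 1, every file is its own task.
single = list(chunks(files, 1))

print(tasks)
```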

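Because map blocks, your process waits until all the files have been parsed.

Since map already splits the iterable into chunks, you can try sending all the includes together as one large iterable: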

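import multiprocessing as mp
p = mp.Pool(mp.cpu_count())
buffer_include = []
include_file_filters = {}
# map returns one (buffer_include, include_file_filters) tuple per file,
# in the same order as list_of_file_path
results = p.map(parse_file, list_of_file_path, 1)
for file_includes, file_filters in results:
    buffer_include += file_includes
    include_file_filters.update(file_filters)
p.close()
p.join()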
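
The value returned by p.map is a list with one entry per input file, so results[0] is the whole tuple for the first file rather than a list of includes. The shape can be checked without any worker processes by using the built-in map, which the docs quoted above describe as the sequential counterpart. In this sketch, parse_file_stub and the file names are hypothetical stand-ins:

```python
def parse_file_stub(file_name):
    # Hypothetical stand-in for parse_file: returns the same
    # (buffer_include, include_file_filters) shape for one file.
    return [file_name + '.inc'], {file_name: {'grid': []}}


paths = ['a.dat', 'b.dat']

# Same shape as p.map(parse_file_stub, paths): one tuple per file.
results = list(map(parse_file_stub, paths))

# Transpose the list of pairs into all the include lists and all the dicts.
includes_per_file, filters_per_file = zip(*results)

# Flatten the per-file include lists into one list.
buffer_include = [inc for includes in includes_per_file for inc in includes]

# Merge the per-file dicts into one dict keyed by file name.
include_file_filters = {}
for filters in filters_per_file:
    include_file_filters.update(filters)

print(results[0])
print(buffer_include)
```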

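If you want to keep the original loop, use apply_async; or, if you are on Python 3, you can use the submit() method of ProcessPoolExecutor and read the results from the returned futures.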
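
A minimal sketch of the apply_async variant, assuming a hypothetical parse_file_stub in place of the real parse_file. The helper names parse_all_async and main are also made up for illustration. Each file is submitted as its own task, and get() is called on the returned result objects afterwards to collect the values.

```python
import multiprocessing as mp


def parse_file_stub(file_name):
    # Hypothetical stand-in for parse_file: returns the same
    # (buffer_include, include_file_filters) shape for one file.
    return [file_name + '.inc'], {file_name: {'grid': []}}


def parse_all_async(pool, file_paths):
    # Submit every file first, so the workers can run concurrently.
    pending = [pool.apply_async(parse_file_stub, (path,))
               for path in file_paths]

    buffer_include = []
    include_file_filters = {}
    for job in pending:
        includes, filters = job.get()  # blocks until this one file is done
        buffer_include.extend(includes)
        include_file_filters.update(filters)
    return buffer_include, include_file_filters


def main(file_paths):
    pool = mp.Pool(mp.cpu_count())
    try:
        return parse_all_async(pool, file_paths)
    finally:
        pool.close()
        pool.join()
```

Calling main(list_of_file_path) from under an if __name__ == '__main__': guard keeps the pool creation out of the worker processes.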

Solution 1
0 2018-08-14 13:55:20
