
Handle multiple results in Python multiprocessing

I'm writing a piece of Python code to parse a lot of ASCII files using the multiprocessing module. For each file I have to perform the operations of this function:

import re

# re_path_include is assumed to be a regex compiled at module level
# (not shown here) that captures the quoted path of an INCLUDE
# statement in group(1).

def parse_file(file_name):
    record = False
    path_include = ''
    buffer_include = []
    include_file_filters = {}
    include_keylines = {}
    grids_lines = []
    mat_name_lines = []
    pids_name_lines = []
    pids_shell_lines = []
    pids_weld_lines = []
    shells_lines = []
    welds_lines = []
    with open(file_name, 'rb') as in_file:
        for lineID, line in enumerate(in_file):
            if record:
                # accumulate continuation lines of a multi-line INCLUDE path
                path_include += line
            if record and re.search(r'[\'"]$', line.strip()):
                # closing quote found: extract the complete path
                buffer_include.append(re_path_include.search(
                    path_include).group(1).replace('\n', ''))
                record = False
            if 'INCLUDE' in line and '$' not in line:
                if re_path_include.search(line):
                    # single-line INCLUDE statement
                    buffer_include.append(
                        re_path_include.search(line).group(1))
                else:
                    # path continues on the following line(s)
                    path_include = line
                    record = True
            # collect the line numbers of each keyword of interest
            if line.startswith('GRID'):
                grids_lines.append(lineID)
            if line.startswith('$HMNAME MAT'):
                mat_name_lines.append(lineID)
            if line.startswith('$HMNAME PROP'):
                pids_name_lines.append(lineID)
            if line.startswith('PSHELL'):
                pids_shell_lines.append(lineID)
            if line.startswith('PWELD'):
                pids_weld_lines.append(lineID)
            if line.startswith(('CTRIA3', 'CQUAD4')):
                shells_lines.append(lineID)
            if line.startswith('CWELD'):
                welds_lines.append(lineID)
    include_keylines = {'grid': grids_lines, 'mat_name': mat_name_lines,
                        'pid_name': pids_name_lines, 'pid_shell': pids_shell_lines,
                        'pid_weld': pids_weld_lines, 'shell': shells_lines,
                        'weld': welds_lines}
    include_file_filters = {file_name: include_keylines}
    return buffer_include, include_file_filters
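
For a single file, the call returns a two-element tuple. A hypothetical example of the shapes involved (the file name is invented for illustration):

buf, filters = parse_file('model_a.bdf')
# buf     -> list of INCLUDE paths found in the file
# filters -> {'model_a.bdf': {'grid': [...], 'mat_name': [...], 'pid_name': [...],
#             'pid_shell': [...], 'pid_weld': [...], 'shell': [...], 'weld': [...]}}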

This function is called in a loop over the list of files, in this way (the intent is that each CPU process parses one entire file):

import multiprocessing as mp
p = mp.Pool(mp.cpu_count())
buffer_include = []
include_file_filters = {}
for include in grouper(list_of_file_path):
    current = mp.current_process()
    print 'Running: ', current.name, current._identity
    results = p.map(parse_file, include)
    buffer_include += results[0]
    include_file_filters.update(results[1])
p.close()

The grouper function used above is defined as

import itertools
import multiprocessing as mp

def grouper(iterable, padvalue=None):
    # yield tuples of cpu_count() items, padding the last one with padvalue
    return itertools.izip_longest(*[iter(iterable)] * mp.cpu_count(), fillvalue=padvalue)
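
For illustration, here is a minimal sketch of what grouper yields, with mp.cpu_count() written out as an explicit chunk size of 4 and hypothetical file names. Note that the last chunk is padded with None fillers, which would be passed on to parse_file as-is:

import itertools

def grouper4(iterable, padvalue=None):
    # same idea as grouper above, with the chunk size fixed at 4
    return itertools.izip_longest(*[iter(iterable)] * 4, fillvalue=padvalue)

files = ['a.bdf', 'b.bdf', 'c.bdf', 'd.bdf', 'e.bdf']  # hypothetical names
for chunk in grouper4(files):
    print chunk
# ('a.bdf', 'b.bdf', 'c.bdf', 'd.bdf')
# ('e.bdf', None, None, None)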

I'm using Python 2.7.15 on a CPU with 4 cores (Intel Core i3-6006U).

When I run my code, I see all the CPUs engaged at 100% and the output in the Python console is Running: MainProcess (), but nothing else happens. It seems that my code is blocked at the instruction results = p.map(parse_file, include) and can't go ahead (the code works well when I parse the files one at a time, without parallelization).

  • What is wrong?
  • How can I deal with the results returned by the parse_file function during parallel execution? Is my approach correct or not?

Thanks in advance for your support.

EDIT

Thanks darc for your reply. I've tried your suggestion but the issue is the same. The problem seems to be overcome if I put the code under an if statement like so:

if __name__ == '__main__':

Maybe this is due to the way Python IDLE handles processes. I'm using the IDLE environment for development and debugging reasons.
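
For reference, a minimal sketch of the guarded structure (assuming parse_file and grouper are defined at module level in the same script, with hypothetical file names). On platforms that start workers by re-importing the main module, the guard prevents the pool setup from being re-executed in every child process:

import multiprocessing as mp

# parse_file and grouper are assumed to be defined above in this module,
# so that worker processes can import them.

if __name__ == '__main__':
    list_of_file_path = ['model_a.bdf', 'model_b.bdf']  # hypothetical paths
    p = mp.Pool(mp.cpu_count())
    buffer_include = []
    include_file_filters = {}
    for include in grouper(list_of_file_path):
        # drop the None fillers that grouper pads the last chunk with
        chunk = [f for f in include if f is not None]
        for buf, filters in p.map(parse_file, chunk):
            buffer_include += buf
            include_file_filters.update(filters)
    p.close()
    p.join()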

According to the Python docs:

map(func, iterable[, chunksize]): A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.

This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.

Since map is blocking, your process waits until parse_file is done.

Since map already chunks the iterable, you can try to send all of the files together as one large iterable:

import multiprocessing as mp
p = mp.Pool(mp.cpu_count())
buffer_include = []
include_file_filters = {}
# map returns one (buffer_include, include_file_filters) tuple per file
results = p.map(parse_file, list_of_file_path, 1)
for buf, filters in results:
    buffer_include += buf
    include_file_filters.update(filters)
p.close()
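
With chunksize set to 1, each file is submitted to the pool as its own task, so a worker picks up a new file as soon as it finishes the previous one; this helps when files take very different amounts of time to parse.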

If you want to keep the original loop, use apply_async; or, if you are using Python 3, you can use the ProcessPoolExecutor submit() function and read the results.
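
A minimal sketch of the apply_async variant (assuming parse_file and list_of_file_path from the question); each call returns an AsyncResult immediately, so the submission loop does not block:

import multiprocessing as mp

if __name__ == '__main__':
    p = mp.Pool(mp.cpu_count())
    # submit one task per file without blocking
    async_results = [p.apply_async(parse_file, (f,)) for f in list_of_file_path]
    p.close()
    p.join()
    buffer_include = []
    include_file_filters = {}
    for r in async_results:
        buf, filters = r.get()  # get() re-raises any exception from the worker
        buffer_include += buf
        include_file_filters.update(filters)

And a Python 3 sketch with concurrent.futures, under the same assumptions:

from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    buffer_include = []
    include_file_filters = {}
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(parse_file, f) for f in list_of_file_path]
        for future in futures:
            buf, filters = future.result()
            buffer_include += buf
            include_file_filters.update(filters)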
