python - I want multiple threads to spawn multiple processes, all should run in parallel
I have a function called run_3_processes, which spawns 3 processes (duh) using multiprocessing.pool.apply, waits for their results, processes those results, and returns a single result.
I have another function called run_3_processes_3_times, which should run run_3_processes 3 times in parallel, wait for all of them to return, and then process all their results.
Things I tried:

- using a process pool for run_3_processes_3_times - turns out this is complicated because of Python Process Pool non-daemonic?
- threadpool.apply for run_3_processes_3_times - for some reason this makes it run serially, not in parallel - is it because the apply in run_3_processes blocks the GIL?

I'm sure there's a one-liner solution I'm missing... thanks!
Since you're using a combination of true threads and sub-processes, you will "sort of" run into the GIL, but the way it's structured makes it seem unlikely to be a problem. The ThreadPool will be subject to context switches to give concurrency between the threads, but since its only purpose is to spawn the child processes, it's not doing anything CPU-intensive. I'm not sure why it's even necessary to use multiple threads; I would probably just have a single-threaded parent process spawn and wait on the child processes directly.
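A minimal sketch of that flattened approach: one parent process, one pool of 9 workers, with each of the 3 x 3 jobs submitted directly. do_it here is a hypothetical stand-in for the real worker function (it just computes a number from the task indices), since the original isn't shown.

```python
import multiprocessing as mp

def do_it(task):
    # hypothetical worker: stands in for the real per-process job
    outer, inner = task
    return outer * 3 + inner

def run_all():
    # one flat pool replaces the thread-of-pools layering;
    # each (outer, inner) pair is one of the original 3 x 3 jobs
    tasks = [(c, i) for c in range(3) for i in range(3)]
    with mp.Pool(9) as pool:
        return pool.map(do_it, tasks)

if __name__ == "__main__":
    print(run_all())
```

pool.map preserves input order, so the results come back grouped by outer index and no thread layer is needed at all.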
In both functions, it's probably more idiomatic to use the map() method instead of apply_async(), although both will work. Usually that would look a bit like this:
import multiprocessing as mp

process_count = 3

def pre_process(input_data):
    # note: [[]] * process_count would make every slot alias the same list
    input_subsets = [[] for _ in range(process_count)]
    for idx, data_point in enumerate(input_data):
        # <do any input validation on data_point>
        input_subsets[idx % process_count].append(data_point)
    return input_subsets

def process_data(input_data):
    return_val = []
    for val in input_data:
        # <do some processing work>
        return_val.append(val)  # <result of processing>
    return return_val

data_subsets = pre_process(raw_data)
pool = mp.Pool(process_count)
result_list = pool.map(process_data, data_subsets)
# <check result_list>
OK, found a hacky answer, would love to hear if there's something better:
from multiprocessing.pool import ThreadPool
import multiprocessing as mp

def run_3_processes_3_times():
    pool = ThreadPool(3)
    candidates = [pool.apply_async(run_3_processes, args=(c,))
                  for c in range(3)]
    candidates = [x.get() for x in candidates]
    pool.close()
    return candidates

def run_3_processes(c):
    pool = mp.Pool(3)
    solutions = [pool.apply_async(do_it, args=(i,))
                 for i in range(3)]
    solutions = [x.get() for x in solutions]
    pool.close()
    return solutions
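For reference, here is a self-contained, runnable version of this ThreadPool-over-Pool pattern, with do_it replaced by a trivial squaring stand-in (the real worker isn't shown in the question). The outer threads each drive their own process pool, and all nine child processes can run concurrently.

```python
import multiprocessing as mp
from multiprocessing.pool import ThreadPool

def do_it(i):
    # hypothetical stand-in for the real worker
    return i * i

def run_3_processes(c):
    # each thread owns its own process pool of 3 workers
    pool = mp.Pool(3)
    solutions = [pool.apply_async(do_it, args=(i,)) for i in range(3)]
    solutions = [x.get() for x in solutions]
    pool.close()
    pool.join()
    return (c, solutions)

def run_3_processes_3_times():
    # threads (not processes) at the outer level, so the inner
    # pools avoid the non-daemonic-process restriction
    pool = ThreadPool(3)
    candidates = [pool.apply_async(run_3_processes, args=(c,))
                  for c in range(3)]
    candidates = [x.get() for x in candidates]
    pool.close()
    pool.join()
    return candidates

if __name__ == "__main__":
    print(run_3_processes_3_times())
```

Because the outer level uses threads, the daemonic-process restriction on nesting Pool inside Pool never comes up; the GIL only serializes the cheap bookkeeping in the threads, not the child processes' work.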