python - I want multiple threads to spawn multiple processes, all should run in parallel
I have a function called run_3_processes, which spawns 3 processes (duh) using multiprocessing.pool.apply, waits for their results, processes those results, and returns a single result.
I have another function called run_3_processes_3_times, which should run run_3_processes 3 times in parallel, wait for all of them to return, and then process all their results.
Things I tried:

- using a process pool for run_3_processes_3_times - turns out this is complicated because of Python Process Pool non-daemonic?
- threadpool.apply for run_3_processes_3_times - for some reason this makes it run serially, not in parallel - is it because the apply in run_3_processes blocks the GIL?

I'm sure there's a one-liner solution I'm missing... thanks!
Since you're using a combination of true threads and sub-processes, you will "sort of" run into the GIL, but the way it's structured makes it seem unlikely to be a problem. The ThreadPool will be subject to context switches to give concurrency between the threads, but since its only purpose is to spawn the child processes, it's not doing anything CPU-intensive. I'm not sure why it's even necessary to use multiple threads; I would probably just have a single-threaded parent process spawn and wait on the child processes directly.
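A minimal sketch of that flattened approach: one parent process, one pool of 9 workers, with each of the 3 x 3 jobs submitted directly. do_it here is a hypothetical stand-in for the real worker function (it just computes a number from the task indices), since the original isn't shown.

```python
import multiprocessing as mp

def do_it(task):
    # hypothetical worker: stands in for the real per-process job
    outer, inner = task
    return outer * 3 + inner

def run_all():
    # one flat pool replaces the thread-of-pools layering;
    # each (outer, inner) pair is one of the original 3 x 3 jobs
    tasks = [(c, i) for c in range(3) for i in range(3)]
    with mp.Pool(9) as pool:
        return pool.map(do_it, tasks)

if __name__ == "__main__":
    print(run_all())
```

pool.map preserves input order, so the results come back grouped by outer index and no thread layer is needed at all.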
In both functions, it's probably more idiomatic to use the map() method instead of apply_async(), although both will work. Usually that would look a bit like this:
import multiprocessing as mp

process_count = 3

def pre_process(input_data):
    # note: [[]] * process_count would make every slot alias the same list
    input_subsets = [[] for _ in range(process_count)]
    for idx, data_point in enumerate(input_data):
        # <do any input validation on data_point>
        input_subsets[idx % process_count].append(data_point)
    return input_subsets

def process_data(input_data):
    return_val = []
    for val in input_data:
        # <do some processing work>
        return_val.append(val)  # <result of processing>
    return return_val

data_subsets = pre_process(raw_data)
pool = mp.Pool(process_count)
result_list = pool.map(process_data, data_subsets)
# <check result_list>
OK, found a hacky answer, would love to hear if there's something better:
from multiprocessing.pool import ThreadPool
import multiprocessing as mp

def run_3_processes_3_times():
    pool = ThreadPool(3)
    candidates = [pool.apply_async(run_3_processes, args=(c,))
                  for c in range(3)]
    candidates = [x.get() for x in candidates]
    pool.close()
    return candidates

def run_3_processes(c):
    pool = mp.Pool(3)
    solutions = [pool.apply_async(do_it, args=(i,))
                 for i in range(3)]
    solutions = [x.get() for x in solutions]
    pool.close()
    return solutions
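For reference, here is a self-contained, runnable version of this ThreadPool-over-Pool pattern, with do_it replaced by a trivial squaring stand-in (the real worker isn't shown in the question). The outer threads each drive their own process pool, and all nine child processes can run concurrently.

```python
import multiprocessing as mp
from multiprocessing.pool import ThreadPool

def do_it(i):
    # hypothetical stand-in for the real worker
    return i * i

def run_3_processes(c):
    # each thread owns its own process pool of 3 workers
    pool = mp.Pool(3)
    solutions = [pool.apply_async(do_it, args=(i,)) for i in range(3)]
    solutions = [x.get() for x in solutions]
    pool.close()
    pool.join()
    return (c, solutions)

def run_3_processes_3_times():
    # threads (not processes) at the outer level, so the inner
    # pools avoid the non-daemonic-process restriction
    pool = ThreadPool(3)
    candidates = [pool.apply_async(run_3_processes, args=(c,))
                  for c in range(3)]
    candidates = [x.get() for x in candidates]
    pool.close()
    pool.join()
    return candidates

if __name__ == "__main__":
    print(run_3_processes_3_times())
```

Because the outer level uses threads, the daemonic-process restriction on nesting Pool inside Pool never comes up; the GIL only serializes the cheap bookkeeping in the threads, not the child processes' work.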