
python, multiprocessing and dmtcp: checkpointing one process in Pool?

Is it possible to use Python's DMTCP integration to checkpoint a child process during parallel execution?

My situation is as follows: I have a multiprocessing.Pool with several workers receiving asynchronous jobs (submitted with apply_async). Certain big jobs require all the resources (CPU cores and memory). When one of these jobs is accepted, I'd like to checkpoint all pending processes, evict them from execution, launch the big job, and finally resume the checkpointed processes.

If you start your Python program using dmtcp_launch python ... or dmtcp_launch ./myapp.py, all child processes created by the main process are automatically under checkpoint control. Thus, when you checkpoint the computation from within your main process, all other processes are checkpointed as well.
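A minimal sketch of that behaviour, driving the DMTCP command-line tools from Python. The script name myapp.py comes from the answer; the port number and the helper names (launch_command, checkpoint_command, checkpoint_everything) are assumptions made for illustration, and the functions only build or run the commands, so DMTCP must be installed before checkpoint_everything does anything useful.

```python
import subprocess

COORD_PORT = 7779  # assumed coordinator port; any free port should work


def launch_command(script, port=COORD_PORT):
    """Build the command that starts `script` under DMTCP.

    Every child the script forks (for example Pool workers) attaches to the
    same coordinator, so it becomes part of the same checkpoint.
    """
    return ["dmtcp_launch", "--new-coordinator",
            "--coord-port", str(port), "python", script]


def checkpoint_command(port=COORD_PORT, blocking=True):
    """Build the command asking the coordinator to checkpoint ALL processes."""
    # --bcheckpoint waits until the checkpoint is complete; --checkpoint does not.
    flag = "--bcheckpoint" if blocking else "--checkpoint"
    return ["dmtcp_command", "--coord-port", str(port), flag]


def checkpoint_everything(port=COORD_PORT):
    """Run the checkpoint request (requires a running DMTCP coordinator)."""
    subprocess.run(checkpoint_command(port), check=True)
```

Note that there is no per-process selector in checkpoint_command: the request goes to the coordinator, which checkpoints the scheduler and all workers together.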

I am not familiar enough with multiprocessing.Pool to comment in detail on that front, but from a quick look, it seems you don't want to checkpoint your main process (the scheduler). However, DMTCP will checkpoint and restart the entire computation (including the scheduler) as a single unit. Is that acceptable? If not, the alternative is to not launch the scheduler under DMTCP control, but to modify it so that it launches only the child/slave processes under checkpoint control. I am not sure whether that's something you can do in your application.
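One way the alternative could look, as a rough sketch rather than a tested recipe: the scheduler itself runs outside DMTCP and starts each job as a separate program under its own coordinator, so each job can be checkpointed, stopped, and restarted independently. This assumes the jobs can run as standalone scripts instead of Pool workers; BASE_PORT, CKPT_ROOT, and all helper names are hypothetical.

```python
import glob
import os

BASE_PORT = 7800    # assumed: worker N uses coordinator port BASE_PORT + N
CKPT_ROOT = "ckpt"  # assumed: worker N writes images to ckpt/workerN


def worker_port(worker_id):
    return BASE_PORT + worker_id


def worker_ckpt_dir(worker_id):
    return os.path.join(CKPT_ROOT, f"worker{worker_id}")


def launch_cmd(worker_id, argv):
    """Start one job under its own DMTCP coordinator."""
    return ["dmtcp_launch", "--new-coordinator",
            "--coord-port", str(worker_port(worker_id)),
            "--ckptdir", worker_ckpt_dir(worker_id), *argv]


def suspend_cmds(worker_id):
    """Checkpoint one job (blocking), then stop it and its coordinator."""
    port = str(worker_port(worker_id))
    return [["dmtcp_command", "--coord-port", port, "--bcheckpoint"],
            ["dmtcp_command", "--coord-port", port, "--quit"]]


def resume_cmd(worker_id):
    """Restart one job from the checkpoint images found in its directory."""
    pattern = os.path.join(worker_ckpt_dir(worker_id), "ckpt_*.dmtcp")
    images = sorted(glob.glob(pattern))
    return ["dmtcp_restart", "--new-coordinator",
            "--coord-port", str(worker_port(worker_id)), *images]
```

The scheduler would pass these lists to subprocess.Popen or subprocess.run: run suspend_cmds for every running job when a big job arrives, run the big job, then run resume_cmd for each suspended job.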


 