How to distinguish processes in Multiprocessing.Pool?
I'm using python multiprocessing to fork some child processes to run my jobs. There are two demands: I need the pid of the process running each job (so that I can kill a specific job on demand), and I need an asynchronous callback when a job finishes. But I get:

- A process spawned by multiprocessing.Process() has an attribute "pid" to get its pid, but I can't add my asynchronous callback, and of course I can't wait synchronously either.
- A pool spawned by multiprocessing.Pool() provides a callback interface, but I can't tell which process in the pool is the one matching my job, since I may need to kill the process according to a specific job.

The task itself is cheap; here is the code:
import random
import time
import multiprocessing

class Job(object):
    def __init__(self, jobid, jobname, command):
        self.jobid, self.jobname, self.command = jobid, jobname, command

    def __str__(self):
        return "Job <{0:05d}>".format(self.jobid)

    def __repr__(self):
        return self.__str__()

def _run_job(job):
    time.sleep(1)
    print("{} done".format(job))
    # the second value indicates whether the job finished successfully
    return job, random.choice([True, False])

class Test(object):
    def __init__(self):
        self._lock = multiprocessing.Lock()
        self._process_pool = multiprocessing.Pool()

    def submit_job(self, job):
        with self._lock:
            self._process_pool.apply_async(_run_job, (job,), callback=self.job_done)
            print("submitting {} successfully".format(job))

    def job_done(self, result):
        # cleanup after a finished job needs the lock of the parent process
        with self._lock:
            job, success = result
            if success:
                print("{} success".format(job))
            else:
                print("{} failure".format(job))

j1 = Job(1, "test1", "command1")
j2 = Job(2, "test2", "command2")
t = Test()
t.submit_job(j1)
t.submit_job(j2)
time.sleep(3.1)  # wait for all jobs to finish
But now I can't get the pid corresponding to each job. For example, if I need to kill job<1>, I can't find which process in the pool is running job<1>, so I can't kill the job whenever I want.

If I use multiprocessing.Process instead, I can record the pid of every process along with its corresponding jobid, but then I can't add a callback method.

So is there a way to both get the pid of the child process and add a callback method?
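For illustration, the Process-based alternative mentioned above, which records each pid by jobid but offers no callback interface, might look like this sketch (the jobids and the helper are made up for the example):

import multiprocessing
import time

def _run_job(jobid):
    time.sleep(0.1)  # stand-in for real work

if __name__ == "__main__":
    procs = {}  # jobid -> Process, so the pid is known and the job can be killed
    for jobid in (1, 2):
        p = multiprocessing.Process(target=_run_job, args=(jobid,))
        p.start()
        procs[jobid] = p
        print("job {} runs in pid {}".format(jobid, p.pid))
    # the drawback: there is no callback interface here -- the parent
    # must block on join() to learn that a job has finished
    for p in procs.values():
        p.join()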
Finally I found a solution: use multiprocessing.Event instead.

Since multiprocessing.Pool can't tell me which process a job is allocated to, I can't record that process, and therefore I can't kill it by job id whenever I want.
Fortunately, multiprocessing provides an Event object as an alternative to the callback method. Recall what a callback method does: it provides an asynchronous response from the child process. Once the child process finishes, the parent process detects it and calls the callback. So the core issue is how the parent process detects whether the child process has finished. That's what the Event object is for.
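A minimal sketch of that detection mechanism (the names here are illustrative, not from the solution code below): the child sets the event when it is done, and the parent waits on it instead of receiving a callback.

import multiprocessing
import time

def worker(done_event):
    time.sleep(0.1)  # simulate the job
    done_event.set()  # signal completion to the parent

if __name__ == "__main__":
    done = multiprocessing.Event()
    p = multiprocessing.Process(target=worker, args=(done,))
    p.start()
    done.wait(timeout=5)  # parent detects completion instead of a callback
    p.join()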
So the solution is simple: pass an Event object to the child process. Once the child process finishes, it sets the Event. In the parent process, a daemon thread monitors whether the event is set; if so, it calls the method that does the callback work. Moreover, since I create the processes with multiprocessing.Process instead of multiprocessing.Pool, I can easily get each pid, which enables me to kill a process.

The solution code:
import time
import multiprocessing
import threading

class Job(object):
    def __init__(self, jobid, jobname, command):
        self.jobid, self.jobname, self.command = jobid, jobname, command
        self.lifetime = 0

    def __str__(self):
        return "Job <{0:05d}>".format(self.jobid)

    def __repr__(self):
        return self.__str__()

def _run_job(job, done_event):
    time.sleep(1)
    print("{} done".format(job))
    done_event.set()  # signal the parent that this job has finished

class Test(object):
    def __init__(self):
        self._lock = multiprocessing.Lock()
        self._process_pool = {}  # jobid -> (process, event)
        t = threading.Thread(target=self.scan_jobs)
        t.daemon = True
        t.start()

    def scan_jobs(self):
        # daemon thread: poll the events and do the "callback" work
        while True:
            with self._lock:
                done_jobid = []
                for jobid in self._process_pool:
                    process, event = self._process_pool[jobid]
                    if event.is_set():
                        print("Job<{}> is done in process <{}>".format(jobid, process.pid))
                        done_jobid.append(jobid)
                for jobid in done_jobid:
                    self._process_pool.pop(jobid)
            time.sleep(1)

    def submit_job(self, job):
        with self._lock:
            done_event = multiprocessing.Event()
            new_process = multiprocessing.Process(target=_run_job, args=(job, done_event))
            new_process.daemon = True
            self._process_pool[job.jobid] = (new_process, done_event)
            new_process.start()
            print("submitting {} successfully".format(job))

j1 = Job(1, "test1", "command1")
j2 = Job(2, "test2", "command2")
t = Test()
t.submit_job(j1)
t.submit_job(j2)
time.sleep(5)  # wait for jobs to finish
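With the (process, event) bookkeeping above, killing a job by its id becomes straightforward: look up the process and call terminate(). A minimal self-contained sketch (the kill_job method is my addition, not part of the original answer):

import multiprocessing
import time

def _run_job(jobid, done_event):
    time.sleep(30)  # a long job that we will terminate early
    done_event.set()

class Test(object):
    def __init__(self):
        self._lock = multiprocessing.Lock()
        self._process_pool = {}  # jobid -> (process, event)

    def submit_job(self, jobid):
        with self._lock:
            done_event = multiprocessing.Event()
            p = multiprocessing.Process(target=_run_job, args=(jobid, done_event))
            p.daemon = True
            self._process_pool[jobid] = (p, done_event)
            p.start()

    def kill_job(self, jobid):
        # hypothetical helper: terminate a running job by its id
        with self._lock:
            process, _event = self._process_pool.pop(jobid)
            process.terminate()  # possible because we recorded the process
            process.join()

if __name__ == "__main__":
    t = Test()
    t.submit_job(1)
    t.kill_job(1)
    print("job 1 killed")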