
python multiprocessing child processes not quitting normally

I've been using python multiprocessing for some task handling. The dev environment is Windows Server 2016 and python 3.7.0.
Sometimes there are child processes that stay in the task list. But they actually seem to be completed (data were written into the database). The impact is that logging gets stuck there, unable to append the latest logs.

[screenshot: processes not exiting in the task list]

Here is the code. The main function starts a listener process and several worker processes:

queue = multiprocessing.Queue(-1)
listener = multiprocessing.Process(target=listener_process, args=(queue, listener_configurer))
listener.start()

...

workers = []
for loop:
    worker = multiprocessing.Process(target=process_start, args=(queue, worker_configurer, plist))
    workers.append(worker)
    worker.start()
for w in workers:
    w.join()

...

queue.put_nowait(None)
listener.join()

The listener process ends when it gets None, which in turn ends the whole task.

def listener_process(queue, configurer):
    configurer()
    while True:
        try:
            record = queue.get()
            if record is None:
                break
            if type(record) is not int:
                Logger = logging.getLogger(record.name)
                Logger.handle(record)
        except Exception as e:
            Logger.error(str(e), exc_info=True)

The task is scheduled to run by the Windows Task Scheduler.
Any idea why some multiprocessing processes were 'stuck' there?
It's been bothering me for some time. Thanks in advance.

Can I say for sure what your problem is? No. Can I say for sure you are doing something that can lead to a deadlock? Yes.

If you read the documentation on multiprocessing.Queue carefully, you will see the following warning:

Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.

This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.

Note that a queue created using a manager does not have this issue. See Programming guidelines.
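To illustrate that last point: a manager-based queue is a proxy served by a separate manager process, so the child holds no feeder thread and can safely be joined before the queue is drained. A minimal sketch (function names are mine, not from the question):

```python
import multiprocessing

def worker(q):
    # put through the manager proxy; no local feeder thread is involved
    q.put("done")

def run():
    with multiprocessing.Manager() as manager:
        q = manager.Queue()
        p = multiprocessing.Process(target=worker, args=(q,))
        p.start()
        p.join()      # safe even though the item has not been consumed yet
        return q.get()

if __name__ == "__main__":
    print(run())
```

With a plain multiprocessing.Queue, the same join-before-drain ordering is exactly the deadlock hazard the warning above describes.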

This means that to be completely safe, you must join the listener process (which issues gets from the queue) before joining the worker processes (which issue puts to the queue), to ensure that every message put on the queue has been read off before you attempt to join the processes that did the puts.

But then how will the listener process know when to terminate? Currently it waits for the main process to write a None sentinel message to the queue to signify quitting time, but in the new design the main process must wait for the listener to terminate before it waits for the workers to terminate. Presumably you have control over the source of the process_start function, which implements the producer of the messages written to the queue, and presumably something triggers its decision to terminate. When these processes terminate, it is they that must each write a None sentinel message to the queue to signify that they will not be producing any more messages. Function listener_process must then be passed an additional argument, i.e. the number of message producers, so that it knows how many of these sentinels to expect. Unfortunately, I can't discern from what you have coded, i.e. for loop:, what that number of processes is, and it appears that you are instantiating each process with identical arguments. But for the sake of clarity I will modify your code to something more explicit:

queue = multiprocessing.Queue(-1)
listener = multiprocessing.Process(target=listener_process, args=(queue, listener_configurer, len(plist)))
listener.start()

...

workers = []
# There will be len(plist) producer of messages:
for p in plist:
    worker = multiprocessing.Process(target=process_start, args=(queue, worker_configurer, p))
    workers.append(worker)
    worker.start()
listener.join() # join the listener first
for w in workers:
    w.join()


...


import logging

def listener_process(queue, configurer, n_producers):
    configurer()
    sentinel_count = 0
    while True:
        try:
            record = queue.get()
            if record is None:
                sentinel_count += 1
                if sentinel_count == n_producers:
                    break # we are done
                continue
            if type(record) is not int:
                Logger = logging.getLogger(record.name)
                Logger.handle(record)
        except Exception as e:
            # log via the module-level functions: `Logger` may not be bound yet
            # if the exception was raised before the first record was handled
            logging.error(str(e), exc_info=True)
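For completeness, process_start (whose body the question does not show) would then need to emit its own sentinel when it finishes. A hypothetical sketch, with a stand-in body:

```python
import multiprocessing

def process_start(queue, worker_configurer, p):
    # Hypothetical worker body: the real one is not shown in the question.
    worker_configurer()
    try:
        queue.put("worker %s finished" % p)  # stand-in for real log records
    finally:
        queue.put(None)  # sentinel: this producer will send nothing more

def noop_configurer():
    # placeholder for the question's worker_configurer
    pass

if __name__ == "__main__":
    q = multiprocessing.Queue()
    process_start(q, noop_configurer, 1)
    print(q.get())
    print(q.get())
```

Putting the sentinel in a finally block ensures the listener's count still comes out right even if a worker dies with an exception; otherwise the listener would wait forever for a sentinel that never arrives.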
