Python Multiprocessing: The child process finished but did not join

I am trying to write multiprocessing code that builds a dictionary of results I am interested in.

Here is my logic:

import time
from multiprocessing import Manager, Queue, Process

input_list = Queue()
for x in my_input:  # my_input is my input data
  input_list.put(x)

output = Manager().dict()

def job():
  while input_list.qsize()>0:
    x = input_list.get()
    result = my_func(x)  # Do something here
    output[x] = result

def monitor():
  while True:
    if input_list.qsize() > 0:
      time.sleep(1)
    else:
      print("Item List is Empty")
      print("Does all the result being save?", len(output.keys()) == len(my_input))
      return

job_list = [Process(target=monitor)]
for _ in range(num_of_worker):
  job_list.append(Process(target=job))
for j in job_list:
  j.start()
for j in job_list:
  j.join()

print("The script is finished")

The logic of my code is quite simple.

  • Initialize a queue and put my input into it.
  • Define two functions: job (does some work and saves the result to a dict) and monitor (prints once everything in the queue has been processed, and reports whether all results have been saved).
  • Then the standard multiprocessing start and join.

The output I am getting:

Item List is Empty
Does all the result being save? True
...

Some child processes did not finish and have not joined yet. The script is stuck here and never prints "The script is finished".

My script gets stuck at the join statement, even though the monitor tells me that everything is finished (by checking the number of items left in input_list and the number of results stored in output).

Moreover, this error is not reliably reproducible. If I see my script stuck for more than 5 minutes, I terminate it manually and restart it. I found that the script finishes properly only some of the time (about 3 out of 10 runs).

What could be happening?

Remark: Since I suspected that some child process did not join, I tried using an Event. When the monitor finds that input_list is empty and output is completely filled, it kills all the processes. But the script also gets stuck at the event triggering. (As above, the code does not get stuck every time; it works about 3 out of 10 times.)

@Homer512's comments gave me insight into where the mistake in the code is.

Switch from

def job():
  while input_list.qsize() > 0:
    x = input_list.get()
    ...

to

from queue import Empty

def job():
  while input_list.qsize() > 0:
    try:
      x = input_list.get(True, 5)  # block for at most 5 seconds
      ...
    except Empty:
      return 0
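The snippet above leaves out the surrounding setup. A minimal self-contained sketch of the same idea might look like the following. The names worker, run and square are hypothetical stand-ins (square plays the role of my_func), and the 1-second timeout is an arbitrary choice. It also drops the qsize() check and relies only on the timeout, which is a small variation on the fix above rather than the exact code from the post.

```python
import queue
from multiprocessing import Manager, Process, Queue


def square(x):
    # Hypothetical stand-in for the question's my_func.
    return x * x


def worker(tasks, results, func, timeout=1.0):
    """Process items until nothing arrives within `timeout` seconds."""
    while True:
        try:
            # Bounded wait: if another worker took the last item, this
            # raises queue.Empty instead of blocking forever.
            x = tasks.get(timeout=timeout)
        except queue.Empty:
            return
        results[x] = func(x)


def run(items, func, num_workers=2):
    """Fill the queue first, then start the workers and wait for them."""
    tasks = Queue()
    for x in items:
        tasks.put(x)
    with Manager() as manager:
        results = manager.dict()
        procs = [
            Process(target=worker, args=(tasks, results, func))
            for _ in range(num_workers)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # Copy into a plain dict before the manager shuts down.
        return dict(results)


if __name__ == "__main__":
    print(run(range(5), square))
```

This sketch assumes every item is queued before the workers start. If items were still being produced, a worker could time out and exit while more work is on the way, so a longer timeout or a different shutdown signal would be needed in that case.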

My script got stuck at join for the following reason: when input_list has only 1 element left, several job processes can pass the while check at the same time, but only one of them can actually get that element from the queue. The other processes then block in get forever, because no timeout was set.
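The check-then-get race described above can be replayed step by step in a single process. This is only an illustration: it uses queue.Queue from the standard library as a stand-in for multiprocessing.Queue (both raise queue.Empty when a timed get expires), and the variable names are made up for the example.

```python
import queue

tasks = queue.Queue()
tasks.put("last item")

# Both workers evaluate the loop condition before either one calls get().
worker_a_sees_work = tasks.qsize() > 0
worker_b_sees_work = tasks.qsize() > 0

# Worker A takes the only item.
item = tasks.get()

# Worker B now tries to get an item too. A plain get() would block
# forever here; with a timeout it raises queue.Empty instead.
try:
    tasks.get(timeout=0.1)
    worker_b_got_item = True
except queue.Empty:
    worker_b_got_item = False

print(worker_a_sees_work, worker_b_sees_work, worker_b_got_item)
# prints: True True False
```

Both checks pass, yet only one get succeeds, which is why the size check alone cannot protect a blocking get.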
