
Python multiprocessing Process does not terminate

When I tried to implement a parallel operation in Python with the multiprocessing library, I noticed that some processes fail to terminate in a non-intuitive way.

My program consists of:

  • a queue, used for data transfer between processes
  • a user process, which calculates something using data received via the queue
  • two maker processes, which generate data and push them to the queue

Below is a simplified example. make_data generates random numbers and pushes them to the queue, and use_data receives the data and computes the average. In total, 2*1000 = 2000 numbers are generated, and all of them are used. This code runs as expected: in the end, all processes terminate and no data is left in the queue.

import random
from multiprocessing import Process, Queue

q = Queue(maxsize=10000)
def make_data(q):
    for i in range(1000):
        x = random.random()
        q.put(x)
    print("final line of make data")

def use_data(q):
    i = 0
    res = 0.0
    while i < 2000:
        if q.empty():
            continue
        i += 1
        x = q.get()
        res = res*(i-1)/i + x/i
    print("iter %6d, avg = %.5f" % (i, res))

u = Process(target=use_data, args=(q,))
u.start()

p1 = Process(target=make_data, args=(q,))
p1.start()
p2 = Process(target=make_data, args=(q,))
p2.start()


u.join(timeout=10)
p1.join(timeout=10)
p2.join(timeout=10)
print(u.is_alive(), p1.is_alive(), p2.is_alive(), q.qsize())

Outcome:

final line of make data
final line of make data
iter   2000, avg = 0.49655
False False False 0

Things change when I let the makers generate more data than necessary. The code below differs from the one above only in that each maker generates 5000 items, so not all of the data is used. When this is run, it prints the final-line messages, but the maker processes never terminate (Ctrl-C is needed to stop them).

import random
from multiprocessing import Process, Queue

q = Queue(maxsize=10000)
def make_data(q):
    for i in range(5000):
        x = random.random()
        q.put(x)
    print("final line of make data")

def use_data(q):
    i = 0
    res = 0.0
    while i < 2000:
        if q.empty():
            continue
        i += 1
        x = q.get()
        res = res*(i-1)/i + x/i
    print("iter %6d, avg = %.5f" % (i, res))

u = Process(target=use_data, args=(q,))
u.start()

p1 = Process(target=make_data, args=(q,))
p1.start()
p2 = Process(target=make_data, args=(q,))
p2.start()


u.join(timeout=10)
p1.join(timeout=10)
p2.join(timeout=10)
print(u.is_alive(), p1.is_alive(), p2.is_alive(), q.qsize())

Outcome:

final line of make data
final line of make data
iter   2000, avg = 0.49388
False True True 8000
# and never finish

It looks to me like all the processes run to the end, so I wonder why they stay alive. Can someone help me understand this phenomenon?

I ran this program on Python 3.6.6 from the Miniconda distribution.

The child processes putting items into the queue are stuck trying to actually put the objects into the queue.

A normal, non-multiprocessing Queue object is implemented entirely in the address space of a single process. In that case maxsize is the number of items that can be enqueued before a put() call blocks. But a multiprocessing Queue object is implemented using an IPC mechanism, typically a pipe, and an OS pipe can only buffer a finite number of bytes (a typical limit is 8KB). So when your use_data() process terminates after dequeuing just 2000 items, the make_data() processes block because their IPC channel is full while they flush their locally queued items into the IPC channel on exit. This means they never actually exit, and therefore the attempt to join() those processes blocks indefinitely.
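To watch that flushing behaviour in isolation, here is a minimal sketch (not from the original answer; it assumes a Unix-like system with Python 3, and the item count and size are simply chosen to exceed any realistic pipe buffer). The producer reaches its final line almost immediately, but the process cannot exit until the parent drains the pipe:

from queue import Empty
from multiprocessing import Process, Queue

def producer(q):
    # put() never blocks here (no maxsize); items pile up in the queue's
    # internal buffer while a feeder thread writes them into the pipe.
    for _ in range(100000):
        q.put(b"x" * 64)
    print("producer: final line reached")   # prints, but the process cannot exit yet

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()

    p.join(timeout=5)
    print("alive after first join:", p.is_alive())   # expected True: blocked flushing into the pipe

    # Each get() frees space in the pipe, so the feeder thread can finish
    # flushing and the producer can finally exit.
    while p.is_alive():
        try:
            q.get(timeout=1)
        except Empty:
            pass
    print("alive after draining:", p.is_alive())     # expected False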

In effect you've created a deadlock. The exact threshold at which it occurs depends on how much data the IPC channel can buffer. For example, on one of my Linux servers your second example works reliably with this inserted between the u.join() and the p1.join():

for _ in range(4000):
    q.get()

Reducing that range slightly (e.g., to 3990) produces intermittent hangs. Reducing it further (e.g., to 3500) always hangs, because at least one of the processes stuffing data into the queue blocks while flushing its items into the IPC channel.

The lesson of this story? Always fully drain a multiprocessing queue before attempting to wait for the processes to terminate.
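For instance, one way to apply that to your second example is to keep draining after the consumer finishes until the makers have actually exited, instead of guessing a fixed count such as 4000. This is only a sketch; it reuses the make_data/use_data functions and the Process setup from the question unchanged and replaces the join section at the bottom:

from queue import Empty

u.join()    # the consumer stops after consuming 2000 items

# Each get() frees space in the IPC channel, so the makers' feeder threads
# can finish flushing and the maker processes can exit; keep going until
# both are gone and the queue is empty.
while p1.is_alive() or p2.is_alive() or not q.empty():
    try:
        q.get(timeout=1)
    except Empty:
        pass

p1.join()
p2.join()
print(u.is_alive(), p1.is_alive(), p2.is_alive(), q.qsize())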
