Python: possible data loss in multiprocessing.Queue()

Suppose I have the following example, in which I create a daemon process and try to communicate with it through an event flag:

from multiprocessing import Process, Event, Queue
import time

def reader(data):
    input_queue = data[0]
    e = data[1]
    output_queue = data[2]

    while True:
        if not e.is_set(): # if there is a signal to start
            msg = input_queue.get()         # Read from the queue 
            output_queue.put(msg)     # copy to output_queue
            if (msg == 'DONE'):  # signal to stop              
                e.set() # signal that worker is done


def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    input_queue = Queue()   # reader() reads from queue
                          # writer() writes to queue

    output_queue = Queue()


    e = Event()
    e.set()

    reader_p = Process(target=reader, args=((input_queue, e, output_queue),))
    reader_p.daemon = True
    reader_p.start()        # Launch reader() as a separate python process

    for count in [10**4, 10**5, 10**6]:



        _start = time.time()
        writer(count, input_queue)    # Send a lot of stuff to reader()

        e.clear() # unset event, giving signal to a worker


        e.wait() # waiting for reader to finish


        # fetch results from output_queue:
        results = []
        while not output_queue.empty():
            results += [output_queue.get()]

        print(len(results)) # check how many results we have

        print("Sending %s numbers to Queue() took %s seconds" % (count, 
            (time.time() - _start)))

I use an input queue and an output queue; in this example the worker simply copies the data to the output, which I fetch later in the program. Everything seems fine up to a data length of 10k (is that actually the queue size limit, in bytes?), but when I try to copy more elements, I receive a seemingly random number of results, far fewer than were sent:

10001
Sending 10000 numbers to Queue() took 0.4259309768676758 seconds
18857
Sending 100000 numbers to Queue() took 1.693629503250122 seconds
12439
Sending 1000000 numbers to Queue() took 10.592029809951782 seconds

10001
Sending 10000 numbers to Queue() took 0.41446948051452637 seconds
46615
Sending 100000 numbers to Queue() took 1.9259979724884033 seconds
18623
Sending 1000000 numbers to Queue() took 10.06524133682251 seconds

Update: I have now tried sharing the data among three workers. I have checked that all of them are working, but the data loss has not stopped:

import multiprocessing
from multiprocessing import Process, Event, Queue
import time

def reader(data):
    input_queue = data[0]
    e = data[1]
    output_queue = data[2]

    while True:
        if not e.is_set(): # if there is a signal to start
            #if not output_queue.empty(): # hangs for some reason
            msg = input_queue.get()         # Read from the queue
            output_queue.put(msg)     # copy to output_queue
            #print("1")
            if (msg == 'DONE'):  # signal to stop
                e.set() # signal that there is no more data
                print("done")


def reader1(data):
    input_queue = data[0]
    e = data[1]
    output_queue = data[2]

    while True:
        if not e.is_set(): # if there is a signal to start
            msg = input_queue.get()         # Read from the queue
            output_queue.put(msg)     # copy to output_queue
            #print("2")
            if (msg == 'DONE'):  # signal to stop
                e.set() # signal that there is no more data
                print("done")


def reader2(data):
    input_queue = data[0]
    e = data[1]
    output_queue = data[2]

    while True:
        if not e.is_set(): # if there is a signal to start
            msg = input_queue.get()         # Read from the queue
            output_queue.put(msg)     # copy to output_queue
            #print("3")
            if (msg == 'DONE'):  # signal to stop
                e.set() # signal that there is no more data
                print("done")

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':

    # I do not use manager, as it makes everything extremely slow
    #m = multiprocessing.Manager()
    #input_queue = m.Queue()

    input_queue = Queue()   # reader() reads from queue
                          # writer() writes to queue

    output_queue = Queue()


    e = Event()
    e.set()

    reader_p = Process(target=reader, args=((input_queue, e, output_queue),))
    reader_p.daemon = True
    reader_p.start()        # Launch reader() as a separate python process

    reader_p1 = Process(target=reader1, args=((input_queue, e, output_queue),))
    reader_p1.daemon = True
    reader_p1.start() 

    reader_p2 = Process(target=reader2, args=((input_queue, e, output_queue),))
    reader_p2.daemon = True
    reader_p2.start() 

    for count in [10**4, 10**5, 10**6]:


        _start = time.time()
        writer(count, input_queue)    # Send a lot of stuff to readers

        e.clear() # unset event, giving signal to a worker


        e.wait() # waiting for reader to finish


        # fetch results from output_queue:
        results = []
        while not output_queue.empty():
            results += [output_queue.get()]

        print(len(results)) # check how many results we have

        print("Sending %s numbers to Queue() took %s seconds" % (count, 
            (time.time() - _start)))

As a result, I sometimes complete the second stage correctly:

done
10001
Sending 10000 numbers to Queue() took 0.37468671798706055 seconds
done
18354
Sending 100000 numbers to Queue() took 1.2723915576934814 seconds
done
34807
Sending 1000000 numbers to Queue() took 9.1871018409729 seconds

done
10001
Sending 10000 numbers to Queue() took 0.37137532234191895 seconds
done
100001
Sending 100000 numbers to Queue() took 2.5747978687286377 seconds
done
217034
Sending 1000000 numbers to Queue() took 12.640174627304077 seconds

Queues do have a size limit: in multiprocessing this limit is not reliable, and once it is reached, queue.put blocks until space is freed in the queue. For more information, see the documentation: https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Queue
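
As an illustration of that blocking behaviour (my own sketch, not part of the original answer; the maxsize value is arbitrary), a bounded queue shows it directly, and put() with a timeout raises queue.Full instead of blocking forever:

import queue
from multiprocessing import Queue

q = Queue(maxsize=3)          # small, explicit limit for demonstration
for i in range(3):
    q.put(i)                  # fills the queue to capacity

try:
    q.put(99, timeout=1)      # a fourth put would block; with a timeout it raises queue.Full
except queue.Full:
    print("queue is full; put() blocks until a consumer calls get()")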

In your case, this is not the problem. You have simply defined a bad condition for stopping the collection of results:

while not output_queue.empty():
    results += [output_queue.get()]

In your case, if the writers are slower than the reader (which they sometimes are), the queue can be momentarily empty even though the writers have not finished sending everything. That is why the number of results you read back is unstable.

To confirm this, I replaced that condition with the following:

t0 = time.time()
while time.time() - t0 < 30: # long enough to complete the loops here, but just a demo condition; you should not use this in real code
    try:
        # get(timeout=1) waits up to 1 second if the queue is momentarily empty;
        # if it stays empty for more than 1 second, an exception is raised, which
        # we take to mean the loop is complete. Again, this is not a good
        # condition in real life; it is here only for testing.
        results += [output_queue.get(timeout=1)]
    except Exception:
        break
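
In practice, a more robust stop condition (again just a sketch of mine, not from the original answer; it assumes the single-reader setup from the first example, where the reader also copies the 'DONE' sentinel into output_queue) is to block on get() until the sentinel arrives, so a momentarily empty queue is harmless:

# Drain output_queue until the 'DONE' sentinel arrives (single-reader case).
# get() blocks until an item is available, so a momentarily empty queue
# no longer terminates the loop early.
results = []
while True:
    msg = output_queue.get()
    if msg == 'DONE':
        break
    results.append(msg)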
