
Python3 multiprocessing Queue and multiple threads not completing from join() properly?

I'm trying to use multiprocessing.Queue and threading.Thread to split up a large number of tasks (health checks against surveillance cameras). Given the code below, I'm trying to detect when all cameras (there are 32,000+) have been checked, but my output never seems to reach the print statement in main.

Each queue_worker calls process_camera, which currently performs all the health checks and returns a value (this part works!).

When I watch it run, it sort of "finishes" and then hangs, so something is blocking it or otherwise preventing it from completing... I've tried the get() and join() calls with timeout arguments, but that seems to have no effect at all!

I've been staring at this code and the documentation for three days now... is there something obvious I'm not seeing?

The end goal is to run the checks against all 30,000 cameras (loaded into all_cameras at script startup), then "loop" and keep going until the user aborts the script.

def queue_worker(camera_q, result_q):
    '''
    Function takes camera off the queue and calls healthchecks
    '''

    try:
        camera = camera_q.get()
        camera_status, remove_camera = process_camera(camera)

        result_q.put("Success")
        return True
    except queue.Empty:
        logging.info("Queue is empty")
        result_q.put("Fail")
        return False


def process_worker(camera_q, result_q, process_num, stop_event):
    while not stop_event.is_set():
        # Create configured number of threads and provide references to both Queues to each thread
        threads = []
        for i in range(REQUEST_THREADS):
            thread = threading.Thread(target=queue_worker, args=(camera_q, result_q))
            thread.setName("CameraThread-{}".format(i))
            threads.append(thread)
            thread.start()

        for thread in threads:
            thread.join(timeout=120)

        if camera_q.empty():
            num_active = sum([t.is_alive() for t in threads])
            logging.info("[Process {}] << {} >> active threads and << {} >> cameras left to process. << {} >> processed.".format(process_num, num_active, camera_q.qsize(), result_q.qsize()))


def main():
    '''
    Main application entry
    '''

    logging.info("Starting Scan With << " + str(REQUEST_THREADS) + " Threads and " + str(CHILD_PROCESSES) + " Processors >>")
    logging.info("Reference Images Stored During Scan << " + str(store_images) + " >>")

    stop_event = multiprocessing.Event()
    camera_q, result_q = multiprocessing.Queue(), multiprocessing.Queue()

    # Create a Status thread for maintaining process status
    create_status_thread()

    all_cameras = get_oversite_cameras(True)
    for camera in all_cameras:
        camera_q.put(camera)

    logging.info("<< {} >> cameras queued up".format(camera_q.qsize()))

    processes = []
    process_num = 0
    finished_processes = 0
    for i in range(CHILD_PROCESSES):
        process_num += 1
        proc = multiprocessing.Process(target=process_worker, args=(camera_q, result_q, process_num, stop_event))
        proc.start()
        processes.append(proc)

    for proc in processes:
        proc.join()
        finished_processes += 1
        logging.info("{} finished processes".format(finished_pr))

    logging.info("All processes finished")

EDIT: Not sure whether it helps (visually), but here is a sample of the current output when testing with 2,000 cameras:

[2018-11-01 23:47:41,854] INFO - MainThread - root - Starting Scan With << 100 Threads and 16 Processors >>
[2018-11-01 23:47:41,854] INFO - MainThread - root - Reference Images Stored During Scan << False >>
[2018-11-01 23:47:41,977] INFO - MainThread - root - << 2000 >> cameras queued up
[2018-11-01 23:47:54,865] INFO - MainThread - root - [Process 3] << 0 >> active threads and << 0 >> cameras left to process. << 1570 >> processed.
[2018-11-01 23:47:56,009] INFO - MainThread - root - [Process 11] << 0 >> active threads and << 0 >> cameras left to process. << 1575 >> processed.
[2018-11-01 23:47:56,210] INFO - MainThread - root - [Process 14] << 0 >> active threads and << 0 >> cameras left to process. << 1579 >> processed.
[2018-11-01 23:47:56,345] INFO - MainThread - root - [Process 9] << 0 >> active threads and << 0 >> cameras left to process. << 1580 >> processed.
[2018-11-01 23:47:59,118] INFO - MainThread - root - [Process 2] << 0 >> active threads and << 0 >> cameras left to process. << 1931 >> processed.
[2018-11-01 23:47:59,637] INFO - MainThread - root - [Process 15] << 0 >> active threads and << 0 >> cameras left to process. << 1942 >> processed.
[2018-11-01 23:48:00,310] INFO - MainThread - root - [Process 8] << 0 >> active threads and << 0 >> cameras left to process. << 1945 >> processed.
[2018-11-01 23:48:00,445] INFO - MainThread - root - [Process 13] << 0 >> active threads and << 0 >> cameras left to process. << 1946 >> processed.
[2018-11-01 23:48:01,391] INFO - MainThread - root - [Process 10] << 0 >> active threads and << 0 >> cameras left to process. << 1949 >> processed.
[2018-11-01 23:48:01,527] INFO - MainThread - root - [Process 5] << 0 >> active threads and << 0 >> cameras left to process. << 1950 >> processed.
[2018-11-01 23:48:01,655] INFO - MainThread - root - [Process 6] << 0 >> active threads and << 0 >> cameras left to process. << 1951 >> processed.
[2018-11-01 23:48:02,519] INFO - MainThread - root - [Process 1] << 0 >> active threads and << 0 >> cameras left to process. << 1954 >> processed.
[2018-11-01 23:48:06,915] INFO - MainThread - root - [Process 12] << 0 >> active threads and << 0 >> cameras left to process. << 1981 >> processed.
[2018-11-01 23:48:27,339] INFO - MainThread - root - [Process 16] << 0 >> active threads and << 0 >> cameras left to process. << 1988 >> processed.
[2018-11-01 23:48:28,762] INFO - MainThread - root - [Process 4] << 0 >> active threads and << 0 >> cameras left to process. << 1989 >> processed.

It "hangs" at 1,989, just short of 2,000 - hard to debug!

Since this isn't a complete listing it's hard to answer definitively. For example, the implementation of create_status_thread() is hidden. That makes a deadlock especially tricky to pin down, because deadlocks are typically caused by a particular sequence of accesses to a shared resource, and create_status_thread may contain one. Still, a few suggestions:

  1. You've already put a lot of time into this, so it wouldn't hurt to spend a little more producing a minimal example with scaffolding code. I'd suggest having it use dummy methods rather than real cameras. I'd also try testing with smaller numbers and proving it works for those first, if you haven't already. That would also make for a better StackOverflow question ;)
  2. How much multiprocessing do you actually need? 30k cameras sounds like a lot, but if each check takes 2 ms you can still check them all once a minute. Is the complexity worth it? What is your SLA?
  3. Process.join() hanging when there are unprocessed items left on an input queue looks a lot like the behaviour you describe. That seems plausible if you keep it running until the user aborts. You certainly have plenty of items flowing through queues in this snippet, e.g. camera_q and result_q. See https://docs.python.org/3.7/library/multiprocessing.html?highlight=process#programming-guidelines

Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the "feeder" thread to the underlying pipe. (The child process can call the queue's cancel_join_thread() method to avoid this behaviour.)

This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.

The link above contains a blocking example in which join() is called before get(). In your code, depending on the order of execution, the same thing looks possible with the get() calls inside process_worker(). A minimal sketch of this pitfall, and one way around it, follows below.
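
To make that concrete, here is a minimal, self-contained sketch using dummy work instead of real camera checks (the names are placeholders, not your actual code). Joining a child process while its puts are still buffered on the queue can deadlock; draining the queue before joining completes normally:

import multiprocessing

def worker(result_q):
    # Put many results on the queue. The process cannot terminate until its
    # "feeder" thread has flushed all of these into the underlying pipe.
    for _ in range(10000):
        result_q.put("Success")

if __name__ == "__main__":
    result_q = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(result_q,))
    proc.start()

    # BAD: calling proc.join() here can hang, because the child is still
    # waiting for its buffered items to be consumed.

    # BETTER: drain the queue first, then join.
    results = [result_q.get() for _ in range(10000)]
    proc.join()
    print("collected {} results".format(len(results)))

The same ordering concern applies to result_q in your code: everything put on it needs to be consumed before the producing processes are joined.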

  4. A Pool may be a simpler way to manage the pool of workers. See https://docs.python.org/3.7/library/multiprocessing.html?highlight=process#using-a-pool-of-workers - a rough sketch follows below.
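
For completeness, here is a rough sketch of what the Pool approach could look like, with a dummy process_camera standing in for your real health check (all names and numbers here are placeholders):

import multiprocessing

def process_camera(camera):
    # Placeholder for the real health check; returns a status string.
    return "Success"

if __name__ == "__main__":
    # Dummy camera list standing in for get_oversite_cameras(True).
    all_cameras = ["camera-{}".format(i) for i in range(2000)]

    # The pool distributes cameras across worker processes and collects the
    # results, so there is no manual Queue/join bookkeeping to get wrong.
    with multiprocessing.Pool(processes=16) as pool:
        statuses = pool.map(process_camera, all_cameras, chunksize=50)

    print("{} cameras processed".format(len(statuses)))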

