
Python multiprocessing pool: some processes deadlock when forked but run when spawned

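I'm experimenting with a small service that downloads and resizes images (threads download the images, processes resize them). I start the download threads (along with a manager thread that watches them), and as soon as an image is saved locally I add its path to a queue. Once all images are downloaded, the manager thread adds a poison pill to the queue.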

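Meanwhile, the main thread watches the queue, takes paths from it as they arrive, and submits an asynchronous task to a process pool to resize each image.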

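Finally, when I try to join the pool, it sometimes hangs in what looks like a deadlock. It doesn't happen every time, but the more URLs there are in the IMG_URLS list, the more often it happens. When the deadlock occurs, the logs show that some processes either never started properly or deadlocked immediately, because the "resizing {file}" log line never appears for them.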

import logging
import multiprocessing as mp
import time
from queue import Queue
from threading import Thread


def resize_image(file):
    logging.info(f"resizing {file}")
    time.sleep(0.1)
    logging.info(f"done resizing {file}")


class Service(object):
    def __init__(self):
        self.img_queue = Queue()

    def download_image(self, url) -> None:
        logging.info(f"downloading image from URL {url}")
        time.sleep(1)
        file = f"local-{url}"
        self.img_queue.put(file)
        logging.info(f"image saved to {file}")

    def download_images(self, img_url_list: list):
        logging.info("beginning image downloads")

        threads = []
        for url in img_url_list:
            t = Thread(target=self.download_image, args=(url,))
            t.start()
            threads.append(t)

        for t in threads:
            t.join()
        logging.info("all images downloaded")
        self.img_queue.put(None)

    def resize_images(self):
        logging.info("beginning image resizing")
        with mp.Pool() as p:
            while True:
                file = self.img_queue.get()
                if file is None:
                    logging.info("got SENTINEL")
                    break
                logging.info(f"got {file}")
                p.apply_async(func=resize_image, args=(file,))
            p.close()
            p.join()
        logging.info("all images resized")

    def run(self, img_url_list):
        logging.info("START service")

        dl_manager_thread = Thread(target=self.download_images, args=(img_url_list,))
        dl_manager_thread.start()
        self.resize_images()

        logging.info(f"END service")


if __name__ == "__main__":
    FORMAT = "[%(threadName)s, %(asctime)s, %(levelname)s] %(message)s"
    logging.basicConfig(level=logging.DEBUG, format=FORMAT)

    IMG_URLS = list(range(8))

    service = Service()
    service.run(IMG_URLS)

Running this with Python 3.8.5 (Ubuntu 20.04, Ryzen 2600), I get the following:

[MainThread, 2020-11-30 19:58:01,257, INFO] START service
[Thread-1, 2020-11-30 19:58:01,257, INFO] beginning image downloads
[MainThread, 2020-11-30 19:58:01,257, INFO] beginning image resizing
[Thread-2, 2020-11-30 19:58:01,258, INFO] downloading image from URL 0
[Thread-3, 2020-11-30 19:58:01,258, INFO] downloading image from URL 1
[Thread-4, 2020-11-30 19:58:01,258, INFO] downloading image from URL 2
[Thread-5, 2020-11-30 19:58:01,259, INFO] downloading image from URL 3
[Thread-6, 2020-11-30 19:58:01,260, INFO] downloading image from URL 4
[Thread-7, 2020-11-30 19:58:01,260, INFO] downloading image from URL 5
[Thread-8, 2020-11-30 19:58:01,261, INFO] downloading image from URL 6
[Thread-9, 2020-11-30 19:58:01,262, INFO] downloading image from URL 7
[Thread-2, 2020-11-30 19:58:02,259, INFO] image saved to local-0
[MainThread, 2020-11-30 19:58:02,260, INFO] got local-0
[Thread-3, 2020-11-30 19:58:02,260, INFO] image saved to local-1
[Thread-4, 2020-11-30 19:58:02,260, INFO] image saved to local-2
[MainThread, 2020-11-30 19:58:02,261, INFO] got local-1
[MainThread, 2020-11-30 19:58:02,261, INFO] resizing local-0
[Thread-5, 2020-11-30 19:58:02,261, INFO] image saved to local-3
[Thread-6, 2020-11-30 19:58:02,261, INFO] image saved to local-4
[MainThread, 2020-11-30 19:58:02,261, INFO] got local-2
[MainThread, 2020-11-30 19:58:02,262, INFO] got local-3
[MainThread, 2020-11-30 19:58:02,262, INFO] resizing local-1
[Thread-7, 2020-11-30 19:58:02,262, INFO] image saved to local-5
[MainThread, 2020-11-30 19:58:02,262, INFO] got local-4
[MainThread, 2020-11-30 19:58:02,263, INFO] got local-5
[MainThread, 2020-11-30 19:58:02,263, INFO] resizing local-3
[Thread-8, 2020-11-30 19:58:02,263, INFO] image saved to local-6
[MainThread, 2020-11-30 19:58:02,263, INFO] resizing local-4
[MainThread, 2020-11-30 19:58:02,263, INFO] resizing local-5
[MainThread, 2020-11-30 19:58:02,263, INFO] got local-6
[MainThread, 2020-11-30 19:58:02,264, INFO] resizing local-6
[Thread-9, 2020-11-30 19:58:02,264, INFO] image saved to local-7
[MainThread, 2020-11-30 19:58:02,265, INFO] got local-7
[Thread-1, 2020-11-30 19:58:02,265, INFO] all images downloaded
[MainThread, 2020-11-30 19:58:02,265, INFO] got SENTINEL
[MainThread, 2020-11-30 19:58:02,265, INFO] resizing local-7
[MainThread, 2020-11-30 19:58:02,362, INFO] done resizing local-0
[MainThread, 2020-11-30 19:58:02,363, INFO] done resizing local-1
[MainThread, 2020-11-30 19:58:02,363, INFO] done resizing local-3
[MainThread, 2020-11-30 19:58:02,364, INFO] done resizing local-4
[MainThread, 2020-11-30 19:58:02,364, INFO] done resizing local-5
[MainThread, 2020-11-30 19:58:02,364, INFO] done resizing local-6
[MainThread, 2020-11-30 19:58:02,366, INFO] done resizing local-7

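Sometimes it starts hanging at this point. Note that the "resizing local-2" log line is missing, so that process either did not start or is waiting on something.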

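If I change the pool to use spawn instead of fork, it works fine. My guess is that in some cases fork copies a lock that is in the acquired state and that this is the problem, but it is not clear to me where or why.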

with mp.get_context("spawn").Pool() as p:

Any ideas?

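Sometimes (when you are unlucky), while your pool is spinning up, one of the child processes gets forked at the exact moment one of your downloader threads is writing a message through the logging module. The logging module uses locks to serialize message handling, so when the fork happens, such a lock can be copied in its locked state. When the downloader thread finishes writing its message, only the lock in the main process is released, which leaves a child process waiting on its own copy of that lock before it can write a log message. That copy is never released, because the downloader thread was not copied (fork does not copy threads). This is the deadlock you are seeing. This kind of bug can be patched in some ways, but it is one of the reasons "spawn" exists.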
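The mechanism can be reproduced without logging at all. The following is a minimal sketch, assuming a POSIX system where os.fork() is available; the function name child_inherits_locked_lock and its helper thread are hypothetical and not part of the question's code. A helper thread holds a plain threading.Lock while the main thread forks, and the child then tries to take its copy of the lock with a timeout.

```python
import os
import threading


def child_inherits_locked_lock(timeout=0.5):
    """Fork while another thread holds a lock.

    Returns True if the child could NOT acquire its copy of the lock,
    i.e. the lock was inherited in the locked state.
    """
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def holder():
        # Stand-in for a downloader thread that is in the middle of logging.
        with lock:
            holding.set()
            release.wait()

    t = threading.Thread(target=holder)
    t.start()
    holding.wait()  # make sure the lock is held before forking

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so nobody will ever
        # release this copy of the lock. The timeout avoids a real hang.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    # Parent: let the holder thread finish, which releases the parent's lock only.
    release.set()
    t.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) != 0


if __name__ == "__main__":
    print("child stuck on inherited lock:", child_inherits_locked_lock())
```

Without the timeout, the child would block forever, which matches the pool worker that never logs its "resizing" line.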

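In addition, "spawn" is the only start method supported on all architectures. It is very easy to use a library that happens to be multithreaded without realizing it, and "fork" is not really multithreading-friendly. If you really need the reduced overhead that "fork" provides, there is also "forkserver", though I don't know much about it. In theory it is multithreading-safe.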
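As a rough illustration of choosing a start method other than "fork", here is a small sketch. The helpers pick_start_method and make_pool are hypothetical names introduced for this example; the only library calls used are multiprocessing.get_all_start_methods() and multiprocessing.get_context(). It prefers "forkserver" where the platform offers it and falls back to "spawn".

```python
import multiprocessing as mp


def pick_start_method(preferred=("forkserver", "spawn")):
    """Return the first start method in `preferred` that this platform supports."""
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    raise RuntimeError(f"none of {preferred} is available (have: {available})")


def make_pool(processes=None, preferred=("forkserver", "spawn")):
    """Create a Pool whose workers are not forked from the multithreaded parent."""
    ctx = mp.get_context(pick_start_method(preferred))
    return ctx.Pool(processes=processes)
```

In the question's resize_images method, `with mp.Pool() as p:` could then become `with make_pool() as p:`. Both "spawn" and "forkserver" require the task function (here resize_image) to be importable by the workers, which a module-level function like the one in the question is.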

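fork

The parent process uses os.fork() to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic.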

Here is a more in-depth discussion of this issue, with some references, which I used as my main resource.

Just some extra information to expand on Aaron's good answer.

This Python bug/enhancement request appears to describe exactly the same thing: https://bugs.python.org/issue6721

I found the same problem in another question: Deadlock with logging multiprocess/multithread python script
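For locks that you own yourself, one way to patch this class of problem is to hold the lock across the fork, so the child never inherits it mid-use. The sketch below assumes a POSIX system and Python 3.7 or later, where os.register_at_fork() is available; protect_lock_across_fork and fork_and_try_lock are hypothetical helper names, and this is an illustration of the general pattern rather than a fix for the question's exact code.

```python
import os
import threading
import time


def protect_lock_across_fork(lock):
    """Acquire `lock` just before every fork and release it on both sides after."""
    os.register_at_fork(
        before=lock.acquire,            # waits until no other thread holds the lock
        after_in_parent=lock.release,
        after_in_child=lock.release,    # the child starts with an unlocked copy
    )


def fork_and_try_lock(lock, timeout=0.5):
    """Fork and return True if the child managed to acquire its copy of `lock`."""
    pid = os.fork()
    if pid == 0:
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    lock = threading.Lock()
    protect_lock_across_fork(lock)

    def busy():
        # Stand-in for a thread that holds the lock for a while.
        with lock:
            time.sleep(0.2)

    t = threading.Thread(target=busy)
    t.start()
    time.sleep(0.05)  # give the thread time to take the lock
    print("child could acquire lock:", fork_and_try_lock(lock))
    t.join()
```

The hooks stay registered for the lifetime of the process, so every later fork briefly takes the lock. This only covers locks you know about; locks hidden inside third-party libraries are not protected, which is why switching the start method is usually the simpler option.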
