
How does the GIL affect Python asyncio `run_in_executor` with I/O-bound tasks?

Regarding the Python asyncio run_in_executor code example:

import asyncio
import concurrent.futures

def blocking_io():
    # File operations (such as logging) can block the
    # event loop: run them in a thread pool.
    with open('/dev/urandom', 'rb') as f:
        return f.read(100)

def cpu_bound():
    # CPU-bound operations will block the event loop:
    # in general it is preferable to run them in a
    # process pool.
    return sum(i * i for i in range(10 ** 7))

async def main():
    loop = asyncio.get_running_loop()

    ## Options:

    # 1. Run in the default loop's executor:
    result = await loop.run_in_executor(
        None, blocking_io)
    print('default thread pool', result)

    # 3. Run in a custom process pool:
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = await loop.run_in_executor(
            pool, cpu_bound)
        print('custom process pool', result)

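if __name__ == '__main__':
    # The guard keeps worker processes from re-running main() when the
    # process pool starts them by importing this module (spawn start method).
    asyncio.run(main())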

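The example (in its comments) recommends running I/O-bound functions with ThreadPoolExecutor and CPU-bound functions with ProcessPoolExecutor. I want to verify my understanding of the reasons behind this with three questions: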

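  1. These recommendations are not really recommendations, since otherwise the event loop will block. Consequently, we would lose the main benefit of event-driven programming, correct?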

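  2. Running an I/O-bound task in a separate thread requires the following assumption: the I/O call will release the GIL, correct? Otherwise the OS would not be able to context switch between the event loop and this new, separate thread.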

  3. If the answer to point 2 is yes, how can I know for sure whether an I/O call releases the GIL?

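> These recommendations are not really recommendations, since otherwise the event loop will block. Consequently, we would lose the main benefit of event-driven programming, correct?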

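If you call a blocking function (I/O-blocking or CPU-blocking) directly inside a coroutine instead of awaiting it through an executor, the event loop will block. In that respect, yes, you should not allow this to happen.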
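A minimal sketch of that difference, under stated assumptions: `blocking_call` is a hypothetical stand-in that uses `time.sleep` in place of real blocking work, and `heartbeat` is an illustrative coroutine that records how long each of its 50 ms sleeps actually took. A large gap means the event loop was stalled.

```python
import asyncio
import time


def blocking_call():
    # Hypothetical stand-in for any blocking function.
    time.sleep(0.3)


async def heartbeat(gaps, ticks=10, period=0.05):
    # Record the real duration of every sleep; a stalled loop shows up as a long gap.
    for _ in range(ticks):
        start = time.perf_counter()
        await asyncio.sleep(period)
        gaps.append(time.perf_counter() - start)


async def call_directly():
    blocking_call()  # runs on the event-loop thread and blocks it


async def call_in_executor():
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool.
    await loop.run_in_executor(None, blocking_call)


async def max_heartbeat_gap(worker):
    gaps = []
    hb = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.1)  # let the heartbeat tick a couple of times first
    await worker()
    await hb
    return max(gaps)


if __name__ == "__main__":
    print("direct call, longest gap:  ", asyncio.run(max_heartbeat_gap(call_directly)))
    print("via executor, longest gap: ", asyncio.run(max_heartbeat_gap(call_in_executor)))
```

With the direct call, the longest gap should be at least as long as the blocking call itself; through the executor it should stay close to the 50 ms period.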

The recommendation, I would say, is one kind of executor for each kind of blocking code: use ProcessPoolExecutor for CPU-bound work and ThreadPoolExecutor for I/O-bound work.

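> Running an I/O-bound task in a separate thread requires the following assumption: the I/O call will release the GIL, correct? Otherwise the OS would not be able to context switch between the event loop and this new, separate thread.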

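When it comes to multithreading, CPython makes the running thread hand over the GIL after a short switch interval (5 ms by default, see sys.getswitchinterval()), so threads running Python code take turns even when none of them gives up the GIL voluntarily. In addition, when a thread is in blocking I/O (or in C code written to release the GIL), the GIL is released for the duration of that call, allowing the interpreter to spend more time on the threads that need it.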
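One rough, empirical way to see whether a particular call releases the GIL is to run it in a worker thread and measure how fast the main thread can keep executing Python code in the meantime. The sketch below is only a probe, not a definitive test; `main_thread_rate`, `sleeping`, and `busy_python` are illustrative names, and `time.sleep` stands in for a blocking I/O call.

```python
import sys
import threading
import time


def main_thread_rate(blocking_call):
    """Run blocking_call in a worker thread and return how many loop
    iterations per second the main thread manages while it is running."""
    worker = threading.Thread(target=blocking_call)
    iterations = 0
    start = time.perf_counter()
    worker.start()
    while worker.is_alive():
        iterations += 1
    elapsed = time.perf_counter() - start
    worker.join()
    return iterations / elapsed


def sleeping():
    # A blocking call that releases the GIL while it waits.
    time.sleep(0.2)


def busy_python():
    # Pure-Python work: keeps the GIL except for the forced periodic switches.
    sum(i * i for i in range(2 * 10 ** 6))


if __name__ == "__main__":
    print("switch interval (s):     ", sys.getswitchinterval())
    print("rate while worker sleeps:", round(main_thread_rate(sleeping)))
    print("rate while worker is busy:", round(main_thread_rate(busy_python)))
```

If the rate measured for a call is close to the rate measured for `sleeping`, the call most likely releases the GIL; a much lower rate suggests the main thread had to compete for it.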

The bottom line is:

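  • You can run any blocking code in an executor and it will not block the event loop. You get concurrency, but you may or may not gain performance.
  • For example, if you run CPU-bound code in a ThreadPoolExecutor, you gain no performance from the concurrency because of the GIL. To gain performance for CPU-bound work, use a ProcessPoolExecutor.
  • I/O-bound code, on the other hand, can run in a ThreadPoolExecutor and still gain performance. There is no need for the heavier ProcessPoolExecutor here.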
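For the I/O-bound case, a short sketch of two equivalent ways to hand a call to the default thread pool. `asyncio.to_thread` requires Python 3.9 or later; `blocking_io` here is a hypothetical stand-in that sleeps instead of doing real I/O and reports whether it ran on the main thread.

```python
import asyncio
import threading
import time


def blocking_io(delay):
    # Hypothetical stand-in for a blocking I/O call.
    time.sleep(delay)
    # Report whether the call ran on the main (event-loop) thread.
    return threading.current_thread() is threading.main_thread()


async def main():
    loop = asyncio.get_running_loop()
    # Explicit form: None selects the loop's default ThreadPoolExecutor.
    via_executor = await loop.run_in_executor(None, blocking_io, 0.05)
    # Shorthand form (Python 3.9+): uses the same default pool and also
    # accepts keyword arguments.
    via_to_thread = await asyncio.to_thread(blocking_io, delay=0.05)
    return via_executor, via_to_thread


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both values come back `False`, since in both forms the function runs on a pool thread rather than on the thread that runs the event loop.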

I wrote an example to demonstrate how it works:

import sys
import asyncio
import time
import concurrent.futures
import requests
from contextlib import contextmanager

process_pool = concurrent.futures.ProcessPoolExecutor(2)
thread_pool = concurrent.futures.ThreadPoolExecutor(2)


def io_bound():
    for i in range(3):
        requests.get("https://httpbin.org/delay/0.4")  # I/O blocking
        print(f"I/O bound {i}")
        sys.stdout.flush()


def cpu_bound():
    for i in range(3):
        sum(i * i for i in range(10 ** 7))  # CPU blocking
        print(f"CPU bound {i}")
        sys.stdout.flush()


async def run_as_is(func):
    func()


async def run_in_process(func):
    # Inside a coroutine, get_running_loop() is the preferred way to get the loop.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(process_pool, func)


async def run_in_thread(func):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(thread_pool, func)


@contextmanager
def print_time():
    start = time.time()
    yield
    finished = time.time() - start
    print(f"Finished in {round(finished, 1)}\n")


async def main():
    print("Wrong due to blocking code in coroutine,")
    print(
        "you get neither performance, nor concurrency (which breaks async nature of the code)"
    )
    print("don't allow this to happen")
    with print_time():
        await asyncio.gather(run_as_is(cpu_bound), run_as_is(io_bound))

    print("CPU bound works concurrently with threads,")
    print("but you gain no performance due to GIL")
    with print_time():
        await asyncio.gather(run_in_thread(cpu_bound), run_in_thread(cpu_bound))

    print("To get performance for CPU-bound,")
    print("use process executor")
    with print_time():
        await asyncio.gather(run_in_process(cpu_bound), run_in_process(cpu_bound))

    print("I/O bound will gain benefit from processes as well...")
    with print_time():
        await asyncio.gather(run_in_process(io_bound), run_in_process(io_bound))

    print(
        "... but there's no need in processes since you can use lighter threads for I/O"
    )
    with print_time():
        await asyncio.gather(run_in_thread(io_bound), run_in_thread(io_bound))

    print("Long story short,")
    print("Use processes for CPU bound due to GIL")
    print(
        "and use threads for I/O bound since you benefit from concurrency regardless of GIL"
    )
    with print_time():
        await asyncio.gather(run_in_thread(io_bound), run_in_process(cpu_bound))


if __name__ == "__main__":
    asyncio.run(main())

Output:

Wrong due to blocking code in coroutine,
you get neither performance, nor concurrency (which breaks async nature of the code)
don't allow this to happen
CPU bound 0
CPU bound 1
CPU bound 2
I/O bound 0
I/O bound 1
I/O bound 2
Finished in 5.3

CPU bound works concurrently with threads,
but you gain no performance due to GIL
CPU bound 0
CPU bound 0
CPU bound 1
CPU bound 1
CPU bound 2
CPU bound 2
Finished in 4.6

To get performance for CPU-bound,
use process executor
CPU bound 0
CPU bound 0
CPU bound 1
CPU bound 1
CPU bound 2
CPU bound 2
Finished in 2.5

I/O bound will gain benefit from processes as well...
I/O bound 0
I/O bound 0
I/O bound 1
I/O bound 1
I/O bound 2
I/O bound 2
Finished in 3.3

... but there's no need in processes since you can use lighter threads for I/O
I/O bound 0
I/O bound 0
I/O bound 1
I/O bound 1
I/O bound 2
I/O bound 2
Finished in 3.1

Long story short,
Use processes for CPU bound due to GIL
and use threads for I/O bound since you benefit from concurrency regardless of GIL
CPU bound 0
I/O bound 0
CPU bound 1
I/O bound 1
CPU bound 2
I/O bound 2
Finished in 2.9
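The timings above depend on the network and on httpbin.org. To reproduce just the I/O part of the comparison without a network, here is a small sketch that assumes `time.sleep` is an acceptable stand-in for `requests.get`; `fake_io`, `timed`, and `compare` are illustrative names, not part of the example above.

```python
import asyncio
import concurrent.futures
import time


def fake_io(delay=0.1, rounds=3):
    # Assumption: time.sleep stands in for a blocking HTTP request.
    for _ in range(rounds):
        time.sleep(delay)


async def run_as_is(func):
    func()  # blocks the event loop


async def run_in_thread(pool, func):
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(pool, func)


async def timed(*coros):
    # Run the coroutines concurrently and return the elapsed wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(*coros)
    return time.perf_counter() - start


async def compare():
    with concurrent.futures.ThreadPoolExecutor(2) as pool:
        serial = await timed(run_as_is(fake_io), run_as_is(fake_io))
        threaded = await timed(
            run_in_thread(pool, fake_io), run_in_thread(pool, fake_io)
        )
    return serial, threaded


if __name__ == "__main__":
    serial, threaded = asyncio.run(compare())
    print(f"as is: {serial:.1f}s, in threads: {threaded:.1f}s")
```

The two direct calls run one after the other, while the two threaded calls overlap, so the threaded time should come out at roughly half the serial time.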

