Start asyncio event loop in separate thread and consume queue items

I'm writing a Python program that runs tasks taken from a queue concurrently, in order to learn asyncio.

Items are put onto the queue by interacting with the main thread (inside a REPL). Whenever a task is put onto the queue, it should be consumed and executed immediately. My approach is to kick off a separate thread and pass the queue to an event loop running inside that thread.

The tasks do run, but only sequentially, and I am not clear on how to get them to run concurrently. My attempt is as follows:

import asyncio
import time
import queue
import threading

def do_it(task_queue):
    '''Process tasks in the queue until the sentinel value is received'''
    _sentinel = 'STOP'

    def clock():
        return time.strftime("%X")

    async def process(name, total_time):
        status = f'{clock()} {name}_{total_time}:'
        print(status, 'START')
        current_time = time.time()
        end_time = current_time + total_time
        while current_time < end_time:
            print(status, 'processing...')
            await asyncio.sleep(1)
            current_time = time.time()
        print(status, 'DONE.')

    async def main():
        while True:
            item = task_queue.get()
            if item == _sentinel:
                break
            await asyncio.create_task(process(*item))

    print('event loop start')
    asyncio.run(main())
    print('event loop end')


if __name__ == '__main__':
    tasks = queue.Queue()
    th = threading.Thread(target=do_it, args=(tasks,))
    th.start()

    tasks.put(('abc', 5))
    tasks.put(('def', 3))

Any advice pointing me towards running these tasks concurrently would be greatly appreciated!
Thanks

Update

Thank you Frank Yellin and cynthi8! I have reworked main() based on your suggestions:

  • removed the await before asyncio.create_task - this fixed the concurrency
  • added a wait-while loop so that main does not return prematurely
  • used the non-blocking mode of Queue.get()

The program now works as expected 👍 (a sketch of the reworked main() is shown below).
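The question does not show the reworked main() itself; the following is only a sketch of what it might look like after the three changes above (assuming the rest of the original do_it(), process() and _sentinel are unchanged, and using asyncio.all_tasks() for the wait loop):

    async def main():
        while True:
            try:
                # non-blocking get: raises queue.Empty instead of stalling the event loop
                item = task_queue.get(block=False)
            except queue.Empty:
                await asyncio.sleep(0.1)  # yield so already-scheduled tasks can run
                continue
            if item == _sentinel:
                break
            # no await here: the task is scheduled and runs concurrently
            asyncio.create_task(process(*item))

        # wait until only main() itself is left, so main() does not return prematurely
        while len(asyncio.all_tasks()) > 1:
            await asyncio.sleep(0.2)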

Update 2

user4815162342 provided further improvements; I have annotated his suggestions in the code below.

'''
Starts an auxiliary thread which establishes a queue and consumes the tasks
placed on that queue.

Allows enqueueing of tasks from within __main__ and termination of the aux thread.
'''
import asyncio
import time
import threading
import functools

def do_it(started):
    '''Process tasks in the queue until the sentinel value is received'''
    _sentinel = 'STOP'

    def clock():
        return time.strftime("%X")

    async def process(name, total_time):
        print(f'{clock()} {name}_{total_time}:', 'Started.')
        current_time = time.time()
        end_time = current_time + total_time
        while current_time < end_time:
            print(f'{clock()} {name}_{total_time}:', 'Processing...')
            await asyncio.sleep(1)
            current_time = time.time()
        print(f'{clock()} {name}_{total_time}:', 'Done.')

    async def main():
        # get_running_loop() returns the running event loop of the current OS
        # thread; expose it (and the queue) to the __main__ thread via `started`
        started.loop = asyncio.get_running_loop()
        started.queue = task_queue = asyncio.Queue()
        started.set()
        while True:
            item = await task_queue.get()
            if item == _sentinel:
                # task_done is used to tell join when the work in the queue is 
                # actually finished. A queue length of zero does not mean work
                # is complete.
                task_queue.task_done()
                break
            task = asyncio.create_task(process(*item))
            # Add a callback to be run when the Task is done.
            # Indicate that a formerly enqueued task is complete. Used by queue 
            # consumer threads. For each get() used to fetch a task, a 
            # subsequent call to task_done() tells the queue that the processing
            # on the task is complete.
            task.add_done_callback(lambda _: task_queue.task_done())            

        # keep loop going until all the work has completed
        # When the count of unfinished tasks drops to zero, join() unblocks.
        await task_queue.join()

    print('event loop start')
    asyncio.run(main())
    print('event loop end')

if __name__ == '__main__':
    # started Event is used for communication with thread th
    started = threading.Event()
    th = threading.Thread(target=do_it, args=(started,))
    th.start()
    # started.wait() blocks until started.set(), ensuring that the tasks and
    # loop variables are available from the event loop thread
    started.wait()
    tasks, loop = started.queue, started.loop

    # call_soon() schedules a callback to be called with the given arguments at
    # the next iteration of the event loop; call_soon_threadsafe() is the
    # thread-safe variant required when scheduling callbacks from another thread.

    # put_nowait() enqueues an item without waiting; unlike the put() coroutine,
    # it is a plain function and can therefore be passed to call_soon_threadsafe.
    loop.call_soon_threadsafe(tasks.put_nowait, ('abc', 5))
    loop.call_soon_threadsafe(tasks.put_nowait, ('def', 3))
    loop.call_soon_threadsafe(tasks.put_nowait, 'STOP')

As others have pointed out, the problem with your code is that it uses a blocking queue, which halts the event loop while it waits for the next item. The problem with the proposed solution, however, is that it introduces latency because it must occasionally sleep to allow other tasks to run. Besides introducing latency, it prevents the program from ever going to sleep even when there are no items in the queue.

An alternative is to switch to an asyncio queue, which is designed for use with asyncio. This queue must be created inside the running loop, so you can't pass it to do_it; you have to retrieve it from it instead. Also, since it is an asyncio primitive, its put method must be invoked through call_soon_threadsafe so that the event loop notices it.

The final issue is that your main() function uses another busy loop to wait for all the tasks to complete. This can be avoided with Queue.join, which is designed precisely for this use case.

Here is your code adapted to incorporate all of the above suggestions, with the process function remaining unchanged from your original:

import asyncio
import time
import threading

def do_it(started):
    '''Process tasks in the queue until the sentinel value is received'''
    _sentinel = 'STOP'

    def clock():
        return time.strftime("%X")

    async def process(name, total_time):
        status = f'{clock()} {name}_{total_time}:'
        print(status, 'START')
        current_time = time.time()
        end_time = current_time + total_time
        while current_time < end_time:
            print(status, 'processing...')
            await asyncio.sleep(1)
            current_time = time.time()
        print(status, 'DONE.')

    async def main():
        started.loop = asyncio.get_running_loop()
        started.queue = task_queue = asyncio.Queue()
        started.set()
        while True:
            item = await task_queue.get()
            if item == _sentinel:
                task_queue.task_done()
                break
            task = asyncio.create_task(process(*item))
            task.add_done_callback(lambda _: task_queue.task_done())
        await task_queue.join()

    print('event loop start')
    asyncio.run(main())
    print('event loop end')

if __name__ == '__main__':
    started = threading.Event()
    th = threading.Thread(target=do_it, args=(started,))
    th.start()
    started.wait()
    tasks, loop = started.queue, started.loop

    loop.call_soon_threadsafe(tasks.put_nowait, ('abc', 5))
    loop.call_soon_threadsafe(tasks.put_nowait, ('def', 3))
    loop.call_soon_threadsafe(tasks.put_nowait, 'STOP')
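(An optional extra, not part of the answer's code: after posting the sentinel, the main thread can block until the worker thread has shut down cleanly.)

    th.join()  # returns once do_it() finishes, i.e. the event loop has exited
    print('worker thread finished')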

Note: an unrelated issue with your original code is that it awaited the result of create_task(), which defeats the purpose of create_task() because it doesn't allow the task to run in the background. (It's the equivalent of immediately joining a thread you have just started - you can do it, but there isn't much point.) This issue is fixed both in the code above and in your edit to the question.
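To make that concrete, here is a minimal, self-contained illustration (not taken from the question; work(), sequential() and concurrent() are made-up names) of why awaiting create_task() immediately serialises the work:

import asyncio

async def work(name, secs):
    await asyncio.sleep(secs)
    print(name, 'done')

async def sequential():
    # awaiting each task as soon as it is created serialises the work: ~3 s total
    await asyncio.create_task(work('a', 1))
    await asyncio.create_task(work('b', 2))

async def concurrent():
    # create both tasks first, then wait for both: ~2 s total
    t1 = asyncio.create_task(work('a', 1))
    t2 = asyncio.create_task(work('b', 2))
    await asyncio.gather(t1, t2)

asyncio.run(sequential())
asyncio.run(concurrent())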

There are two problems with your code.

First, you should not have the await in front of asyncio.create_task. That is most likely what is causing your code to run synchronously.

Then, once you've got your code running asynchronously, you need something after the while loop in main so that the code doesn't return immediately, but instead waits for all the jobs to finish. Another Stack Overflow answer recommends:

# Note: asyncio.Task.all_tasks() was removed in Python 3.9; use asyncio.all_tasks() there
while len(asyncio.Task.all_tasks()) > 1:  # Any task besides main() itself?
    await asyncio.sleep(0.2)

Alternatively, there are versions of Queue that keep track of the tasks that are still running.

And as a secondary issue:

If the queue.Queue is empty, get() blocks by default and will never return your sentinel string. https://docs.python.org/3/library/queue.html
