
How to run Python async in while loop independently

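I am using FastAPI with the fastapi_utils package. My API collects users' texts for 3 seconds and then sends them to a model that computes their lengths (just a simple demo). So I use fastapi_utils to schedule a background task. Finally, I read the results from a dict. But I found that the program blocks in the while loop and feed_data_into_model never puts a value into shared_dict, so the while loop never ends.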


import asyncio
import uuid
import logging
from typing import Union, List
import threading
lock = threading.Lock()
from fastapi import FastAPI, Request, Body
from fastapi_utils.tasks import repeat_every
import uvicorn
logger = logging.getLogger(__name__)
app = FastAPI()
queue = asyncio.Queue(maxsize=64)

shared_dict = {} # model result saved here!

lock = threading.Lock()

def handle_dict(key, value = None, action = "put"):
    lock.acquire()
    try:
        if action == "put":
            shared_dict[key] = value
        elif action == "delete":
            del shared_dict[key]
        elif action == "get":
            value = shared_dict[key]
        elif action == "exist":
            value = key in shared_dict
        else:
            pass
    finally:
        # Always called, even if exception is raised in try block
        lock.release()
    return value

def model_work(x:Union[str,List[str]]):
    if isinstance(x,str):
        result = [len(x)]
    else:
        result = [len(_) for _ in x]
    return result

@app.on_event("startup")
@repeat_every(seconds=4, logger=logger, wait_first=True)
async def feed_data_into_model():
    if queue.qsize() != 0:
        data = []
        ids = []
        while queue.qsize() != 0:
          task = await queue.get()
          task_id = task[0]
          ids.append(task_id)
          text = task[1]
          data.append(text)
        result = model_work(data)  
        # print("model result:",result)
        for index,task_id in enumerate(ids):
            value = result[index]
            handle_dict(task_id,value,action = "put")

async def get_response(task_id):
    not_exist_flag = True
    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action= "exist") is False # BUG: it doesn't work
    value = handle_dict(task_id, None, action= "get")
    handle_dict(task_id, None, action= "delete")
    return value

@app.get("/{text}")
async def demo(text:str):
    task_id = str(uuid.uuid4())
    state = "pending"
    item= [task_id,text,state,""]
    await queue.put(item)
    # !: await query_from_answer_dict
    value = await get_response(task_id)
    return value

if __name__ == "__main__":
    # !: single process run every 4s, if queue not empty then pop them out to model
    # !: and model will save result in thread-safe dict, key is task-id
    uvicorn.run("api:app", host="0.0.0.0", port=5555)

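Once the service is running, call the web API with some text. You will find that the request is still blocked even after 3 seconds. My guess is that fastapi_utils does not start a new thread for the background task, so the main thread is stuck in the while loop because the dict is always empty.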

The problem at the moment is that you are running blocking code inside the asyncio loop. If you insert a short delay, it will work:

    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action="exist") is False
        await asyncio.sleep(0.1)

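The reason is that you need to let the scheduler go elsewhere and actually do the processing! Asyncio is sadly not a free pass to write blocking code. But adding a delay is a very non-optimal solution.*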
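To make the starvation concrete, here is a minimal sketch using only asyncio, without FastAPI. The names `set_flag_later`, `poll`, and `run_demo` are made up for this illustration, and the polling loop is bounded so the demo always terminates. Without an `await` inside the loop, the producer task never gets a chance to run while the poller is spinning.

```python
import asyncio

async def set_flag_later(state: dict) -> None:
    # Stand-in for the background task that eventually produces the result.
    await asyncio.sleep(0.01)
    state["done"] = True

async def poll(state: dict, yield_control: bool, max_spins: int = 200) -> bool:
    # Bounded version of the polling loop, so the demo cannot hang.
    for _ in range(max_spins):
        if state["done"]:
            return True
        if yield_control:
            # Hands control back to the event loop so other tasks can run.
            await asyncio.sleep(0.001)
    return False

async def run_demo(yield_control: bool) -> bool:
    state = {"done": False}
    producer = asyncio.create_task(set_flag_later(state))
    seen = await poll(state, yield_control)
    await producer  # let the producer finish before the loop closes
    return seen

blocked = asyncio.run(run_demo(yield_control=False))
cooperative = asyncio.run(run_demo(yield_control=True))
print(blocked, cooperative)  # prints "False True"
```

In the first run the poller gives up without ever seeing the flag, because the producer is only scheduled and never executed until the poller stops spinning.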

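A better solution would be to have your get_response function await the task directly: currently everything runs in one thread, so there is no advantage to handing processing over to a separate queue. Alternatively, use multiprocessing and submit the task while keeping a local reference to it. Then you can await the future directly and avoid polling.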
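One way to await the result directly while keeping the batching is to put a future on the queue next to each text. The following is only a sketch under that assumption, not the only way to do it: `batch_worker` and `submit` are hypothetical names, and the interval is shortened so the example finishes quickly. In a FastAPI app the worker would presumably be started from a startup handler, and `submit` would be what the route awaits.

```python
import asyncio
from typing import List, Tuple

def model_work(texts: List[str]) -> List[int]:
    # Same toy "model" as in the question: the length of each text.
    return [len(t) for t in texts]

async def batch_worker(queue: asyncio.Queue, interval: float) -> None:
    # Every `interval` seconds, drain the queue and resolve each waiting future.
    while True:
        await asyncio.sleep(interval)
        batch: List[Tuple[str, asyncio.Future]] = []
        while not queue.empty():
            batch.append(queue.get_nowait())
        if not batch:
            continue
        results = model_work([text for text, _ in batch])
        for (_, fut), value in zip(batch, results):
            if not fut.done():  # the waiter may have been cancelled
                fut.set_result(value)

async def submit(queue: asyncio.Queue, text: str) -> int:
    # What the request handler would do: enqueue, then await the future.
    fut = asyncio.get_running_loop().create_future()
    await queue.put((text, fut))
    return await fut

async def main() -> List[int]:
    queue: asyncio.Queue = asyncio.Queue(maxsize=64)
    worker = asyncio.create_task(batch_worker(queue, interval=0.05))
    try:
        return await asyncio.gather(*(submit(queue, t) for t in ["a", "bb", "ccc"]))
    finally:
        worker.cancel()

lengths = asyncio.run(main())
print(lengths)  # prints [1, 2, 3]
```

There is no shared dict, no lock, and no polling here: each request is woken up exactly when its result is set. A real model call that takes seconds would still block the event loop if run inline like this, so it would need to be moved to an executor or another process.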

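By the time you have done this, you have nearly reinvented celery. The fastapi project generator includes celery by default: if you really need to hand these tasks off to another process, you might want to look at doing that.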

In general, try to avoid polling in asyncio. You want to await everything.

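*It is non-optimal because:

  • the polling happens at the highest level, so it is already slower than doing it in C
  • the polling here calls a whole function that acquires a lock, so we pay the context-switching cost (from the function call), the lock cost, and the blocking of anything else that tries to use the lock
  • your polling interval directly affects the time available for other code to run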

Note that your polling loop could have been written:

while not handle_dict(task_id, None, action="exist"):
    pass

Which shows the busy loop more clearly.

Server code; the sleep in get_response still needs to be removed because it is ugly:


import asyncio
import uuid
from typing import Union, List
import threading
from queue import Queue
from fastapi import FastAPI, Request, Body, APIRouter
from fastapi_utils.tasks import repeat_every
import uvicorn
import time
import logging
import datetime
logger = logging.getLogger(__name__)

app = APIRouter()
def feed_data_into_model(queue,shared_dict,lock): 
    if queue.qsize() != 0:
        data = []
        ids = []
        while queue.qsize() != 0:
          task = queue.get()
          task_id = task[0]
          ids.append(task_id)
          text = task[1]
          data.append(text)
        result = model_work(data)  
        # print("model result:",result)
        for index,task_id in enumerate(ids):
            value = result[index]
            handle_dict(task_id,value,action = "put",lock=lock, shared_dict = shared_dict)

class TestThreading(object):
    def __init__(self, interval, queue,shared_dict,lock):
        self.interval = interval

        thread = threading.Thread(target=self.run, args=(queue,shared_dict,lock))
        thread.daemon = True
        thread.start()

    def run(self,queue,shared_dict,lock):
        while True:
            # More statements comes here
            # print(datetime.datetime.now().__str__() + ' : Start task in the background')
            feed_data_into_model(queue,shared_dict,lock)
            time.sleep(self.interval)

if __name__ != "__main__":
    # since uvicorn will init and reload the file, and __name__ will change, not as __main__, so I init variable here
    # otherwise, we will have 2 background thread (one is empty) , it doesn't run but hard to debug due to the confusion
    global queue, shared_dict, lock 
    queue = Queue(maxsize=64) #
    shared_dict = {} # model result saved here!
    lock = threading.Lock()
    tr = TestThreading(3, queue,shared_dict,lock)

def handle_dict(key, value = None, action = "put", lock = None, shared_dict = None):
    lock.acquire()
    try:
        if action == "put":
            shared_dict[key] = value
        elif action == "delete":
            del shared_dict[key]
        elif action == "get":
            value = shared_dict[key]
        elif action == "exist":
            value = key in shared_dict
        else:
            pass
    finally:
        # Always called, even if exception is raised in try block
        lock.release()
    return value

def model_work(x:Union[str,List[str]]):
    time.sleep(3)
    if isinstance(x,str):
        result = [len(x)]
    else:
        result = [len(_) for _ in x]
    return result

async def get_response(task_id, lock, shared_dict):
    not_exist_flag = True
    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action= "exist",lock=lock, shared_dict = shared_dict) is False 
        await asyncio.sleep(0.02)
    value = handle_dict(task_id, None, action= "get", lock=lock, shared_dict = shared_dict)
    handle_dict(task_id, None, action= "delete",lock=lock, shared_dict = shared_dict)
    return value

@app.get("/{text}")
async def demo(text:str):
    global queue, shared_dict, lock 
    task_id = str(uuid.uuid4())
    logger.info(task_id)
    state = "pending"
    item= [task_id,text,state,""]
    queue.put(item)
    # TODO: await query_from_answer_dict , need to change since it's ugly to while wait the answer
    value = await get_response(task_id, lock, shared_dict)
    return value

if __name__ == "__main__":
    # what I want to do:
    #  single process run every 3s, if queue not empty then pop them out to model
    #  and model will save result in thread-safe dict, key is task-id
    
    uvicorn.run("api:app", host="0.0.0.0", port=5555)

Client test code:

for n in {1..5}; do curl http://localhost:5555/a & done
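The remaining sleep in get_response could probably be dropped by having the background thread resolve an asyncio future instead of writing into a shared dict. The sketch below assumes that approach. `worker` and `submit` are hypothetical names, the interval is shortened for the demo, and the FastAPI parts are left out so that it runs on its own.

```python
import asyncio
import queue
import threading
import time
from typing import List

def model_work(texts: List[str]) -> List[int]:
    # Toy model: the length of each text.
    return [len(t) for t in texts]

def worker(tasks: queue.Queue, interval: float, stop: threading.Event) -> None:
    # Runs in a background thread, like TestThreading.run above.
    while not stop.is_set():
        time.sleep(interval)
        batch = []
        while True:
            try:
                batch.append(tasks.get_nowait())
            except queue.Empty:
                break
        if not batch:
            continue
        results = model_work([text for text, _, _ in batch])
        for (_, loop, fut), value in zip(batch, results):
            # asyncio futures are not thread-safe, so the result is handed
            # to the event loop thread instead of being set from here.
            loop.call_soon_threadsafe(fut.set_result, value)

async def submit(tasks: queue.Queue, text: str) -> int:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    tasks.put_nowait((text, loop, fut))  # raises queue.Full if the queue is full
    return await fut  # no polling and no sleep

async def main() -> List[int]:
    tasks: queue.Queue = queue.Queue(maxsize=64)
    stop = threading.Event()
    thread = threading.Thread(target=worker, args=(tasks, 0.05, stop), daemon=True)
    thread.start()
    try:
        return await asyncio.gather(*(submit(tasks, t) for t in ["a", "bb", "ccc"]))
    finally:
        stop.set()

lengths = asyncio.run(main())
print(lengths)  # prints [1, 2, 3]
```

Because the slow model call stays in the worker thread, the event loop remains free to accept new requests, and the lock and shared dict are no longer needed.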
