
How to run Python async in while loop independently

I use FastAPI and the fastapi_utils package. My API receives users' texts within 3 s and sends them to a model that computes their lengths (just a simple demo). So I use fastapi_utils to run this as a scheduled background task, and finally I read the result from a dict. But I found that the program blocks in the while loop: feed_data_into_model never puts a value into shared_dict, so the while loop never ends.


import asyncio
import uuid
import logging
from typing import Union, List
import threading
from fastapi import FastAPI, Request, Body
from fastapi_utils.tasks import repeat_every
import uvicorn
logger = logging.getLogger(__name__)
app = FastAPI()
queue = asyncio.Queue(maxsize=64)

shared_dict = {} # model result saved here!

lock = threading.Lock()

def handle_dict(key, value = None, action = "put"):
    lock.acquire()
    try:
        if action == "put":
            shared_dict[key] = value
        elif action == "delete":
            del shared_dict[key]
        elif action == "get":
            value = shared_dict[key]
        elif action == "exist":
            value = key in shared_dict
        else:
            pass
    finally:
        # Always called, even if exception is raised in try block
        lock.release()
    return value

def model_work(x:Union[str,List[str]]):
    if isinstance(x,str):
        result = [len(x)]
    else:
        result = [len(_) for _ in x]
    return result

@app.on_event("startup")
@repeat_every(seconds=4, logger=logger, wait_first=True)
async def feed_data_into_model():
    if queue.qsize() != 0:
        data = []
        ids = []
        while queue.qsize() != 0:
            task = await queue.get()
            task_id = task[0]
            ids.append(task_id)
            text = task[1]
            data.append(text)
        result = model_work(data)
        # print("model result:",result)
        for index,task_id in enumerate(ids):
            value = result[index]
            handle_dict(task_id,value,action = "put")

async def get_response(task_id):
    not_exist_flag = True
    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action="exist") is False  # BUG: busy-waits; never yields to the event loop
    value = handle_dict(task_id, None, action= "get")
    handle_dict(task_id, None, action= "delete")
    return value

@app.get("/{text}")
async def demo(text:str):
    task_id = str(uuid.uuid4())
    state = "pending"
    item= [task_id,text,state,""]
    await queue.put(item)
    # !: await query_from_answer_dict
    value = await get_response(task_id)
    return value

if __name__ == "__main__":
    # !: a single worker runs every 4 s; if the queue is not empty, drain it
    # !: into the model, which saves results in a thread-safe dict keyed by task id
    uvicorn.run("api:app", host="0.0.0.0", port=5555)

After the service is running, access the web API with some text. You will find that you stay blocked even after 3 seconds. I guess that fastapi_utils doesn't open a new thread for the background task, so the main thread gets stuck in the while loop, since the dict is always empty.

The problem at the moment is the use of blocking code in an asyncio loop. If you insert a short delay it will work:

    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action="exist") is False
        await asyncio.sleep(0.1)

The reason is that you need to let the scheduler go elsewhere and actually do the processing! Asyncio is not a free pass to write blocking code, sadly. But adding a delay is a very suboptimal solution.*
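To see why, here is a minimal, FastAPI-free sketch (an illustration, not part of the original code): one coroutine stands in for the repeat_every task, and another spins without awaiting, exactly like the polling loop above. While the spinner runs, the ticker produces nothing, because both share one event loop:

import asyncio
import time

async def ticker():
    # stands in for the repeat_every background task
    for _ in range(5):
        print("background tick")
        await asyncio.sleep(1)

async def busy_handler():
    # spins for 3 s without a single await: the event loop
    # cannot switch back to ticker() until this returns
    end = time.monotonic() + 3
    while time.monotonic() < end:
        pass

async def main():
    task = asyncio.create_task(ticker())
    await asyncio.sleep(1.5)   # a tick or two get through
    await busy_handler()       # ticks stop for ~3 s here
    await task

asyncio.run(main())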

A better solution would be to have your get_response function await the task directly, since currently everything runs in one thread and there is no advantage to handing processing over to a separate queue. Or use multiprocessing, submit the task, and keep a local reference to it; then you can await the future directly and avoid polling.
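For example (a sketch of that idea, not the poster's code: the future-carrying queue and the 4-second cadence are assumptions), the queue can carry an asyncio.Future per task; the batching task resolves the futures, and the endpoint simply awaits its own:

import asyncio
from typing import List, Tuple

from fastapi import FastAPI
from fastapi_utils.tasks import repeat_every

app = FastAPI()
# created at import time, as in the question; on Python 3.10+ the queue binds its loop lazily
queue: "asyncio.Queue[Tuple[asyncio.Future, str]]" = asyncio.Queue(maxsize=64)

def model_work(texts: List[str]) -> List[int]:
    return [len(t) for t in texts]

@app.on_event("startup")
@repeat_every(seconds=4)
async def feed_data_into_model() -> None:
    futures, texts = [], []
    while not queue.empty():
        fut, text = queue.get_nowait()
        futures.append(fut)
        texts.append(text)
    for fut, length in zip(futures, model_work(texts)):
        fut.set_result(length)   # wakes the endpoint awaiting this future

@app.get("/{text}")
async def demo(text: str) -> int:
    fut = asyncio.get_running_loop().create_future()
    await queue.put((fut, text))
    return await fut   # suspended here: no polling, no shared dict

Since everything stays on one event loop, no lock is needed; if model_work were genuinely CPU-heavy, you would hand the batch to a pool with `await loop.run_in_executor(pool, model_work, texts)` instead of calling it inline.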

By the time you've done this you've nearly reinvented Celery. The FastAPI project generator includes Celery by default: if you really need to hand these tasks off to another process, you might want to look at doing that.
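For reference, a rough sketch of that hand-off with Celery (the Redis broker URL and module layout are assumptions for illustration):

# tasks.py -- runs in a separate Celery worker process
from celery import Celery

celery_app = Celery("tasks",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")

@celery_app.task
def model_work(text: str) -> int:
    return len(text)

# api.py -- the endpoint submits the task and awaits its result
import asyncio
from fastapi import FastAPI
from tasks import model_work

app = FastAPI()

@app.get("/{text}")
async def demo(text: str) -> int:
    async_result = model_work.delay(text)   # queued to the worker
    # AsyncResult.get() blocks, so run it on a thread (Python 3.9+)
    return await asyncio.to_thread(async_result.get, timeout=10)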

In general, try to avoid polling in asyncio. You want to await everything.

*It's suboptimal because:

  • polling happens at the highest level, so it is already slower than doing it in C
  • polling here calls a whole function that acquires a lock, so you pay the function-call overhead and the lock cost, and you block anything else trying to take the lock
  • your polling interval directly affects the time available for other code to run

Note that your polling loop could have been written:

while not handle_dict(task_id, None, action="exist"):
    pass

This shows the busy loop more clearly.
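If you want to keep the shared-dict design, one polling-free variant (a sketch, not the poster's code) pairs each task id with an asyncio.Event that the background task sets once the result is stored; the endpoint then suspends on the event instead of spinning:

import asyncio

events = {}   # task_id -> asyncio.Event, all on one event loop

async def get_response(task_id):
    event = events.setdefault(task_id, asyncio.Event())
    await event.wait()                    # suspended, not spinning
    events.pop(task_id, None)
    value = handle_dict(task_id, None, action="get")
    handle_dict(task_id, None, action="delete")
    return value

# and in feed_data_into_model, after handle_dict(task_id, value, action="put"):
#     events.setdefault(task_id, asyncio.Event()).set()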

The updated server code; the while/sleep in get_response still needs to be removed, because it's ugly:


import asyncio
import uuid
from typing import Union, List
import threading
from queue import Queue
from fastapi import FastAPI, Request, Body
from fastapi_utils.tasks import repeat_every
import uvicorn
import time
import logging
import datetime
logger = logging.getLogger(__name__)

app = FastAPI()
def feed_data_into_model(queue, shared_dict, lock):
    if queue.qsize() != 0:
        data = []
        ids = []
        while queue.qsize() != 0:
            task = queue.get()
            task_id = task[0]
            ids.append(task_id)
            text = task[1]
            data.append(text)
        result = model_work(data)
        # print("model result:", result)
        for index, task_id in enumerate(ids):
            value = result[index]
            handle_dict(task_id, value, action="put", lock=lock, shared_dict=shared_dict)

class TestThreading(object):
    def __init__(self, interval, queue,shared_dict,lock):
        self.interval = interval

        thread = threading.Thread(target=self.run, args=(queue,shared_dict,lock))
        thread.daemon = True
        thread.start()

    def run(self,queue,shared_dict,lock):
        while True:
            # More statements comes here
            # print(datetime.datetime.now().__str__() + ' : Start task in the background')
            feed_data_into_model(queue,shared_dict,lock)
            time.sleep(self.interval)

if __name__ != "__main__":
    # uvicorn imports (and with --reload re-imports) this file, so in the server
    # process __name__ is "api", not "__main__"; initialise the shared state here.
    # Initialising it under __main__ as well would start a second background
    # thread working on an empty queue, which is confusing and hard to debug.
    global queue, shared_dict, lock
    queue = Queue(maxsize=64)
    shared_dict = {} # model result saved here!
    lock = threading.Lock()
    tr = TestThreading(3, queue,shared_dict,lock)

def handle_dict(key, value = None, action = "put", lock = None, shared_dict = None):
    lock.acquire()
    try:
        if action == "put":
            shared_dict[key] = value
        elif action == "delete":
            del shared_dict[key]
        elif action == "get":
            value = shared_dict[key]
        elif action == "exist":
            value = key in shared_dict
        else:
            pass
    finally:
        # Always called, even if exception is raised in try block
        lock.release()
    return value

def model_work(x:Union[str,List[str]]):
    time.sleep(3)
    if isinstance(x,str):
        result = [len(x)]
    else:
        result = [len(_) for _ in x]
    return result

async def get_response(task_id, lock, shared_dict):
    not_exist_flag = True
    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action="exist", lock=lock, shared_dict=shared_dict) is False
        await asyncio.sleep(0.02)
    value = handle_dict(task_id, None, action="get", lock=lock, shared_dict=shared_dict)
    handle_dict(task_id, None, action="delete", lock=lock, shared_dict=shared_dict)
    return value

@app.get("/{text}")
async def demo(text:str):
    global queue, shared_dict, lock 
    task_id = str(uuid.uuid4())
    logger.info(task_id)
    state = "pending"
    item= [task_id,text,state,""]
    queue.put(item)
    # TODO: replace the polling wait in get_response with a proper await
    value = await get_response(task_id, lock, shared_dict)
    return value

if __name__ == "__main__":
    # What I want to do:
    #   a single worker runs every 3 s; if the queue is not empty, drain it
    #   into the model, which saves results in a thread-safe dict keyed by task id
    
    uvicorn.run("api:app", host="0.0.0.0", port=5555)
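A polling-free version of the cross-thread hand-off above (a sketch resolving the TODO, not the code as posted): pair each text with a concurrent.futures.Future, let the worker thread resolve it, and bridge it back into asyncio with asyncio.wrap_future, which is safe to await across threads:

import asyncio
import concurrent.futures

@app.get("/{text}")
async def demo(text: str) -> int:
    fut = concurrent.futures.Future()
    queue.put((fut, text))               # queue.Queue is thread-safe
    return await asyncio.wrap_future(fut)

# and in the worker thread, inside feed_data_into_model:
#     fut, text = queue.get()
#     ... batch the texts, run model_work ...
#     fut.set_result(length)   # thread-safe; wakes the awaiting endpoint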

The client test code:

for n in {1..5}; do curl http://localhost:5555/a & done
