How to run Python async in a while loop independently
I use FastAPI and the fastapi_utils package. My API receives users' texts within 3 s, and the texts are sent to the model to calculate their length (just for a simple demo). So I use fastapi_utils to schedule a background task. Finally, I get the result from the dict. But I found that the program blocks in the while loop: feed_data_into_model never puts the value into shared_dict, so the while loop never ends.
import asyncio
import uuid
import logging
from typing import Union, List
import threading

from fastapi import FastAPI, Request, Body
from fastapi_utils.tasks import repeat_every
import uvicorn

logger = logging.getLogger(__name__)
app = FastAPI()
queue = asyncio.Queue(maxsize=64)
shared_dict = {}  # model result saved here!
lock = threading.Lock()

def handle_dict(key, value=None, action="put"):
    lock.acquire()
    try:
        if action == "put":
            shared_dict[key] = value
        elif action == "delete":
            del shared_dict[key]
        elif action == "get":
            value = shared_dict[key]
        elif action == "exist":
            value = key in shared_dict
        else:
            pass
    finally:
        # Always called, even if exception is raised in try block
        lock.release()
    return value

def model_work(x: Union[str, List[str]]):
    if isinstance(x, str):
        result = [len(x)]
    else:
        result = [len(_) for _ in x]
    return result

@app.on_event("startup")
@repeat_every(seconds=4, logger=logger, wait_first=True)
async def feed_data_into_model():
    if queue.qsize() != 0:
        data = []
        ids = []
        while queue.qsize() != 0:
            task = await queue.get()
            task_id = task[0]
            ids.append(task_id)
            text = task[1]
            data.append(text)
        result = model_work(data)
        # print("model result:", result)
        for index, task_id in enumerate(ids):
            value = result[index]
            handle_dict(task_id, value, action="put")

async def get_response(task_id):
    not_exist_flag = True
    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action="exist") is False  # BUG: it doesn't work
    value = handle_dict(task_id, None, action="get")
    handle_dict(task_id, None, action="delete")
    return value

@app.get("/{text}")
async def demo(text: str):
    task_id = str(uuid.uuid4())
    state = "pending"
    item = [task_id, text, state, ""]
    await queue.put(item)
    # !: await query_from_answer_dict
    value = await get_response(task_id)
    return value

if __name__ == "__main__":
    # !: single process runs every 4 s; if the queue is not empty, pop items out to the model,
    # !: and the model saves results in a thread-safe dict keyed by task-id
    uvicorn.run("api:app", host="0.0.0.0", port=5555)
After the service runs, access the web API with some text. You will find that the request is blocked even after 3 seconds. I guess that fastapi_utils doesn't open a new thread for the background task, so the main thread is blocked in the while loop, since the dict is always empty.
The problem at the moment is the use of blocking code in an asyncio loop. If you insert a short delay it will work:

    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action="exist") is False
        await asyncio.sleep(0.1)

The reason is that you need to let the scheduler go elsewhere and actually do the processing! Asyncio is not a free pass to write blocking code, sadly. But adding a delay is a very suboptimal solution.*
A better solution would be to have your get_response function await the task directly, since currently everything is in one thread and there is no advantage to handing processing over to a separate queue. Or use multiprocessing, and submit the task whilst keeping a local reference to it. Then you can await the future directly and avoid polling.
By the time you've done this you've nearly reinvented Celery. The FastAPI project generator includes Celery by default: if you really need to hand these tasks off to another process, you might want to look at doing that.
In general, try to avoid polling in asyncio. You want to await everything.
*It's suboptimal because: note that your polling loop could have been written:

    while not handle_dict(task_id, None, action="exist"):
        pass

which shows up the busy loop more clearly.
The server code; the while/sleep in get_response still needs to be removed because it's ugly:
import asyncio
import uuid
from typing import Union, List
import threading
from queue import Queue
from fastapi import FastAPI, Request, Body, APIRouter
from fastapi_utils.tasks import repeat_every
import uvicorn
import time
import logging
import datetime

logger = logging.getLogger(__name__)
app = APIRouter()

def feed_data_into_model(queue, shared_dict, lock):
    if queue.qsize() != 0:
        data = []
        ids = []
        while queue.qsize() != 0:
            task = queue.get()
            task_id = task[0]
            ids.append(task_id)
            text = task[1]
            data.append(text)
        result = model_work(data)
        # print("model result:", result)
        for index, task_id in enumerate(ids):
            value = result[index]
            handle_dict(task_id, value, action="put", lock=lock, shared_dict=shared_dict)

class TestThreading(object):
    def __init__(self, interval, queue, shared_dict, lock):
        self.interval = interval
        thread = threading.Thread(target=self.run, args=(queue, shared_dict, lock))
        thread.daemon = True
        thread.start()

    def run(self, queue, shared_dict, lock):
        while True:
            # print(datetime.datetime.now().__str__() + ' : Start task in the background')
            feed_data_into_model(queue, shared_dict, lock)
            time.sleep(self.interval)

if __name__ != "__main__":
    # uvicorn imports and reloads this file, so __name__ is not "__main__";
    # initialize the shared state here, otherwise we would end up with two
    # background threads (one of them empty), which is hard to debug
    global queue, shared_dict, lock
    queue = Queue(maxsize=64)
    shared_dict = {}  # model result saved here!
    lock = threading.Lock()
    tr = TestThreading(3, queue, shared_dict, lock)

def handle_dict(key, value=None, action="put", lock=None, shared_dict=None):
    lock.acquire()
    try:
        if action == "put":
            shared_dict[key] = value
        elif action == "delete":
            del shared_dict[key]
        elif action == "get":
            value = shared_dict[key]
        elif action == "exist":
            value = key in shared_dict
        else:
            pass
    finally:
        # Always called, even if exception is raised in try block
        lock.release()
    return value

def model_work(x: Union[str, List[str]]):
    time.sleep(3)
    if isinstance(x, str):
        result = [len(x)]
    else:
        result = [len(_) for _ in x]
    return result

async def get_response(task_id, lock, shared_dict):
    not_exist_flag = True
    while not_exist_flag:
        not_exist_flag = handle_dict(task_id, None, action="exist", lock=lock, shared_dict=shared_dict) is False
        await asyncio.sleep(0.02)
    value = handle_dict(task_id, None, action="get", lock=lock, shared_dict=shared_dict)
    handle_dict(task_id, None, action="delete", lock=lock, shared_dict=shared_dict)
    return value

@app.get("/{text}")
async def demo(text: str):
    global queue, shared_dict, lock
    task_id = str(uuid.uuid4())
    logger.info(task_id)
    state = "pending"
    item = [task_id, text, state, ""]
    queue.put(item)
    # TODO: await query_from_answer_dict; needs changing, since busy-waiting for the answer is ugly
    value = await get_response(task_id, lock, shared_dict)
    return 1

if __name__ == "__main__":
    # what I want to do:
    # a single process runs every 3 s; if the queue is not empty, pop items out to the model,
    # and the model saves results in a thread-safe dict keyed by task-id
    uvicorn.run("api:app", host="0.0.0.0", port=5555)
The client test code:

    for n in {1..5}; do curl http://localhost:5555/a & done