I have a FastAPI endpoint where it need to download some files from HDFS to the local server.
I'm trying to use asyncio to run the function that will download the files in a separate process.
I'm using FastAPI Depends to create a HDFS client and inject the object in the endpoint execution.
from fastapi import Depends, FastAPI, Request, Response, status
from hdfs import InsecureClient
import asyncio
from concurrent.futures.process import ProcessPoolExecutor
app = FastAPI()
HDFS_URLS = ['http://hdfs-srv.local:50070']
async def run_in_process(fn, *args):
loop = asyncio.get_event_loop()
return await loop.run_in_executor(app.state.executor, fn, *args) # wait and return result
def connectHDFS():
client = InsecureClient(url)
yield client
def fr(id, img, client):
# my code here
client.download(id_identifica_foto_dir_hdfs, id_identifica_foto_dir_local, True, n_threads=2)
# my code here
return jsonReturn
@app.post("/")
async def main(request: Request, hdfsclient: InsecureClient = Depends(connectHDFS)):
# Decode the received message
data = await request.json()
message = base64.b64decode(data['data']).decode('utf-8').replace("'", '"')
message = json.loads(message)
res = await run_in_process(fr, message['id'], message['img'], hdfsclient)
return {
"message": res
}
@app.on_event("startup")
async def on_startup():
app.state.executor = ProcessPoolExecutor()
@app.on_event("shutdown")
async def on_shutdown():
app.state.executor.shutdown()
But I'm not able to pass ahead the hdfsclient
object:
res = await run_in_process(fr, message['id'], message['img'], hdfsclient)
I'm getting the following error:
Traceback (most recent call last):
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/uvicorn/protocols/http/h11_impl.py", line 396, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
return await self.app(scope, receive, send)
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/fastapi/applications.py", line 199, in __call__
await super().__call__(scope, receive, send)
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/starlette/applications.py", line 111, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__
raise exc from None
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__
await self.app(scope, receive, _send)
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__
raise exc from None
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__
await self.app(scope, receive, sender)
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/starlette/routing.py", line 566, in __call__
await route.handle(scope, receive, send)
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/starlette/routing.py", line 227, in handle
await self.app(scope, receive, send)
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/starlette/routing.py", line 41, in app
response = await func(request)
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/fastapi/routing.py", line 202, in app
dependant=dependant, values=values, is_coroutine=is_coroutine
File "/home/kleyson/.virtualenvs/reconhecimentofacial/lib/python3.7/site-packages/fastapi/routing.py", line 148, in run_endpoint_function
return await dependant.call(**values)
File "./asgi.py", line 86, in main
res = await run_in_process(fr, message['id'], message['img'], hdfsclient)
File "./asgi.py", line 22, in run_in_process
return await loop.run_in_executor(app.state.executor, fn, *args) # wait and return result
File "/usr/lib/python3.7/multiprocessing/queues.py", line 236, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects
How can have the hdfsclient
available inside the def fr()
function without the need to create a new connection on every new request? I mean, how to create the hdfsclient
on the application startup and be able to use it inside the function?
The entire point of asyncio
is to do what you are trying to achieve in the same process.
The typical example is a web crawler, where you open multiple requests, within the same thread/process, and then wait for them to finish. This way, you will get data from multiple urls
without having to wait each single request before starting the next one.
The same applies in your case: call your async
function that downloads the file, do your stuff and then wait for the file download to complete (if it hasn't completed yet). Sharing data between processes is not trivial, and your function is not working because of that reason.
I suggest you to first understand what async
is and how it works, before jumping into doing something that you don't understand.
Some tutorials on asyncio
https://www.datacamp.com/community/tutorials/asyncio-introduction
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.