
FastAPI UploadFile is slow compared to Flask

I have created an endpoint, as shown below:

from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/report/upload")
def create_upload_files(files: UploadFile = File(...)):
    try:
        # read the entire upload into memory and write it to disk
        with open(files.filename, 'wb+') as wf:
            wf.write(files.file.read())
    except Exception as e:
        return {"error": str(e)}

It is launched with uvicorn:

../venv/bin/uvicorn test_upload:app --host=0.0.0.0 --port=5000 --reload

I am performing some tests, uploading a file of around 100 MB using Python requests, and it takes around 128 seconds:

import sys, time, binascii, requests

# read the file and hex-encode it before sending it as multipart form data
with open(sys.argv[1], "rb") as f:
    hex_convert = binascii.hexlify(f.read())

items = {"files": hex_convert.decode()}
start = time.time()
r = requests.post("http://192.168.0.90:5000/report/upload", files=items)
end = time.time() - start
print(end)

I tested the same upload script against an API endpoint built with Flask, and it takes around 0.5 seconds:

from flask import Flask, request

app = Flask(__name__)


@app.route('/uploader', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        f = request.files['file']
        f.save(f.filename)
        return 'file uploaded successfully'

if __name__ == '__main__':
    app.run(host="192.168.0.90", port=9000)

Is there anything I am doing wrong?

You can define the endpoint with async def, and since all UploadFile methods are async methods, you need to await them. You can write the file(s) using synchronous writing, as shown in this answer, or (better) using asynchronous writing with aiofiles, as shown below:

Upload Single File

app.py

from fastapi import FastAPI, File, UploadFile
import aiofiles

app = FastAPI()

@app.post("/upload")
async def upload(file: UploadFile = File(...)):
    try:
        # read the entire upload into memory, then write it out asynchronously
        contents = await file.read()
        async with aiofiles.open(file.filename, 'wb') as f:
            await f.write(contents)
    except Exception:
        return {"message": "There was an error uploading the file"}
    finally:
        await file.close()

    return {"message": f"Successfully uploaded {file.filename}"}

Or, read/write the file asynchronously in a chunked manner, to avoid loading the entire file into memory. This one, though, takes much longer to complete (depending on the chunk size you choose).

from fastapi import FastAPI, File, UploadFile
import aiofiles

app = FastAPI()

@app.post("/upload")
async def upload(file: UploadFile = File(...)):
    try:
        async with aiofiles.open(file.filename, 'wb') as f:
            # read the upload in 1 KB chunks; a larger chunk size
            # (e.g., 1024 * 1024) usually speeds this up considerably
            while contents := await file.read(1024):
                await f.write(contents)
    except Exception:
        return {"message": "There was an error uploading the file"}
    finally:
        await file.close()

    return {"message": f"Successfully uploaded {file.filename}"}

test.py

import requests

url = 'http://127.0.0.1:8000/upload'
file = {'file': open('images/1.png', 'rb')}
resp = requests.post(url=url, files=file) 
print(resp.json())

Upload Multiple Files

app.py

from typing import List
from fastapi import FastAPI, File, UploadFile
import aiofiles

app = FastAPI()

@app.post("/upload")
async def upload(files: List[UploadFile] = File(...)):
    for file in files:
        try:
            contents = await file.read()
            async with aiofiles.open(file.filename, 'wb') as f:
                await f.write(contents)
        except Exception:
            return {"message": "There was an error uploading the file(s)"}
        finally:
            await file.close()

    return {"message": f"Successfully uploaded {[file.filename for file in files]}"}

test.py

import requests

url = 'http://127.0.0.1:8000/upload'
files = [('files', open('images/1.png', 'rb')), ('files', open('images/2.png', 'rb'))]
resp = requests.post(url=url, files=files) 
print(resp.json())

Update

Digging into the source code, it seems that the latest versions of Starlette (which FastAPI uses underneath) use a SpooledTemporaryFile (for the UploadFile data structure) with the max_size attribute set to 1 MB (1024 * 1024 bytes) - see here - in contrast to older versions, where max_size was set to the default value, i.e., 0 bytes, such as the one here.
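To see the spooling behaviour in isolation, here is a minimal sketch (independent of FastAPI/Starlette, using only the standard library) that mimics that 1 MB threshold; note that _rolled is a private CPython attribute, read here purely for illustration:

import tempfile

# keep up to 1 MB buffered in memory, as recent Starlette versions do
spool = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)

spool.write(b'x' * 1024)           # 1 KB - still held in memory
print(spool._rolled)               # False

spool.write(b'x' * 1024 * 1024)    # total now exceeds max_size - spills to disk
print(spool._rolled)               # True - now backed by a real temp file

spool.close()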

The above means that, in the past, data used to be fully loaded into memory no matter the size of the file (which could lead to issues when a file couldn't fit into RAM), whereas, in the latest version, data is spooled in memory until the file size exceeds max_size (i.e., 1 MB), at which point the contents are written to disk; more specifically, to the OS's temporary directory (Note: this also means that the maximum size of file you can upload is bound by the storage available to the system's temporary directory. If enough storage (for your needs) is available on your system, there's nothing to worry about; otherwise, please have a look at this answer on how to change the default temporary directory).

Thus, the process of writing the file multiple times - that is, initially loading the data into RAM, then, if the data exceeds 1 MB in size, writing the file to the temporary directory, then reading the file from the temporary directory (using file.read()), and finally, writing the file to a permanent directory - is what makes uploading a file slow compared to using the Flask framework, as the OP noted in their question (though the difference in time is not that big, just a few seconds, depending on the size of the file).
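If you want to verify this behaviour on your own setup, a rough sketch is given below; file.file is the underlying SpooledTemporaryFile, and _rolled is, again, a private attribute that may change between Python versions, so it is read defensively here:

import tempfile
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post('/inspect')
async def inspect(file: UploadFile = File(...)):
    return {
        'filename': file.filename,
        # True once the upload exceeded max_size and spilled to disk
        'rolled_to_disk': getattr(file.file, '_rolled', None),
        # where spooled data lands when it rolls over
        'temp_dir': tempfile.gettempdir(),
    }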

Solution

The solution (if one needs to upload files quite a lot larger than 1 MB, and uploading time is important to them) would be to access the request body as a stream. As per the Starlette documentation, if you access .stream(), the byte chunks are provided without storing the entire body in memory (and, later, in the temporary directory, if the body contains file data that exceeds 1 MB). An example is given below, where the time of uploading is recorded on the client side and ends up being about the same as when using the Flask framework with the example given in the OP's question.

app.py

from fastapi import FastAPI, Request
import aiofiles

app = FastAPI()

@app.post('/upload')
async def upload(request: Request):
    try:
        # the filename is passed in a custom request header,
        # since the raw body carries no multipart metadata
        filename = request.headers['filename']
        async with aiofiles.open(filename, 'wb') as f:
            async for chunk in request.stream():
                await f.write(chunk)
    except Exception:
        return {"message": "There was an error uploading the file"}

    return {"message": f"Successfully uploaded {filename}"}

In case your application does not require saving the file to disk, and all you need is for the file to be loaded directly into memory, you can just use the below (make sure your RAM has enough space available to accommodate the accumulated data):

from fastapi import FastAPI, Request

app = FastAPI()

@app.post('/upload')
async def upload(request: Request):
    body = b''
    try:
        filename = request.headers['filename']
        # accumulate the raw body in memory, chunk by chunk
        async for chunk in request.stream():
            body += chunk
    except Exception:
        return {"message": "There was an error uploading the file"}

    #print(body.decode())
    return {"message": f"Successfully uploaded {filename}"}

test.py

import requests
import time

with open("images/1.png", "rb") as f:
    data = f.read()
   
url = 'http://127.0.0.1:8000/upload'
headers = {'filename': '1.png'}

start = time.time()
resp = requests.post(url=url, data=data, headers=headers)
end = time.time() - start

print(f'Elapsed time is {end} seconds.', '\n')
print(resp.json())
