

How to stream DataFrame using FastAPI without saving the data to csv file?

I would like to know how I can stream a DataFrame using FastAPI without having to save the DataFrame to a csv file on disk. Currently, I have managed to stream data from a csv file, but the speed was not very fast compared to returning a FileResponse. The /option7 below is what I'm trying to do.

My goal is to stream data from the FastAPI backend without saving the DataFrame as a csv file.

Thank you.

import pandas as pd
from fastapi import FastAPI, Response, Query
from fastapi.responses import FileResponse, HTMLResponse, StreamingResponse

app = FastAPI()

df = pd.read_csv("data.csv")

@app.get("/option4")
def load_questions():
    return FileResponse(path="C:Downloads/data.csv", filename="data.csv")

@app.get("/option5")
def load_questions():
    def iterfile():
        with open('data.csv', mode="rb") as file_like:
            yield from file_like

    return StreamingResponse(iterfile(), media_type="text/csv")

@app.get("/option7")
def load_questions():
    def iterfile():
        #with open(df, mode="rb") as file_like:
        yield from df

    return StreamingResponse(iterfile(), media_type="application/json")


As mentioned in this answer, as well as here and here, when the entire data (or DataFrame in your case) is already loaded into memory, using StreamingResponse makes little sense. StreamingResponse makes sense when you want to transfer real-time data, when you don't know the size of your output ahead of time and don't want to wait to collect it all before you start sending it to the client, or when a file that you would like to return is too large to fit into memory (for instance, if you have 8GB of RAM, you can't load a 50GB file) and hence you would rather load the file into memory in chunks.

In your case, you should instead return a custom Response directly, after using pandas' .to_json() method to convert the DataFrame into a JSON string, as described in this answer. Example:

from fastapi import Response

@app.get("/questions")
def load_questions():
    return Response(df.to_json(orient="records"), media_type="application/json")

If you find the browser taking a while to display the data, you may want to have the data downloaded as a .json file to the user's device (which would complete much faster), rather than waiting for the browser to display a large amount of data. You can do that by setting the Content-Disposition header in the Response using the attachment parameter (see this answer for more details):

@app.get("/questions")
def load_questions():
    headers = {'Content-Disposition': 'attachment; filename="data.json"'}
    return Response(df.to_json(orient="records"), headers=headers, media_type='application/json')
