
How to stream DataFrame using FastAPI without saving the data to csv file?

I would like to know how I can stream a DataFrame using FastAPI without first saving it to a csv file on disk. So far I have managed to stream the data from a csv file, but this was noticeably slower than returning a FileResponse. The /option7 endpoint below is what I'm trying to do.

My goal is to stream data from FastAPI backend without saving the DataFrame as a csv file.

Thank you.

import pandas as pd
from fastapi import FastAPI, Response, Query
from fastapi.responses import FileResponse, HTMLResponse, StreamingResponse

app = FastAPI()

# Load the whole CSV into memory once at startup
df = pd.read_csv("data.csv")

@app.get("/option4")
def load_questions():
    return FileResponse(path="C:Downloads/data.csv", filename="data.csv")

@app.get("/option5")
def load_questions():
    def iterfile():
        with open('data.csv', mode="rb") as file_like:
            yield from file_like

    return StreamingResponse(iterfile(), media_type="text/csv")

@app.get("/option7")
def load_questions():
    def iterfile():
        #with open(df, mode="rb") as file_like:
        yield from df  # note: iterating a DataFrame yields its column labels, not its rows

    return StreamingResponse(iterfile(), media_type="application/json")


As mentioned in this answer, as well as here and here, when the entire data (or DataFrame, in your case) is already loaded into memory, using StreamingResponse makes little sense. StreamingResponse is useful when you want to transfer real-time data, when you don't know the size of the output ahead of time and don't want to wait to collect it all before you start sending it to the client, or when the file you would like to return is too large to fit into memory (for instance, with 8GB of RAM you can't load a 50GB file) and you would rather load it into memory in chunks.
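
To illustrate the last case, here is a minimal sketch of reading a file in fixed-size chunks with a generator, so that only one chunk is held in memory at a time. The function name iter_file_chunks and the 64 KiB default chunk size are assumptions made for this example; they do not come from the question.

```python
from typing import Iterator


def iter_file_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file at `path` piece by piece (hypothetical helper)."""
    with open(path, mode="rb") as file_like:
        while True:
            chunk = file_like.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            yield chunk
```

A generator like this could be passed to StreamingResponse, e.g. StreamingResponse(iter_file_chunks("data.csv"), media_type="text/csv"), in place of the line-by-line iterfile() in the question.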

In your case, you should instead return a custom Response directly, after using pandas' .to_json() method to convert the DataFrame into a JSON string, as described in this answer. Example:

from fastapi import Response

@app.get("/questions")
def load_questions():
    return Response(df.to_json(orient="records"), media_type="application/json")
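
For reference, a small sketch of what to_json(orient="records") produces: a JSON array with one object per row. The two-row DataFrame below is made up for illustration and stands in for the contents of data.csv, which the question does not show.

```python
import json

import pandas as pd

# Hypothetical sample data; the real data.csv is not shown in the question
sample_df = pd.DataFrame({"id": [1, 2], "question": ["first", "second"]})

# With no path argument, to_json() returns the JSON document as a string
payload = sample_df.to_json(orient="records")

# Each row becomes one JSON object keyed by column name
rows = json.loads(payload)
print(rows)
```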

If you find that the browser takes a while to display the data, you may want to have the data downloaded as a .json file to the user's device (which would complete much faster), rather than waiting for the browser to render a large amount of data. You can do that by setting the Content-Disposition header on the Response with the value attachment (see this answer for more details):

@app.get("/questions")
def load_questions():
    headers = {'Content-Disposition': 'attachment; filename="data.json"'}
    return Response(df.to_json(orient="records"), headers=headers, media_type='application/json')
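
If the client needs CSV rather than JSON, the same idea should carry over: calling to_csv() without a path returns the CSV text as a string, so nothing has to be written to disk. The sketch below only builds the response body and headers; the helper name csv_download_parts is hypothetical and the sample DataFrame is made up.

```python
from typing import Dict, Tuple

import pandas as pd


def csv_download_parts(frame: pd.DataFrame, filename: str) -> Tuple[str, Dict[str, str]]:
    """Build an in-memory CSV body and download headers (hypothetical helper)."""
    # With no path argument, to_csv() returns the CSV as a string
    body = frame.to_csv(index=False)
    headers = {"Content-Disposition": f'attachment; filename="{filename}"'}
    return body, headers


# Hypothetical sample data standing in for the question's DataFrame
sample_df = pd.DataFrame({"id": [1, 2], "question": ["first", "second"]})
body, headers = csv_download_parts(sample_df, "data.csv")
```

Inside an endpoint, these could then be returned as Response(body, headers=headers, media_type="text/csv").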


 