
Splitting Large CSV file into multiple sheets in a single Excel using Python

I am using this piece of code to read a CSV file (around 1 GB) with pandas and write it into multiple Excel sheets using chunksize.

    import pandas as pd

    with pd.ExcelWriter('/tmp/output.xlsx', engine='xlsxwriter') as writer:
        reader = pd.read_csv(f'/tmp/{file_name}', sep=',', chunksize=1000000)
        for idx, chunk in enumerate(reader):
            chunk.to_excel(writer, sheet_name=f"Report (P_{idx + 1})", index=False)
        # No explicit save() needed: the "with" block saves the file on exit,
        # and ExcelWriter.save() was removed in pandas 2.0.

This approach is taking a lot of time. Can anyone please suggest an approach to reduce it?

Some days ago I faced the same problem, so here is what I tried:

You can use the vaex library: https://vaex.readthedocs.io/en/latest/

Or, if you don't want to do it with pandas itself, try Apache PySpark.

Or you can use Google Cloud with its 1200 credit.
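If Excel output is not a hard requirement, most of the time here is spent writing the .xlsx format itself, so splitting the large CSV into smaller CSV files with the standard library is far faster. A minimal sketch of that alternative (the function name, default chunk size, and file-name template are my assumptions, not from the question):

    import csv

    def split_csv(src_path, rows_per_part=1_000_000, dest_template="part_{}.csv"):
        """Split src_path into CSV files of at most rows_per_part data rows each,
        repeating the header row in every part. Returns the part file names."""
        part_paths = []
        out = writer = None
        rows_in_part = 0
        with open(src_path, newline="") as src:
            reader = csv.reader(src)
            header = next(reader)  # keep the header so each part stands alone
            for row in reader:
                # Start a new part file on the first row, and whenever the
                # current part is full.
                if out is None or rows_in_part == rows_per_part:
                    if out is not None:
                        out.close()
                    path = dest_template.format(len(part_paths) + 1)
                    part_paths.append(path)
                    out = open(path, "w", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    rows_in_part = 0
                writer.writerow(row)
                rows_in_part += 1
        if out is not None:
            out.close()
        return part_paths

Because this streams row by row, memory use stays flat regardless of the input size; the resulting parts can still be converted to Excel individually later if needed.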
