I am using this piece of code to read a CSV (around 1 GB) with pandas and write it into multiple Excel sheets using chunksize.
import pandas as pd

with pd.ExcelWriter('/tmp/output.xlsx', engine='xlsxwriter') as writer:
    reader = pd.read_csv(f'/tmp/{file_name}', sep=',', chunksize=1000000)
    for idx, chunk in enumerate(reader):
        chunk.to_excel(writer, sheet_name=f"Report (P_{idx + 1})", index=False)
    # Note: writer.save() is unnecessary inside the context manager (the file
    # is saved on exit), and the method was removed in pandas 2.0.
This approach is taking a lot of time. Can anyone suggest approaches to reduce it?
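One thing worth checking first: the xlsx generation itself, not the CSV read, is usually the bottleneck at this size. If the downstream consumers can accept CSV part files instead of Excel sheets, splitting the input with the standard library alone is far faster. A minimal sketch, assuming the file has a single header row (the function name and the path template are my own placeholders, not from the question):

```python
import csv
import itertools

def split_csv(src_path, rows_per_part, dest_template):
    """Split a large CSV into numbered part files, repeating the header.

    dest_template is a format string such as '/tmp/report_part{}.csv'.
    Only one chunk of rows is held in memory at a time.
    """
    with open(src_path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader)
        for idx in itertools.count(1):
            # Take the next rows_per_part rows; stop when the file is drained.
            rows = list(itertools.islice(reader, rows_per_part))
            if not rows:
                break
            with open(dest_template.format(idx), 'w', newline='') as dest:
                writer = csv.writer(dest)
                writer.writerow(header)
                writer.writerows(rows)
```

This keeps the same chunked structure as the original loop but avoids the expensive xlsx serialization entirely.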
A few days ago I faced the same problem, and here is what I tried:

You can use a library called vaex: https://vaex.readthedocs.io/en/latest/

Or, if you want to keep a pandas-like workflow at scale, try Apache PySpark.

Or you can use Google Cloud with its 1200 credit.
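If you prefer to stay with pandas and xlsxwriter, another option is xlsxwriter's constant_memory mode, which streams each row to disk instead of buffering whole sheets in memory and often speeds up large writes. A hedged sketch (the helper name, paths, and chunk size are assumptions; engine_kwargs requires pandas 1.3 or newer):

```python
import pandas as pd

def csv_to_excel_sheets(src_csv, dest_xlsx, rows_per_sheet):
    """Write each chunk of a CSV to its own sheet in one workbook."""
    with pd.ExcelWriter(
        dest_xlsx,
        engine='xlsxwriter',
        # constant_memory flushes rows as they are written; to_excel emits
        # rows in order, so this in-order-only mode is safe here.
        engine_kwargs={'options': {'constant_memory': True}},
    ) as writer:
        reader = pd.read_csv(src_csv, chunksize=rows_per_sheet)
        for idx, chunk in enumerate(reader):
            chunk.to_excel(writer, sheet_name=f'Report (P_{idx + 1})',
                           index=False)
```

The loop structure is the same as in the question; only the writer options change, so it is a low-risk thing to try before switching libraries.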