
How to iteratively extract chunks of data from a large .csv file and save each chunk as a new .csv file using Python?

I am new to Python and I am attempting to read a large .csv file (hundreds of thousands, possibly a few million, rows; and about 15,000 columns) using pandas.

What I thought I could do is create and save each chunk as a new .csv file, iterating across all chunks. I am currently using a laptop with relatively limited memory (about 4 GB; I am in the process of upgrading it), but I was wondering whether I could do this without changing my setup. Alternatively, I could move this process to a PC with more RAM and attempt larger chunks, but I wanted to get it working even for smaller row chunks.

I have seen that I can quickly process chunks of data (e.g. 10,000 rows and all columns) using the code below. But being a Python beginner, I have only managed to extract and save the first chunk. I would like to loop iteratively across all chunks and save each one.

import pandas as pd
import os

print(os.getcwd())
print(os.listdir(os.getcwd()))

chunksize = 10000

data = pd.read_csv('ukb35190.csv', chunksize=chunksize)

df = data.get_chunk(chunksize)  # reads only the first chunk of the file

print(df)

# to_csv writes the file as a side effect and returns None,
# so there is no need to assign its result to a variable
df.to_csv(r'/home/user/PycharmProjects/PROJECT/export_csv_1.csv', index=None, header=True)

If you are not doing any processing on the data, you don't even have to store each chunk in a variable; you can write it out directly. See the code below. Hope this helps.

import pandas as pd
import os

chunksize = 10000
batch = 1

# iterate over the file chunk by chunk, writing each chunk to its own .csv
for chunk in pd.read_csv(r'ukb35190.csv', chunksize=chunksize):
    chunk.to_csv(r'ukb35190_' + str(batch) + '.csv', index=False)
    batch += 1
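For reference, here is a minimal, self-contained sketch of the same loop that you can run end to end. It builds a small sample CSV first (the file names and the 25-row dataset are made up for the demo; substitute your own paths and `chunksize`), then uses `enumerate` to number the output files instead of incrementing a counter by hand:

```python
import os
import tempfile

import pandas as pd

# Work in a temporary directory so the demo leaves no files behind.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "big.csv")

# Build a small sample CSV (25 rows) so the example is self-contained.
pd.DataFrame({"id": range(25), "value": range(100, 125)}).to_csv(source, index=False)

chunksize = 10  # rows per output file

# enumerate(..., start=1) numbers the chunks for us; each chunk is a DataFrame.
for batch, chunk in enumerate(pd.read_csv(source, chunksize=chunksize), start=1):
    out_path = os.path.join(workdir, f"big_chunk_{batch}.csv")
    chunk.to_csv(out_path, index=False)  # each output file keeps the header row

# 25 rows in chunks of 10 -> three files of 10, 10, and 5 rows.
print(batch)
```

Only `chunksize` rows are in memory at any time, so this works on a 4 GB laptop as long as a single chunk of all ~15,000 columns fits in RAM; if it doesn't, reduce `chunksize` further.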

