
Read and write a large CSV file in Python

I use the following code to read a LARGE CSV file (6-10 GB), insert a header row, and then export it to CSV again.

import csv
import pandas as pd

df = pd.read_csv('read file')
df.columns = ['list of headers']
df.to_csv('outfile', index=False, quoting=csv.QUOTE_NONNUMERIC)

But this methodology is extremely slow and I run out of memory. Any suggestions?

Sorry, I don't have enough reputation to comment, so I'll leave an answer. First, could you try adding the low_memory parameter when you read the file? ( https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html )

df = pd.read_csv('read file', low_memory=False)

Second, how about checking the memory usage with info()?

df = pd.read_csv('read file')
df.columns = ['list of headers']
df.info()

Third, based on Mohit's suggestion, read the file in chunks:

# read the big file in chunks of `chunksize` rows so it never sits in memory all at once
chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    # process each chunk here; each chunk is an ordinary DataFrame
    pass
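
Putting that together with the original goal (replacing the header and writing the result back out), a rough sketch might look like the one below; the file names and column names are placeholders you would swap for your own:

import csv
import pandas as pd

chunksize = 10 ** 6  # rows per chunk; tune this to fit your memory
new_columns = ['col_a', 'col_b', 'col_c']  # placeholder header names

for i, chunk in enumerate(pd.read_csv('infile.csv', chunksize=chunksize)):
    chunk.columns = new_columns                    # rename the columns of each chunk
    chunk.to_csv('outfile.csv',
                 mode='w' if i == 0 else 'a',      # overwrite on the first chunk, append after
                 header=(i == 0),                  # write the header row only once
                 index=False,
                 quoting=csv.QUOTE_NONNUMERIC)

This way only one chunk is ever held in memory, and the header row ends up at the top of the output file exactly once.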

Rather than reading in the whole 6GB file, could you not just add the headers to a new file, and then cat in the rest? Something like this:

import csv
import fileinput

columns = ['list of headers']

with open('outfile.csv', 'w', newline='') as outfile:
    # write the new header row first, quoted the same way the question's to_csv call would
    writer = csv.writer(outfile, quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
    writer.writerow(columns)
    # then stream the original file through line by line, never loading it all at once
    with fileinput.FileInput(files=('infile.csv',)) as f:
        for line in f:
            outfile.write(line)
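
If the existing rows do not need to be parsed at all, copying them in large binary blocks with shutil.copyfileobj is usually faster than looping line by line. A minimal sketch, again with placeholder file names and header text; drop the readline() call if the input file has no old header row to skip:

import shutil

new_header = '"col_a","col_b","col_c"\n'  # placeholder header line

with open('infile.csv', 'rb') as src, open('outfile.csv', 'wb') as dst:
    dst.write(new_header.encode('utf-8'))             # write the new header first
    src.readline()                                     # skip the old header row, if there is one
    shutil.copyfileobj(src, dst, length=1024 * 1024)   # copy the rest in 1 MB blocks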
