I use the following code to read a large CSV file (6–10 GB), replace the header row, and then export it to CSV again:

import csv
import pandas as pd

df = pd.read_csv('read file')
df.columns = ['list of headers']
df.to_csv('outfile', index=False, quoting=csv.QUOTE_NONNUMERIC)

But this approach is extremely slow and I run out of memory. Any suggestions?
Sorry, I don't have enough reputation to comment, so I'll leave an answer. First, would you try adding the low_memory parameter when you read the file? ( https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html )

df = pd.read_csv('read file', low_memory=False)
Second, how about checking the memory usage with info()?

df = pd.read_csv('read file')
df.columns = ['list of headers']
df.info()
Third, based on Mohit's suggestion, read the file in chunks:

# set a chunk size so the big file is read into memory one piece at a time
chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    # process each chunk of your file content here
    ...
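Putting the chunking idea together with the header replacement, a minimal sketch could look like the following (the function name, file names, and column names are hypothetical placeholders, not from the question):

```python
import csv
import pandas as pd

def replace_header(infile, outfile, new_columns, chunksize=10 ** 6):
    """Stream `infile` through pandas in chunks, writing `outfile`
    with `new_columns` as the header row, so only one chunk is ever
    held in memory at a time."""
    with open(outfile, 'w', newline='') as out:
        for i, chunk in enumerate(pd.read_csv(infile, chunksize=chunksize)):
            chunk.columns = new_columns
            # write the new header only once, before the first chunk
            chunk.to_csv(out, index=False, header=(i == 0),
                         quoting=csv.QUOTE_NONNUMERIC)
```

Note that passing an already-open file handle to to_csv keeps appending to the same file, and header=(i == 0) ensures the replacement header is emitted exactly once.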
Rather than reading the whole 6 GB file into pandas, could you not just write the new header to a new file and then stream the rest in? Something like this:

import fileinput

with open('outfile.csv', 'w') as outfile:
    outfile.write(','.join(['list of headers']) + '\n')
    with fileinput.input(files=('infile.csv',)) as f:
        for i, line in enumerate(f):
            # skip the original header line
            if i == 0:
                continue
            outfile.write(line)
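For a multi-gigabyte file, copying line by line in a Python loop can be slow. A sketch of the same idea using shutil.copyfileobj from the standard library, which streams the body in large blocks (the function and file names here are hypothetical, for illustration only):

```python
import shutil

def rewrite_header(infile, outfile, new_columns):
    """Write `new_columns` as the header of `outfile`, then stream the
    body of `infile` after it without loading the file into memory."""
    with open(infile, 'r') as src, open(outfile, 'w') as dst:
        src.readline()                           # skip the original header row
        dst.write(','.join(new_columns) + '\n')  # write the replacement header
        shutil.copyfileobj(src, dst)             # copy the rest in large blocks
```

Since the body is copied verbatim, this also preserves the original quoting of every data row, which the pandas round trip does not.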