I tried importing a 4 GB csv file using pd.read_csv, but received an out-of-memory error. I then tried dask.dataframe, but couldn't convert it to a pandas dataframe (same memory error):
import pandas as pd
import dask.dataframe as dd

df = dd.read_csv("file.csv")   # placeholder path for the 4 GB csv
df = df.compute()              # materialising the full dataframe raises the memory error
I then tried the chunksize parameter, but got the same memory error:
import pandas as pd

reader = pd.read_csv("file.csv", chunksize=1000000, low_memory=False)
df = pd.concat(reader)   # concatenating every chunk rebuilds the full dataframe in memory
I also tried chunksize with a list, same error:
import pandas as pd

chunks = []   # avoid shadowing the built-in list
for chunk in pd.read_csv("file.csv", chunksize=1000000, low_memory=False):
    chunks.append(chunk)
df = pd.concat(chunks)   # the concat still needs the whole file to fit in memory
I also tried smaller chunksize values (2000 and 50000); they failed with the same memory error for the 4 GB file. Please let me know how to proceed.
I use Python 3.7 and have 8 GB of RAM.
I also tried the third attempt on a server with 128 GB of RAM, but still got a memory error.
I cannot assign a dtype, as the csv file to be imported can contain different columns at different times.
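For reference, one possible workaround is a minimal sketch like the following: build the dtype map from a small sample of the same file at run time, so differing columns between files are not a problem. "file.csv" is a placeholder path and the downcasting choices are only illustrative.

import pandas as pd

# Sketch: infer dtypes from a small sample of the same file, downcast the
# numeric columns, and reuse that dtype map for the full chunked read.
sample = pd.read_csv("file.csv", nrows=10000)

dtypes = {}
for col, dtype in sample.dtypes.items():
    if dtype.kind == "i":
        dtypes[col] = "int32"        # assumes the values fit in 32 bits
    elif dtype.kind == "f":
        dtypes[col] = "float32"
    else:
        dtypes[col] = dtype          # leave non-numeric columns as inferred

reader = pd.read_csv("file.csv", chunksize=100000, dtype=dtypes)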
This has already been answered here: How to read a 6 GB csv file with pandas
I also tried the above method with a 2 GB file and it works. Also try keeping the chunk size even smaller. Can you share your system configuration as well? That would be quite useful.
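If the end goal is not the full in-memory dataframe, a chunk-wise reduction avoids the final concat entirely. A minimal sketch, where "file.csv" and "some_column" are placeholder names and the sum is just an example of a per-chunk operation:

import pandas as pd

# Sketch: process each chunk and keep only the small per-chunk result,
# instead of concatenating all chunks back into one 4 GB dataframe.
partial_sums = []
for chunk in pd.read_csv("file.csv", chunksize=50000, low_memory=False):
    partial_sums.append(chunk["some_column"].sum())  # placeholder reduction

total = sum(partial_sums)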
I just want to record what I tried after getting enough suggestions. Thanks to Robin Nemeth and juanpa.
As juanpa pointed out, I was able to read the csv file (4 GB) on the server with 128 GB of RAM when I used a 64-bit Python executable.
As Robin pointed out, even with a 64-bit executable I'm not able to read the csv file (4 GB) on my local machine with 8 GB of RAM.
So, no matter what we try, the machine's RAM matters, because a pandas dataframe is held entirely in memory.
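That conclusion holds for a fully materialised pandas dataframe. If only an aggregate is needed, dask can still help on the 8 GB machine by keeping the work out of core and calling .compute() only on the reduced result. A minimal sketch, with "file.csv" and "some_column" as placeholders:

import dask.dataframe as dd

# Sketch: keep the dask dataframe lazy and compute only a reduced result,
# so the whole 4 GB csv never has to fit in RAM at once.
df = dd.read_csv("file.csv")
column_mean = df["some_column"].mean().compute()  # small scalar result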