Pandas - MemoryError while importing a 4GB CSV file

I tried importing a 4GB CSV file using pd.read_csv but got an out-of-memory error. Then I tried dask.dataframe, but couldn't convert the result to a pandas DataFrame (same memory error):

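import pandas as pd
import dask.dataframe as dd
file_path = "data.csv"  # placeholder for the 4GB CSV file
df = dd.read_csv(file_path)  # lazy: nothing is loaded yet
df = df.compute()  # materializes the whole file as one pandas DataFrame -> MemoryError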

Then I tried the chunksize parameter, but got the same memory error:

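import pandas as pd
file_path = "data.csv"  # placeholder for the 4GB CSV file
reader = pd.read_csv(file_path, chunksize=1000000, low_memory=False)
df = pd.concat(reader)  # concatenating all chunks still builds the full DataFrame in memory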
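Because `pd.concat` stitches every chunk back into one DataFrame, the complete data still has to fit in RAM; `chunksize` only lowers peak memory if each chunk is processed and then discarded. A minimal sketch of that pattern, assuming the goal is an aggregate (here a per-group sum) rather than the whole table. The column names `key` and `value` and the helper `sum_by_key` are hypothetical:

```python
import io

import pandas as pd


def sum_by_key(csv_source, chunksize=100_000):
    """Hypothetical helper: sum the 'value' column per 'key', one chunk at a time.

    Only one chunk plus the running totals are held in memory, so the peak
    footprint depends on chunksize and the number of keys, not the file size.
    """
    totals = None
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        partial = chunk.groupby("key")["value"].sum()
        # Merge into the running totals; fill_value=0 covers keys that
        # appear in only one of the two Series.
        totals = partial if totals is None else totals.add(partial, fill_value=0)
    return totals


# Small in-memory CSV standing in for the real 4GB file.
data = io.StringIO("key,value\na,1\nb,2\na,3\nc,4\nb,5\n")
totals = sum_by_key(data, chunksize=2)
print(totals)
```

The same loop works for filtering: keep only the rows you need from each chunk and concatenate those, which helps as long as the filtered result fits in memory.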

I also tried collecting the chunks in a list first, with the same error:

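import pandas as pd
file_path = "data.csv"  # placeholder for the 4GB CSV file
chunks = []  # avoid shadowing the built-in name `list`
for chunk in pd.read_csv(file_path, chunksize=1000000, low_memory=False):
    chunks.append(chunk)
df = pd.concat(chunks)  # again builds the full DataFrame in memory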
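Collecting chunks in a list and concatenating them needs at least as much memory as reading the file in one go. What can help, depending on the data, is shrinking each chunk before storing it, for example by downcasting numeric columns to the smallest dtype that holds their values. A sketch, where the helper name `shrink` is hypothetical and the data is a tiny stand-in:

```python
import io

import numpy as np
import pandas as pd


def shrink(chunk):
    """Hypothetical helper: downcast numeric columns of one chunk."""
    for col in chunk.select_dtypes(include=[np.integer]).columns:
        chunk[col] = pd.to_numeric(chunk[col], downcast="integer")
    for col in chunk.select_dtypes(include=[np.floating]).columns:
        chunk[col] = pd.to_numeric(chunk[col], downcast="float")
    return chunk


# Small in-memory CSV standing in for the real file.
data = io.StringIO("id,price,name\n1,9.5,x\n2,3.25,y\n3,7.0,z\n")

chunks = []
for chunk in pd.read_csv(data, chunksize=2):
    chunks.append(shrink(chunk))
# pd.concat picks a common dtype, so a chunk downcast to int8 and one
# downcast to int16 end up as int16 in the result.
df = pd.concat(chunks, ignore_index=True)
print(df.dtypes)
```

Converting string columns to `category` per chunk is less straightforward: when chunks end up with different categories, `pd.concat` generally falls back to a non-categorical dtype, which loses the saving.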

Attempts:

  1. Tried with a 1.5GB file - imported successfully
  2. Tried with a 4GB file - failed (memory error)
  3. Tried with a smaller chunksize (2000 or 50000) - failed (memory error for the 4GB file)
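These results are consistent with the final DataFrame simply not fitting in memory: the chunk size changes how the file is read, not how large the concatenated result is, and a DataFrame can be several times larger in memory than the CSV on disk, especially with string columns. One rough check before loading everything is to measure a sample with `memory_usage(deep=True)` and extrapolate by row count. A sketch, assuming the first rows are representative and no quoted field contains a newline; `estimate_frame_bytes` is a hypothetical helper:

```python
import os
import tempfile

import pandas as pd


def estimate_frame_bytes(path, sample_rows=10_000):
    """Hypothetical helper: rough in-memory size of the whole CSV as a DataFrame.

    Assumes the first `sample_rows` rows are representative and that each
    data row is exactly one physical line (no embedded newlines).
    """
    sample = pd.read_csv(path, nrows=sample_rows)
    bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)
    # Count rows by streaming the file line by line (constant memory).
    with open(path, "rb") as f:
        total_rows = sum(1 for _ in f) - 1  # minus the header line
    return round(bytes_per_row * total_rows)


# Demo on a small temporary CSV standing in for the real file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("id,name\n")
    tmp.writelines(f"{i},name_{i}\n" for i in range(1_000))
    path = tmp.name

estimate = estimate_frame_bytes(path, sample_rows=100)
actual = pd.read_csv(path).memory_usage(deep=True).sum()
print(f"on disk: {os.path.getsize(path)} bytes, estimate: {estimate}, actual: {actual}")
os.remove(path)
```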

How should I proceed?

I am using Python 3.7 with 8GB of RAM.

I also tried attempt 3 on a server with 128GB of RAM, but still got a memory error.

I cannot specify dtype because the CSV file to be imported can contain different columns at different times.
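Not knowing the columns in advance does not necessarily rule out `dtype`: the mapping can be built at run time from a small sample of the same file and passed to the full read. A minimal sketch, assuming the sampled rows are representative. The helper `infer_dtypes`, the 50% cardinality threshold, and the float64 to float32 choice are illustrative assumptions, not pandas defaults; integer columns are deliberately left alone so that a missing value later in the file cannot break the read:

```python
import os
import tempfile

import pandas as pd


def infer_dtypes(path, sample_rows=10_000, max_unique_ratio=0.5):
    """Hypothetical helper: build a read_csv `dtype` mapping from a sample.

    - float64 columns -> float32 (half the memory, some precision lost)
    - string columns with few distinct values -> category
    """
    sample = pd.read_csv(path, nrows=sample_rows)
    dtypes = {}
    for col in sample.columns:
        if sample[col].dtype == "float64":
            dtypes[col] = "float32"
        elif pd.api.types.is_string_dtype(sample[col]):
            if sample[col].nunique() <= max_unique_ratio * len(sample):
                dtypes[col] = "category"
    return dtypes


# Demo on a small temporary CSV standing in for the real file.
cities = ["Oslo", "Rome", "Lima"]
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("city,temp,count\n")
    tmp.writelines(f"{cities[i % 3]},{i / 4},{i}\n" for i in range(300))
    path = tmp.name

dtypes = infer_dtypes(path, sample_rows=100)
df = pd.read_csv(path, dtype=dtypes)
os.remove(path)
print(dtypes)
print(df.dtypes)
```

Because the mapping is recomputed for each file, it adapts when the columns change between imports.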

This has already been answered here: How to read a 6 GB csv file with pandas

I also tried the above method with a 2GB file and it worked.

Also try to keep the chunk size even smaller.

Can you share your system configuration as well? That would be quite useful.

I just want to record what I tried after getting enough suggestions. Thanks to Robin Nemeth and juanpa!

  1. As juanpa pointed out, I was able to read the CSV file (4GB) on the server with 128GB of RAM once I used a 64-bit Python executable.
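The 64-bit point matters because a 32-bit process can address at most 4GB (often less in practice), no matter how much RAM the machine has. A quick way to check which interpreter is running, using only the standard library:

```python
import platform
import struct
import sys

# Size of a C pointer in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
pointer_bits = struct.calcsize("P") * 8
# Equivalent check: sys.maxsize is 2**63 - 1 on 64-bit builds, 2**31 - 1 on 32-bit ones.
is_64bit = sys.maxsize > 2**32

print(f"Python {platform.python_version()}, {pointer_bits}-bit, 64-bit build: {is_64bit}")
```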

  2. As Robin pointed out, even with a 64-bit executable I'm not able to read the CSV file (4GB) on my local machine with 8GB of RAM.

So, no matter what we try, the machine's RAM matters, because a pandas DataFrame is held entirely in memory.