I tried importing a 4 GB csv file using pd.read_csv, but received an out-of-memory error. I then tried dask.dataframe, but couldn't convert it to a pandas dataframe (same memory error):
import pandas as pd
import dask.dataframe as dd

df = dd.read_csv("file.csv")   # placeholder path for the 4 GB csv
df = df.compute()              # materialising the full dataframe raises the memory error
I then tried the chunksize parameter, but got the same memory error:
import pandas as pd

reader = pd.read_csv("file.csv", chunksize=1000000, low_memory=False)
df = pd.concat(reader)   # concatenating every chunk rebuilds the full dataframe in memory
I also tried chunksize with a list, same error:
import pandas as pd

chunks = []   # avoid shadowing the built-in list
for chunk in pd.read_csv("file.csv", chunksize=1000000, low_memory=False):
    chunks.append(chunk)
df = pd.concat(chunks)   # the concat still needs the whole file to fit in memory
I also tried smaller chunksize values (2000 and 50000); they failed with the same memory error for the 4 GB file. Please let me know how to proceed.
I use Python 3.7 and have 8 GB of RAM.
I also tried the third attempt on a server with 128 GB of RAM, but still got a memory error.
I cannot assign a dtype, as the csv file to be imported can contain different columns at different times.
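For reference, one possible workaround is a minimal sketch like the following: build the dtype map from a small sample of the same file at run time, so differing columns between files are not a problem. "file.csv" is a placeholder path and the downcasting choices are only illustrative.

import pandas as pd

# Sketch: infer dtypes from a small sample of the same file, downcast the
# numeric columns, and reuse that dtype map for the full chunked read.
sample = pd.read_csv("file.csv", nrows=10000)

dtypes = {}
for col, dtype in sample.dtypes.items():
    if dtype.kind == "i":
        dtypes[col] = "int32"        # assumes the values fit in 32 bits
    elif dtype.kind == "f":
        dtypes[col] = "float32"
    else:
        dtypes[col] = dtype          # leave non-numeric columns as inferred

reader = pd.read_csv("file.csv", chunksize=100000, dtype=dtypes)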
This has already been answered here: How to read a 6 GB csv file with pandas
I also tried the above method with a 2 GB file and it works. Also try keeping the chunk size even smaller. Can you share your system configuration as well? That would be quite useful.
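If the end goal is not the full in-memory dataframe, a chunk-wise reduction avoids the final concat entirely. A minimal sketch, where "file.csv" and "some_column" are placeholder names and the sum is just an example of a per-chunk operation:

import pandas as pd

# Sketch: process each chunk and keep only the small per-chunk result,
# instead of concatenating all chunks back into one 4 GB dataframe.
partial_sums = []
for chunk in pd.read_csv("file.csv", chunksize=50000, low_memory=False):
    partial_sums.append(chunk["some_column"].sum())  # placeholder reduction

total = sum(partial_sums)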
I just want to record what I tried after getting enough suggestions. Thanks to Robin Nemeth and juanpa.
As juanpa pointed out, I was able to read the csv file (4 GB) on the server with 128 GB of RAM when I used a 64-bit Python executable.
As Robin pointed out, even with a 64-bit executable I'm not able to read the csv file (4 GB) on my local machine with 8 GB of RAM.
So, no matter what we try, the machine's RAM matters, because a pandas dataframe is held entirely in memory.
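That conclusion holds for a fully materialised pandas dataframe. If only an aggregate is needed, dask can still help on the 8 GB machine by keeping the work out of core and calling .compute() only on the reduced result. A minimal sketch, with "file.csv" and "some_column" as placeholders:

import dask.dataframe as dd

# Sketch: keep the dask dataframe lazy and compute only a reduced result,
# so the whole 4 GB csv never has to fit in RAM at once.
df = dd.read_csv("file.csv")
column_mean = df["some_column"].mean().compute()  # small scalar result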