简体   繁体   中英

Pandas error while reading 14 GB csv file on 200 GB RAM workstation

This is my code to generate files by home id. Then I will analyze each home seperately.

import pandas as pd
data = pd.read_csv("110homes.csv")
for i in (np.unique(data['dataid'])):
    print i
    d1 = pd.DataFrame(data[data['dataid']==i])
    k = str(i)
    d1.to_csv(k + ".csv")

However, I am getting this error. The machine has 200 GB RAM and it is showing memory error too:

    data = pd.read_csv("110homes.csv")
  File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 474, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 260, in _read
    return parser.read()
  File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 721, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 1170, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 769, in pandas.parser.TextReader.read (pandas/parser.c:7544)
  File "pandas/parser.pyx", line 819, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8137)
  File "pandas/parser.pyx", line 1833, in pandas.parser._concatenate_chunks (pandas/parser.c:22383)
MemoryError

Data in RAM can take a lot more space than on disk. Without seeing your 110homes.csv file, it's impossible to know details, but imagine that it consists of 10 floating point numbers per line, like: 0.0,1.0,2.0,... . In the CSV, each takes 3 bytes + 1 byte for the delimiter. In Python, each takes 8 bytes (on a 64 byte machine) for the float, plus 2 bytes per Unicode char (another 8 bytes), plus 8 bytes for string length, plus 8 bytes per pointer, plus bytes per row, etc.

Think about it like this: On a 64 bit machine, the minimum size for a pointer, a native int, or a native float, is 8 bytes. You need several of those per field, and several more per row. There's nothing unusual about taking 15x in RAM versus disk.

Do a simple test: Take the first 10% of the lines of your file, and monitor python via top as it processes. See how much RAM it uses. Does it use at least 20 GB?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM