
Errors when loading a .csv file using pandas in Python

I have a large CSV file, approximately 6 GB, and it takes a long time to load in Python. I get the following error:

import pandas as pd
df = pd.read_csv('nyc311.csv', low_memory=False)


Python(1284,0x7fffa37773c0) malloc: *** mach_vm_map(size=18446744071562067968) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 851, in pandas.parser.TextReader.read (pandas/parser.c:10438)
  File "pandas/parser.pyx", line 939, in pandas.parser.TextReader._read_rows (pandas/parser.c:11607)
  File "pandas/parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas/parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory

I don't think I understand the error message; the last line seems to suggest that the file is too big to load? I also tried the low_memory=False option, but that did not work either.

I'm not sure what "can't allocate region" means. Could it be that the header includes a 'region' column and pandas cannot locate the data underneath it?

An out-of-memory error comes from exhausting RAM; there is no other explanation for it.

The combined memory footprint of all in-RAM objects (the data plus pandas' per-object overhead) must stay below available RAM, and here it does not.

You can see this directly in the line malloc: *** mach_vm_map(size=18446744071562067968) failed: the requested size is 2^64 − 2^31 bytes, i.e. a negative 32-bit length reinterpreted as an unsigned 64-bit one, which suggests an internal size calculation overflowed while the parser tried to allocate a huge buffer.

Try reading the file in chunks instead of all at once:

chunks = pd.read_csv('nyc311.csv', chunksize=5000)  # returns an iterator of DataFrames, not one DataFrame
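Note that with chunksize set, read_csv returns an iterator of DataFrames rather than a single DataFrame, so you have to loop over it and keep only what you need. Here is a minimal, self-contained sketch; the tiny in-memory CSV and the 'borough' filter column are stand-ins for the real nyc311.csv:

```python
import pandas as pd
from io import StringIO

# A tiny in-memory CSV standing in for the real 6 GB file.
csv_data = StringIO("id,borough\n1,BROOKLYN\n2,QUEENS\n3,BROOKLYN\n")

# chunksize makes read_csv yield DataFrames of at most that many rows,
# so only one chunk needs to be in RAM at a time.
pieces = []
for chunk in pd.read_csv(csv_data, chunksize=2):
    # filter each chunk down before keeping it, so the result stays small
    pieces.append(chunk[chunk['borough'] == 'BROOKLYN'])

df = pd.concat(pieces, ignore_index=True)
print(len(df))  # 2 matching rows survive the filter
```

If you can reduce each chunk (filter rows, aggregate, or write it out to disk) before moving on, the full file never has to fit in memory at once.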

Or, if reading this CSV is only one part of your program and other DataFrames were created earlier, try freeing the ones no longer in use:

import gc
del old_df          # drop references to DataFrames no longer in use
gc.collect()        # ask the garbage collector to reclaim them now
del gc.garbage[:]   # clear the list of uncollectable objects (mainly relevant on Python 2)
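Beyond freeing old objects, you can also shrink what read_csv allocates in the first place by loading only the columns you need and giving them compact dtypes. A minimal sketch; the column names and dtypes here are hypothetical stand-ins for the real file's header:

```python
import pandas as pd
from io import StringIO

# A tiny in-memory CSV standing in for the real file.
csv_data = StringIO("id,complaint_type,zip\n1,Noise,11201\n2,Heat,10002\n")

# usecols skips unneeded columns entirely; dtype avoids the default
# 64-bit numbers and object strings, both of which cut peak RAM.
df = pd.read_csv(
    csv_data,
    usecols=['id', 'complaint_type'],
    dtype={'id': 'int32', 'complaint_type': 'category'},
)
print(df.dtypes)
```

For a column with few distinct values (like a complaint type), the 'category' dtype stores each unique string once, which can reduce memory dramatically on millions of rows.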

