
Errors when loading a .csv file using pandas in Python

I have a large CSV file, approximately 6 GB, and it takes a long time to load in Python. I get the following error:

import pandas as pd
df = pd.read_csv('nyc311.csv', low_memory=False)


Python(1284,0x7fffa37773c0) malloc: *** mach_vm_map(size=18446744071562067968) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 851, in pandas.parser.TextReader.read (pandas/parser.c:10438)
  File "pandas/parser.pyx", line 939, in pandas.parser.TextReader._read_rows (pandas/parser.c:11607)
  File "pandas/parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas/parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory

I don't think I understand the error message; the last line seems to suggest that the file is too big to load? I also tried the low_memory=False option, but that did not work either.

I'm not sure what "can't allocate region" means. Could it be that the header includes a 'region' column and pandas cannot locate the data underneath it?

An out-of-memory error comes from exhausting RAM; there is no other explanation for it.

The combined memory footprint of all in-RAM objects (the data plus pandas' per-object overhead) must stay below available RAM, and here it does not.

You can see this directly in the line malloc: *** mach_vm_map(size=18446744071562067968) failed: the requested size is 2^64 − 2^31 bytes, i.e. a negative 32-bit length reinterpreted as an unsigned 64-bit one, which suggests an internal size calculation overflowed while the parser tried to allocate a huge buffer.

Try reading the file in chunks instead of all at once:

chunks = pd.read_csv('nyc311.csv', chunksize=5000)  # returns an iterator of DataFrames, not one DataFrame
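Note that with chunksize set, read_csv returns an iterator of DataFrames rather than a single DataFrame, so you have to loop over it and keep only what you need. Here is a minimal, self-contained sketch; the tiny in-memory CSV and the 'borough' filter column are stand-ins for the real nyc311.csv:

```python
import pandas as pd
from io import StringIO

# A tiny in-memory CSV standing in for the real 6 GB file.
csv_data = StringIO("id,borough\n1,BROOKLYN\n2,QUEENS\n3,BROOKLYN\n")

# chunksize makes read_csv yield DataFrames of at most that many rows,
# so only one chunk needs to be in RAM at a time.
pieces = []
for chunk in pd.read_csv(csv_data, chunksize=2):
    # filter each chunk down before keeping it, so the result stays small
    pieces.append(chunk[chunk['borough'] == 'BROOKLYN'])

df = pd.concat(pieces, ignore_index=True)
print(len(df))  # 2 matching rows survive the filter
```

If you can reduce each chunk (filter rows, aggregate, or write it out to disk) before moving on, the full file never has to fit in memory at once.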

Or, if reading this CSV is only one part of your program and other DataFrames were created earlier, try freeing the ones no longer in use:

import gc
del old_df          # drop references to DataFrames no longer in use
gc.collect()        # ask the garbage collector to reclaim them now
del gc.garbage[:]   # clear the list of uncollectable objects (mainly relevant on Python 2)
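Beyond freeing old objects, you can also shrink what read_csv allocates in the first place by loading only the columns you need and giving them compact dtypes. A minimal sketch; the column names and dtypes here are hypothetical stand-ins for the real file's header:

```python
import pandas as pd
from io import StringIO

# A tiny in-memory CSV standing in for the real file.
csv_data = StringIO("id,complaint_type,zip\n1,Noise,11201\n2,Heat,10002\n")

# usecols skips unneeded columns entirely; dtype avoids the default
# 64-bit numbers and object strings, both of which cut peak RAM.
df = pd.read_csv(
    csv_data,
    usecols=['id', 'complaint_type'],
    dtype={'id': 'int32', 'complaint_type': 'category'},
)
print(df.dtypes)
```

For a column with few distinct values (like a complaint type), the 'category' dtype stores each unique string once, which can reduce memory dramatically on millions of rows.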

