Errors when loading .csv file using pandas in python
I have a large CSV file, about 6 GB, and it takes a long time to load into Python. I get the following error:
import pandas as pd
df = pd.read_csv('nyc311.csv', low_memory=False)
Python(1284,0x7fffa37773c0) malloc: *** mach_vm_map(size=18446744071562067968) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in _read
data = parser.read()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 939, in read
ret = self._engine.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 1508, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 851, in pandas.parser.TextReader.read (pandas/parser.c:10438)
File "pandas/parser.pyx", line 939, in pandas.parser.TextReader._read_rows (pandas/parser.c:11607)
File "pandas/parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas/parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory
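I don't think I understand the error code. Does the last line mean the file is too large to load? I also tried the low_memory=False option, but that didn't work either.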
I'm not sure what "can't allocate region" means. Could the header contain a column named "region" that pandas can't find?
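The out-of-memory error happens because you are running out of RAM; there is no other explanation. The total memory needed by all the objects held in RAM is not less than the RAM available: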
malloc: *** mach_vm_map(size=18446744071562067968) failed
You can see this clearly from the error message.
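A quick arithmetic check of the size in that message, as a small sketch. The requested size is exactly 2^64 − 2^31, roughly 16 EiB, which no machine can provide. It is also the bit pattern of −2^31 read as an unsigned 64-bit integer; treating that as a sign of a wrapped-around size calculation is an assumption, not something the error message states.

```python
# Size reported in the malloc error message above.
requested = 18446744071562067968

# Difference from 2**64: exactly 2**31.
print(2**64 - requested)  # 2147483648

# Reinterpreted as a signed 64-bit integer, the size is -2**31.
signed = requested - 2**64 if requested >= 2**63 else requested
print(signed == -2**31)  # True

# Size in exbibytes (2**60 bytes): just under 16.
print(requested / 2**60)
```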
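Try reading the file in chunks instead of all at once:
reader = pd.read_csv('nyc311.csv', chunksize=5000)  # returns an iterator of DataFrames with up to 5000 rows each, not one DataFrame
# add lineterminator='\r' only if the file really uses bare carriage-return line endings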
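Because `chunksize` makes `read_csv` return an iterator, each chunk has to be processed in a loop, keeping only the reduced result. A minimal sketch, assuming you only need per-value counts of a single column; the helper `count_values_in_chunks` and the column name `Complaint Type` are hypothetical, and an in-memory `io.StringIO` stands in for `nyc311.csv`:

```python
import io
from collections import Counter

import pandas as pd


def count_values_in_chunks(path_or_buffer, column, chunksize=5000):
    """Count occurrences of each value in `column` without loading the whole CSV.

    Only one chunk (at most `chunksize` rows) is held in memory at a time.
    """
    counts = Counter()
    # With chunksize set, read_csv yields DataFrames one chunk at a time;
    # usecols keeps only the column we need from each chunk.
    for chunk in pd.read_csv(path_or_buffer, chunksize=chunksize, usecols=[column]):
        counts.update(chunk[column].value_counts().to_dict())
    return counts


# Tiny in-memory sample standing in for the real 6 GB file (hypothetical data).
sample = io.StringIO("id,Complaint Type\n1,Noise\n2,Heating\n3,Noise\n")
result = count_values_in_chunks(sample, "Complaint Type", chunksize=2)
print(result)
```

For the real file you would pass `'nyc311.csv'` instead of `sample`. The same loop works for sums, filters, or writing each chunk to another format; the key point is never to accumulate every chunk into one large DataFrame.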
Alternatively, if reading this CSV is only one part of your program and other DataFrames were created earlier, try freeing the ones you no longer use:
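import gc
del old_df          # drop the reference to a DataFrame no longer in use (old_df is a placeholder name)
gc.collect()        # run the garbage collector so unreachable objects are freed
del gc.garbage[:]   # discard objects the collector found uncollectable (normally this list is empty)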
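Before deciding which DataFrames to delete, it can help to check how much memory each one holds. A sketch using `DataFrame.memory_usage(deep=True)`, which also counts the Python string objects stored in `object` columns; the helper `frame_bytes` and the sample frames are hypothetical:

```python
import numpy as np
import pandas as pd


def frame_bytes(df):
    """Approximate bytes held by a DataFrame, index included.

    deep=True inspects object columns (e.g. strings) element by element,
    which is slower but far more accurate for text-heavy CSV data.
    """
    return int(df.memory_usage(deep=True).sum())


# 1000 int64 values need 8000 bytes for the data alone.
numbers = pd.DataFrame({"a": np.arange(1000, dtype="int64")})
# The same number of short strings stored as Python objects (dtype=object).
labels = pd.DataFrame({"a": ["Noise"] * 1000}, dtype=object)

print(frame_bytes(numbers))
print(frame_bytes(labels))
```

Text-heavy columns like those in a 311 complaints file tend to dominate the footprint, so the in-memory size of a loaded CSV can be noticeably larger than the file on disk.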