
Errors when loading .csv file using pandas in Python

I have a large CSV file, approximately 6 GB, and it takes a long time to load into Python. I get the following error:

import pandas as pd
df = pd.read_csv('nyc311.csv', low_memory=False)


Python(1284,0x7fffa37773c0) malloc: *** mach_vm_map(size=18446744071562067968) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 851, in pandas.parser.TextReader.read (pandas/parser.c:10438)
  File "pandas/parser.pyx", line 939, in pandas.parser.TextReader._read_rows (pandas/parser.c:11607)
  File "pandas/parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas/parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: out of memory

I don't think I understand the error message. The last line seems to suggest that the file is too big to load; is that right? I also tried the low_memory=False option, but that did not work either.

I'm not sure what "can't allocate region" means. Could it be that the header includes 'region' and pandas cannot locate the column underneath?

An out-of-memory error occurs because there is not enough RAM. There is no other explanation for it.

In other words, the combined memory footprint of all in-RAM objects, including their overheads, is not smaller than the available RAM.
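To get a rough sense of that footprint before loading everything, one option is to read a small sample and extrapolate. This is a minimal sketch, assuming the first rows are representative of the whole file; the function name `estimate_bytes_per_row` and the default sample size are illustrative and not from the original post.

```python
import pandas as pd

def estimate_bytes_per_row(path, sample_rows=1000):
    """Read only the first `sample_rows` rows and return the average in-memory bytes per row."""
    sample = pd.read_csv(path, nrows=sample_rows)
    if len(sample) == 0:
        return 0.0
    # deep=True also counts the Python string objects held in object columns
    total_bytes = sample.memory_usage(deep=True).sum()
    return total_bytes / len(sample)
```

Multiplying the result by an approximate row count gives a ballpark figure to compare against the machine's RAM. A 6 GB text file can occupy noticeably more than 6 GB once parsed, especially when it has many string columns.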

This line of the error output shows the allocation that failed: malloc: *** mach_vm_map(size=18446744071562067968) failed
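The size in that line is worth a closer look, since no machine has that much RAM. The check below is only arithmetic on the reported number. Reading it as a size computation that overflowed (a negative 32-bit value wrapped into an unsigned 64-bit one) is an assumption, not something the post confirms.

```python
requested = 18446744071562067968  # size reported by mach_vm_map in the error above

# Distance from 2**64, the wrap-around point of an unsigned 64-bit integer
gap = 2**64 - requested
print(gap == 2**31)           # True: the value is exactly 2**64 - 2**31

# Expressed in EiB (2**60 bytes): just under 16 EiB
print(requested / 2**60)
```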

Try reading the file in chunks:

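# With chunksize set, read_csv returns an iterator of 5000-row DataFrames, not a single DataFrame
chunks = pd.read_csv('nyc311.csv', chunksize=5000, lineterminator='\r')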
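When `chunksize` is passed, the return value has to be iterated, and only one chunk is held in memory at a time. The following is a minimal sketch of that pattern, assuming standard line endings (so `lineterminator` is omitted). The function name `count_rows_in_chunks` is illustrative, and counting rows stands in for whatever per-chunk work is actually needed.

```python
import pandas as pd

def count_rows_in_chunks(path, chunksize=5000):
    """Iterate over a CSV in chunks and return the total number of data rows."""
    total = 0
    # Each iteration yields a DataFrame with at most `chunksize` rows
    for chunk in pd.read_csv(path, chunksize=chunksize):
        total += len(chunk)  # replace with any per-chunk filter or aggregation
    return total
```

Note that concatenating all chunks back into one DataFrame would need roughly as much memory as loading the file in one go, so this approach helps most when each chunk is reduced (filtered, aggregated, or written elsewhere) before the next one is read.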

Or, if reading this CSV is only one part of your program and other DataFrames were created earlier, try deleting them once they are no longer in use.

import gc
del old_df          # drop the reference to a DataFrame that is no longer needed
gc.collect()        # run a garbage-collection pass
del gc.garbage[:]   # empty the list of objects the collector could not free

