I have a large CSV file of 3.5 GB and I want to read it using pandas.
This is my code:
import pandas as pd
tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=20000000, low_memory = False)
df = pd.concat(tp, ignore_index=True)
I get this error:
pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:8771)()
pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)()
pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)()
pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:23325)()
CParserError: Error tokenizing data. C error: out of
My machine has 8 GB of RAM.
Try this:
mylist = []
for chunk in pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=20000):
    mylist.append(chunk)
big_data = pd.concat(mylist, axis=0)
del mylist
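If the concatenated DataFrame itself does not fit in 8 GB of RAM, concatenating the chunks just moves the problem. A sketch of the alternative: do the work per chunk and keep only the aggregate, so peak memory stays around one chunk's worth of rows. The tiny in-memory CSV here is a stand-in for the real file.

```python
import io
import pandas as pd

# Toy data standing in for the 3.5 GB file from the question.
csv_data = "a;b\n1;2\n3;4\n5;6\n"

# Process each chunk as it arrives instead of concatenating them all;
# only the running aggregate survives between chunks.
total_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_data), sep=';', chunksize=2):
    total_rows += len(chunk)  # replace with your own per-chunk work

print(total_rows)  # 3
```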
You may try setting error_bad_lines=False when calling read_csv, i.e.
import pandas as pd
df = pd.read_csv('my_big_file.csv', error_bad_lines = False)
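Note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on_bad_lines='skip' is the current equivalent. A minimal sketch with toy data standing in for the real file:

```python
import io
import pandas as pd

# The second data row has too many fields and would raise a parser error;
# on_bad_lines='skip' drops it instead (replaces error_bad_lines=False).
csv_data = "a;b\n1;2\n3;4;5\n6;7\n"

df = pd.read_csv(io.StringIO(csv_data), sep=';', on_bad_lines='skip')
print(len(df))  # 2: the malformed row is silently skipped
```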
This error could also be caused by chunksize=20000000. Decreasing it fixed the issue in my case. In ℕʘʘḆḽḘ's solution the chunksize is also much smaller, which may be what did the trick.
You may try adding the parameter engine='python'. It loads the data more slowly, but it helped in my situation.
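For reference, a minimal sketch of that call, with toy in-memory data standing in for the real file. The Python engine is slower but more forgiving than the default C engine, which is the one raising the CParserError above.

```python
import io
import pandas as pd

# Toy data standing in for the real CSV; engine='python' bypasses
# the C tokenizer that produced the error in the question.
csv_data = "a;b\n1;2\n3;4\n"
df = pd.read_csv(io.StringIO(csv_data), sep=';', engine='python')
print(df.shape)  # (2, 2)
```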