Error tokenizing data. C error: out of memory (pandas, Python, large CSV file)

Question

I have a large CSV file (3.5 GB) and I want to read it using pandas.

This is my code:

import pandas as pd
tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', iterator=True, chunksize=20000000, low_memory = False)
df = pd.concat(tp, ignore_index=True)

I get this error:

pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:8771)()

pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)()

pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)()

pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:23325)()

CParserError: Error tokenizing data. C error: out of memory

My machine has 8 GB of RAM.

Answer 1

Try reading the file in much smaller chunks:

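import pandas as pd

mylist = []

# Read 20,000 rows at a time instead of 20,000,000
for chunk in pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=20000):
    mylist.append(chunk)

# Stitch the chunks back into a single DataFrame
big_data = pd.concat(mylist, axis=0)
del mylist  # drop the list of chunks once they have been concatenated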
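Note that pd.concat over all the chunks still builds the complete DataFrame, so this only helps if the final frame fits in RAM. If you only need an aggregate, a minimal sketch is to reduce each chunk as it is read and combine the small results. The column names store and sales, the helper total_sales_by_store and the in-memory sample are hypothetical stand-ins for the real file:

```python
import io

import pandas as pd

# Hypothetical sample standing in for 'train_2011_2012_2013.csv'
# (semicolon-separated, as in the question).
SAMPLE = "store;sales\nA;10\nB;5\nA;7\nC;3\nB;1\n"


def total_sales_by_store(source, chunksize=2):
    """Sum 'sales' per 'store' without keeping every row in memory.

    Only the current chunk plus the small per-chunk summaries are held at once.
    """
    partials = []
    for chunk in pd.read_csv(source, sep=";", chunksize=chunksize):
        # Shrink each chunk to one row per store before keeping it.
        partials.append(chunk.groupby("store")["sales"].sum())
    # A store can appear in several chunks, so sum the partial results again.
    return pd.concat(partials).groupby(level=0).sum()


result = total_sales_by_store(io.StringIO(SAMPLE))
print(result.to_dict())  # A -> 17, B -> 6, C -> 3
```

With the real file you would pass the path instead of io.StringIO(SAMPLE) and use a chunksize such as 20000; peak memory is then roughly one chunk plus the partial sums.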

Answer 2

You may try setting error_bad_lines=False when calling read_csv, i.e.:

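import pandas as pd
# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0;
# on newer versions use on_bad_lines='skip' instead.
df = pd.read_csv('my_big_file.csv', error_bad_lines=False)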
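Skipping bad lines tells the parser to drop rows that have too many fields, so it addresses malformed lines rather than memory use. On pandas 1.3 and later the option is spelled on_bad_lines='skip'. A small sketch on a hypothetical in-memory sample in which one row has an extra field:

```python
import io

import pandas as pd

# Hypothetical sample: the header has 2 fields, the row "3;4;5" has 3.
SAMPLE = "a;b\n1;2\n3;4;5\n6;7\n"

# pandas >= 1.3: on_bad_lines="skip" silently drops rows with too many fields.
df = pd.read_csv(io.StringIO(SAMPLE), sep=";", on_bad_lines="skip")
print(df)
```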

Answer 3

This error could also be caused by chunksize=20000000. Decreasing it fixed the issue in my case. ℕʘʘḆḽḘ's solution also uses a much smaller chunksize, which might be what did the trick.
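chunksize is the number of rows in each DataFrame the reader yields, so with chunksize=20000000 a single chunk may already cover a large part of a 3.5 GB file. The hypothetical helper chunk_lengths below just reports chunk sizes on a 5-row sample to show the parameter's effect:

```python
import io

import pandas as pd

# Hypothetical 5-row, semicolon-separated sample.
SAMPLE = "x;y\n" + "".join(f"{i};{i * i}\n" for i in range(5))


def chunk_lengths(source, chunksize):
    """Return the number of rows in each chunk yielded by read_csv."""
    return [len(chunk) for chunk in pd.read_csv(source, sep=";", chunksize=chunksize)]


print(chunk_lengths(io.StringIO(SAMPLE), chunksize=2))  # [2, 2, 1]
```

Each chunk is a full DataFrame, so the parser's working memory scales with chunksize rather than with the file size.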

Answer 4

You may try adding the parameter engine='python'. It loads the data more slowly, but it helped in my situation.
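A minimal sketch of the call on a hypothetical in-memory sample. engine='python' replaces the C parser (pandas/parser.pyx in the traceback) with pandas' pure-Python parser. C-only options such as the question's low_memory=False are not supported with it, so drop that argument when switching:

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample standing in for the real file.
SAMPLE = "id;value\n1;a\n2;b\n3;c\n"

# Use the pure-Python parser instead of the default C engine.
# Do not pass low_memory=False here: the python engine rejects it.
df = pd.read_csv(io.StringIO(SAMPLE), sep=";", engine="python")
print(df.shape)  # (3, 2)
```

The python engine also accepts chunksize, so it can be combined with the chunked reading shown in the other answers.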

4 answers (score, date posted):

Answer 1: 20, 2016-12-23 14:44:12
Answer 2: 2, 2017-10-25 17:12:29
Answer 3: 2, 2019-03-05 07:52:27
Answer 4: 0, 2020-10-12 22:17:19