Ignore delimiters at end of row in Pandas read_csv

I have data in CSV files where the columns are separated by a single tab character. Most rows contain just one tab, like this:

A\tB

Some rows contain extra tabs at the end of the row, like this:

A\tB\t\t

Hence, if I do pd.read_csv(filePath, sep='\t'), I get an error: ParserError: Error tokenizing data. C error: Expected 2 fields in line XXX, saw 4. That's because the extra tabs make some rows parse as four fields instead of two.

So how can I ignore the extra tabs at the end of a row?

Use io.StringIO to clean the file before parsing:

import pandas as pd
import io

with open('data.txt') as table:
    # strip() removes the trailing tabs (and the newline) from every line;
    # the cleaned text is then fed to pandas through an in-memory buffer
    buffer = io.StringIO('\n'.join(line.strip() for line in table))
    df = pd.read_table(buffer, header=None)

Output:

>>> df
   0  1
0  A  B
1  A  B
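
If you prefer to stay within pandas, a minimal alternative sketch (assuming the maximum number of fields per row is known, here 4, and using the same hypothetical 'data.txt') is to name more columns than the data needs and then drop the columns that end up entirely empty:

import pandas as pd

# Naming four columns lets the widest rows (A\tB\t\t) parse without a
# ParserError; shorter rows are simply padded with NaN.
df = pd.read_csv('data.txt', sep='\t', names=[0, 1, 2, 3])

# The trailing empty fields are read as NaN, so the extra columns are
# all-NaN and can be dropped.
df = df.dropna(axis=1, how='all')

Unlike the StringIO approach, this reads the file only once, but it needs an upper bound on the column count in advance.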
