Ignore delimiters at end of row in Pandas read_csv

I have data in CSV files where the columns are separated by a single tab character. Most rows contain just one tab, like this:

A\tB

Some rows contain extra tabs at the end of the row, like this:

A\tB\t\t

Hence, if I do pd.read_csv(filePath, sep='\t'), I get an error: ParserError: Error tokenizing data. C error: Expected 2 fields in line XXX, saw 4. That's because the extra tabs make some rows parse as four fields instead of two.

So how can I ignore the extra tabs at the end of a row?

Use io.StringIO to clean the file before parsing:

import pandas as pd
import io

with open('data.txt') as table:
    # strip() removes the trailing tabs (and the newline) from every line;
    # the cleaned text is then fed to pandas through an in-memory buffer
    buffer = io.StringIO('\n'.join(line.strip() for line in table))
    df = pd.read_table(buffer, header=None)

Output:

>>> df
   0  1
0  A  B
1  A  B
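
If you prefer to stay within pandas, a minimal alternative sketch (assuming the maximum number of fields per row is known, here 4, and using the same hypothetical 'data.txt') is to name more columns than the data needs and then drop the columns that end up entirely empty:

import pandas as pd

# Naming four columns lets the widest rows (A\tB\t\t) parse without a
# ParserError; shorter rows are simply padded with NaN.
df = pd.read_csv('data.txt', sep='\t', names=[0, 1, 2, 3])

# The trailing empty fields are read as NaN, so the extra columns are
# all-NaN and can be dropped.
df = df.dropna(axis=1, how='all')

Unlike the StringIO approach, this reads the file only once, but it needs an upper bound on the column count in advance.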
