
Convert text file containing multiple delimiters to CSV

I have a text file that looks like this, and I want to convert it into a CSV file.

[image]

Water level.txt

With Pandas, when I used:

df = pd.read_fwf(f)

It looks like:

[image]

It seems that both tabs and spaces are used as delimiters, so I changed the line to:

df = pd.read_csv('Water level.txt' ,  sep = '[" "|\t]', encoding='GBK', engine = 'python')

But it raises:

pandas.errors.ParserError: Expected 14 fields in line 4, saw 16. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

What's the right way with Python to convert it into a CSV file?

Try passing in the column widths if the data structure doesn't change. read_fwf has other options as well; see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_fwf.html

Verify the widths argument is correct:

pd.read_fwf('JcP65rQY5F2Y.txt', widths=[5,10,9,2,5])


    Unnamed: 0  Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4
0        09:25        7.54         288          17        NaN
1        09:30        7.55          20           6        NaN
2        09:30        7.55           7           2       East
3        09:30        7.55          11           3       East
4        09:30        7.56           5           4       West
..         ...         ...         ...         ...        ...
194      09:59        7.60           3           1       East
195      09:59        7.60           9           4       East
196      09:59        7.60           8           1       West
197      09:59        7.60          51           3       West
198      09:59        7.59          20          15       East

[199 rows x 5 columns]
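The "Unnamed" headers above suggest the first line of the file was taken as a header row. A minimal sketch of the same approach that also writes the CSV, assuming the file has no header line. The sample rows, the widths, and the column names below are made up for illustration and are not taken from the real file:

```python
import io
import pandas as pd

# Hypothetical rows; the real file's exact layout is not shown in the thread.
rows = [
    ("09:25", 7.54, 288, 17, ""),
    ("09:30", 7.55, 7, 2, "East"),
    ("09:30", 7.56, 5, 4, "West"),
]
# Build fixed-width text: 5 + 6 + 5 + 4 + 5 characters per line.
text = "\n".join(
    f"{t:<5}{lvl:>6.2f}{a:>5}{b:>4} {d:<4}" for t, lvl, a, b, d in rows
)

# header=None keeps the first data line as data; names are assumed labels.
df = pd.read_fwf(
    io.StringIO(text),
    widths=[5, 6, 5, 4, 5],
    header=None,
    names=["time", "level", "col3", "col4", "direction"],
)

# With no path argument, to_csv returns the CSV as a string.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path, for example df.to_csv('Water level.csv', index=False) (the output name is only an example).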

Your regex needs a tweak: `r"[ \t]+"` matches a run of one or more spaces and tabs, so consecutive delimiters count as a single separator. Additionally, pandas uses the first line of the file to determine how many columns there are. Your example starts with 4 columns and then adds another later on. That's too late: pandas has already created 4-element rows. You can solve that by supplying your own column names, which tells pandas how many columns there really are. In this example I'm just using integers, but you could give them more useful names.

df = pd.read_csv('Water level.txt', sep=r'[ \t]+', encoding='GBK',
                 engine='python', names=range(5))
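For completeness, here is a small self-contained sketch of the whole conversion, including the step that writes the CSV. It uses an in-memory sample instead of the real file, so the rows and the column names are assumptions, and the encoding argument is left out because the sample is already a Python string:

```python
import io
import pandas as pd

# Hypothetical sample: mixed tabs and spaces, and a first row with only
# four fields, similar to what the question describes.
sample = (
    "09:25\t7.54  288 17\n"
    "09:30\t7.55   7  2\tEast\n"
    "09:30 7.56\t5 4 West\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"[ \t]+",        # one or more spaces/tabs act as a single delimiter
    engine="python",      # regex separators need the Python engine
    header=None,
    names=["time", "level", "col3", "col4", "direction"],  # assumed labels
)

# Short rows are padded with NaN, which to_csv writes as an empty field.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With the real file, you would pass 'Water level.txt' and encoding='GBK' to read_csv as above, then call df.to_csv('Water level.csv', index=False); the output file name is only an example.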
