Convert text file containing multiple delimiters to CSV

Question

A text file looks like this. I want to convert it into a CSV file.

With Pandas, when I used:

df = pd.read_fwf(f)

It looks like:

It seems both tabs and spaces are used as delimiters, so I changed the line to:

df = pd.read_csv('Water level.txt' ,  sep = '[" "|\t]', encoding='GBK', engine = 'python')

But it raises:

pandas.errors.ParserError: Expected 14 fields in line 4, saw 16. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

What's the right way with Python to convert it into a CSV file?

Answer 1

If the layout of the data doesn't change, try passing the column widths to read_fwf. The documentation at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_fwf.html lists the other options read_fwf accepts.

Verify the widths argument is correct:

pd.read_fwf('JcP65rQY5F2Y.txt', widths=[5,10,9,2,5])


    Unnamed: 0  Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4
0        09:25        7.54         288          17        NaN
1        09:30        7.55          20           6        NaN
2        09:30        7.55           7           2       East
3        09:30        7.55          11           3       East
4        09:30        7.56           5           4       West
..         ...         ...         ...         ...        ...
194      09:59        7.60           3           1       East
195      09:59        7.60           9           4       East
196      09:59        7.60           8           1       West
197      09:59        7.60          51           3       West
198      09:59        7.59          20          15       East

[199 rows x 5 columns]
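
The answer stops at reading the file. As a minimal sketch of the remaining step, the example below reads fixed-width text with read_fwf and then serializes it with DataFrame.to_csv. The sample rows, the widths, and the column names are made up for illustration and are not taken from the original file; passing header=None keeps the first data row from being used as the header.

```python
import io

import pandas as pd

# Hypothetical fixed-width layout; adjust the widths to match the real file.
WIDTHS = [5, 6, 5, 4, 6]
rows = [
    ("09:25", "7.54", "288", "17", ""),
    ("09:30", "7.55", "20", "6", ""),
    ("09:30", "7.55", "7", "2", "East"),
]
# Right-align every value inside its column so the text is truly fixed-width.
sample = "\n".join(
    "".join(value.rjust(width) for value, width in zip(row, WIDTHS))
    for row in rows
)

df = pd.read_fwf(
    io.StringIO(sample),
    widths=WIDTHS,
    header=None,  # the sample has no header row
    names=["time", "level", "col3", "col4", "direction"],  # illustrative names
)

# With no path argument, to_csv returns the CSV as a string.
# Pass a file name instead to write it to disk.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Rows where the last field is blank come through with a missing value in that column, which to_csv writes as an empty field.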

Answer 2

Your regex needs a tweak: r"[ \t]+" matches a run of spaces and tabs of any length (1 or more). Without the +, every single space or tab counts as its own delimiter, so a run of spaces produces empty fields. Additionally, pandas uses the first line of the file to determine how many columns there are. Your example starts with 4 columns and then adds another later on. That's too late: pandas has already created 4-element rows. You can solve that by supplying your own column names, letting pandas know how many there really are. In this example I'm just using integers, but you could give them more useful names.

df = pd.read_csv('Water level.txt', sep=r'[ \t]+', encoding='GBK',
   engine='python', names=range(5))
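
To try this approach without the original file, here is a small sketch that parses an in-memory sample with mixed spaces and tabs and then produces the CSV. The sample rows and the column names are invented for illustration. The real file would also need encoding='GBK' as in the question.

```python
import io

import pandas as pd

# Hypothetical sample: fields are separated by runs of spaces and/or tabs,
# and only the last row carries a fifth field.
sample = (
    "09:25\t7.54  288 17\n"
    "09:30 7.55\t20   6\n"
    "09:30 7.55 \t7 2 East\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep=r"[ \t]+",      # one or more spaces/tabs form a single delimiter
    engine="python",    # regex separators require the python engine
    header=None,
    names=["time", "level", "col3", "col4", "direction"],  # illustrative names
)

# Returns the CSV as a string; use df.to_csv("some_name.csv", index=False)
# to write a file instead (the file name is up to you).
csv_text = df.to_csv(index=False)
print(csv_text)
```

Because five names are supplied up front, the 4-field rows are padded with a missing value rather than causing a field-count error.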
