I have a text file that looks like this, and I want to convert it into a CSV file.
With pandas, I first used:
df = pd.read_fwf(f)
The result looked like this:
It seems both tabs and spaces are used as delimiters, so I changed the line to:
df = pd.read_csv('Water level.txt' , sep = '[" "|\t]', encoding='GBK', engine = 'python')
But it raises this error:
pandas.errors.ParserError: Expected 14 fields in line 4, saw 16. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
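The extra fields can be reproduced with `re.split` alone. This is a minimal sketch using a made-up line (not taken from the actual file) that compares how the question's pattern and the pattern `[ \t]+` split the same text:

```python
import re

# Hypothetical line mixing single spaces, a double space, and a tab.
line = "09:30 7.55  7\t2"

# The question's pattern: a character class matching ONE of '"', ' ', '|', or tab.
old_pattern = r'[" "|\t]'
# One or more spaces/tabs treated as a single delimiter.
new_pattern = r"[ \t]+"

old_fields = re.split(old_pattern, line)
new_fields = re.split(new_pattern, line)

print(old_fields)  # ['09:30', '7.55', '', '7', '2']  (empty field from the double space)
print(new_fields)  # ['09:30', '7.55', '7', '2']

# Inside [...], '"' and '|' are literal characters, not quoting or alternation.
print(re.split(old_pattern, 'a|b"c'))  # ['a', 'b', 'c']
```

Each run of several spaces in the file therefore adds empty fields, which is why pandas sees more fields on some lines than it expected.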
What is the right way in Python to convert it into a CSV file?
Try passing in the column widths (the `widths` argument of `read_fwf`) if the layout of the data doesn't change: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_fwf.html There are other options here as well with read_fwf, such as `colspecs`.
Verify the widths argument is correct:
pd.read_fwf('JcP65rQY5F2Y.txt', widths=[5,10,9,2,5])
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
0 09:25 7.54 288 17 NaN
1 09:30 7.55 20 6 NaN
2 09:30 7.55 7 2 East
3 09:30 7.55 11 3 East
4 09:30 7.56 5 4 West
.. ... ... ... ... ...
194 09:59 7.60 3 1 East
195 09:59 7.60 9 4 East
196 09:59 7.60 8 1 West
197 09:59 7.60 51 3 West
198 09:59 7.59 20 15 East
[199 rows x 5 columns]
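Once the widths parse correctly, writing the CSV is one more call to `to_csv`. A minimal sketch, assuming a hypothetical three-row sample held in a `StringIO`, widths measured for that sample only, and placeholder column names `c0` to `c4` (none of these come from the real file):

```python
import io

import pandas as pd

# Hypothetical fixed-width sample; for the real file pass its path
# (and encoding="GBK") instead of a StringIO.
text = (
    "09:30   7.55     7   2  East\n"
    "09:30   7.56     5   4  West\n"
    "09:59   7.60    51   3  West\n"
)

# Widths for THIS sample only; measure the real file to get its widths.
widths = [5, 7, 6, 4, 6]
# Placeholder (hypothetical) column names; the sample has no header row.
names = ["c0", "c1", "c2", "c3", "c4"]

df = pd.read_fwf(io.StringIO(text), widths=widths, header=None, names=names)

# index=False leaves out the 0, 1, 2, ... row labels.
csv_text = df.to_csv(index=False)
print(csv_text)

# To write a file instead (hypothetical name): df.to_csv("water_level.csv", index=False)
```

Passing a path to `to_csv` writes the file; passing nothing returns the CSV as a string, which is handy for checking the result first.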
Your regex needs a tweak: `r"[ \t]+"` matches any run of one or more spaces and tabs. In your pattern `[" "|\t]`, the `"` and `|` are just literal characters inside the brackets, and without the `+` every single space is its own delimiter, so a run of spaces produces empty fields. Additionally, pandas uses the first line of the file to determine how many columns there are. Your example starts with 4 columns and then adds another later on. That's too late: pandas has already created 4-element rows. You can solve that by supplying your own column names, which tells pandas how many columns there really are. In this example I'm just using integers, but you could give them more useful names.
df = pd.read_csv('Water level.txt', sep=r'[ \t]+', encoding='GBK',
                 engine='python', names=range(5))
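Here is a self-contained sketch of the same approach, followed by the CSV conversion. It assumes a hypothetical in-memory sample (not the real file) whose first rows have 4 fields and whose last row has 5, with spaces and tabs mixed:

```python
import io

import pandas as pd

# Hypothetical sample: 4 fields at first, 5 later, mixed spaces/tabs,
# including a run of two spaces.
text = (
    "09:25 7.54 288 17\n"
    "09:30\t7.55 20 6\n"
    "09:30 7.55  7 2\tEast\n"
)

df = pd.read_csv(
    io.StringIO(text),      # for the real file: 'Water level.txt' with encoding='GBK'
    sep=r"[ \t]+",          # one or more spaces/tabs count as one delimiter
    engine="python",        # regex separators are handled by the python engine
    header=None,            # explicit; passing names already implies no header row
    names=list(range(5)),   # declare 5 columns up front; short rows are padded with NaN
)

# Missing values become empty cells in the CSV output.
csv_text = df.to_csv(index=False)
print(csv_text)
```

The trailing comma on the first data rows of the output is the empty fifth column, so every CSV row ends up with the same number of fields.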