Read space-separated text file in pandas
I am trying to read the text file at this URL into a pandas DataFrame:
https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/TU_Stundenwerte_Beschreibung_Stationen.txt
The columns are separated by uneven amounts of whitespace. I have tried sep='\s+' and delim_whitespace=True, but neither works. How can I read this text file into a pandas DataFrame?
The read_fwf function in pandas reads a table of fixed-width formatted lines into a DataFrame. Unlike splitting on whitespace, it does not break up a field that itself contains spaces. The header line confuses the automatic column-width inference, so it is best to skip the header lines and pass the column names explicitly; in this case that means adding the argument skiprows=2 (the header line and the line of dashes below it).
import pandas as pd
url = 'https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/recent/TU_Stundenwerte_Beschreibung_Stationen.txt'
# The file appears to be Latin-1 encoded; "ansi" is a Windows-only codec name,
# while "latin-1" works on every platform and decodes umlauts and ß the same way.
df = pd.read_fwf(url, encoding="latin-1", skiprows=2,
                 names=['Stations_id', 'von_datum', 'bis_datum', 'Stationshoehe',
                        'geoBreite', 'geoLaenge', 'Stationsname', 'Bundesland'])
print(df)
Output:
Stations_id von_datum bis_datum Stationshoehe geoBreite geoLaenge Stationsname Bundesland
0 3 19500401 20110331 202 50.7827 6.0941 Aachen Nordrhein-Westfalen
1 44 20070401 20220920 44 52.9336 8.2370 Großenkneten Niedersachsen
2 52 19760101 19880101 46 53.6623 10.1990 Ahrensburg-Wulfsdorf Schleswig-Holstein
3 71 20091201 20191231 759 48.2156 8.9784 Albstadt-Badkap Baden-Württemberg
4 73 20070401 20220920 340 48.6159 13.0506 Aldersbach-Kriestorf Bayern
.. ... ... ... ... ... ... ... ...
663 19171 20200901 20220920 13 54.0038 9.8553 Hasenkrug-Hardebek Schleswig-Holstein
664 19172 20200901 20220920 48 54.0246 9.3880 Wacken Schleswig-Holstein
[665 rows x 8 columns]
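To see both the original failure and the fix without downloading anything, here is a minimal sketch that runs on a small in-memory sample laid out like the DWD file. It rests on assumptions: the first three rows reuse values from the output above, the fourth row ("Bad Musterstadt", id 99999) is hypothetical and exists only to give a station name containing a space (one plausible reason sep='\s+' fails on the real file), and the field widths are chosen for illustration rather than copied from the DWD file.

```python
import io

import pandas as pd

# Column names as they appear in the header line of the DWD station list.
COLUMNS = ["Stations_id", "von_datum", "bis_datum", "Stationshoehe",
           "geoBreite", "geoLaenge", "Stationsname", "Bundesland"]

# Sample rows: the first three reuse values from the output above; the last
# one is HYPOTHETICAL (invented id and name) to get a name with a space.
ROWS = [
    (3, 19500401, 20110331, 202, 50.7827, 6.0941, "Aachen", "Nordrhein-Westfalen"),
    (44, 20070401, 20220920, 44, 52.9336, 8.2370, "Großenkneten", "Niedersachsen"),
    (52, 19760101, 19880101, 46, 53.6623, 10.1990, "Ahrensburg-Wulfsdorf", "Schleswig-Holstein"),
    (99999, 20000101, 20220920, 100, 50.0, 10.0, "Bad Musterstadt", "Bayern"),
]


def build_sample(rows):
    """Return header line, dash line and fixed-width data rows as one string.

    The field widths are assumptions chosen for illustration.
    """
    lines = [" ".join(COLUMNS), " ".join("-" * len(c) for c in COLUMNS)]
    for sid, von, bis, hoehe, lat, lon, name, land in rows:
        lines.append(f"{sid:>11} {von:>9} {bis:>9} {hoehe:>13} "
                     f"{lat:>9.4f} {lon:>9.4f} {name:<40} {land}")
    return "\n".join(lines) + "\n"


text = build_sample(ROWS)

# Whitespace splitting turns "Bad Musterstadt" into two tokens, so that row
# has 9 fields instead of 8 and the C parser raises ParserError.
try:
    pd.read_csv(io.StringIO(text), sep=r"\s+")
    whitespace_ok = True
except pd.errors.ParserError:
    whitespace_ok = False
print("sep=r'\\s+' parsed:", whitespace_ok)

# read_fwf infers the column boundaries from the data rows (skiprows=2 drops
# the header and dash lines) and keeps the space inside the station name.
df = pd.read_fwf(io.StringIO(text), skiprows=2, names=COLUMNS)
print(df)
```

read_fwf decides where each column starts and ends by looking at which character positions hold non-blank text in the rows it inspects, so a name like "Bad Musterstadt" stays in one column as long as other rows fill the gap between its words.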
If you want to load a local copy of the file instead, just replace the URL with the local file name.
df = pd.read_fwf('TU_Stundenwerte_Beschreibung_Stationen.txt', encoding="latin-1", skiprows=2,
                 names=['Stations_id', 'von_datum', 'bis_datum', 'Stationshoehe',
                        'geoBreite', 'geoLaenge', 'Stationsname', 'Bundesland'])
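Reading from disk also makes it easy to check why the encoding argument is needed at all. This sketch writes a hypothetical two-row excerpt (values from the output above, field widths assumed) to a temporary file as Latin-1 bytes, which is an assumption about how the DWD file is encoded. It shows that pandas' default UTF-8 decoding rejects the file, while encoding="latin-1" reads "Großenkneten" correctly; unlike the "ansi" alias, which Python provides only on Windows, "latin-1" is available everywhere.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ["Stations_id", "von_datum", "bis_datum", "Stationshoehe",
           "geoBreite", "geoLaenge", "Stationsname", "Bundesland"]

# Two rows taken from the output above, laid out with assumed field widths.
ROWS = [
    (3, 19500401, 20110331, 202, 50.7827, 6.0941, "Aachen", "Nordrhein-Westfalen"),
    (44, 20070401, 20220920, 44, 52.9336, 8.2370, "Großenkneten", "Niedersachsen"),
]
lines = [" ".join(COLUMNS), " ".join("-" * len(c) for c in COLUMNS)]
lines += [f"{sid:>11} {von:>9} {bis:>9} {hoehe:>13} {lat:>9.4f} {lon:>9.4f} {name:<40} {land}"
          for sid, von, bis, hoehe, lat, lon, name, land in ROWS]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "TU_Stundenwerte_Beschreibung_Stationen.txt")
    # ASSUMPTION: the DWD file is Latin-1, so write the sample the same way.
    with open(path, "w", encoding="latin-1") as fh:
        fh.write("\n".join(lines) + "\n")

    # pandas decodes as UTF-8 by default; the Latin-1 byte for "ß" is invalid there.
    try:
        pd.read_fwf(path, skiprows=2, names=COLUMNS)
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    # Naming the encoding explicitly decodes the German characters correctly.
    df = pd.read_fwf(path, encoding="latin-1", skiprows=2, names=COLUMNS)

print("default UTF-8 worked:", utf8_ok)
print(df)
```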