How to read all CSV files from a web page into a pandas DataFrame?

I'm trying to read all the .csv files from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports into a single data frame.

My code so far:

url = 'https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports'
x = requests.get(url).text
filenames = re.findall('[\d]{1,2}-[\d]{1,2}-[\d]{4}.csv', x)
frame = pd.concat(pd.read_csv(url + y) for y in filenames) 

Maybe somebody can help :D

Change the URL to

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/'

and it should work. This gives you access to the raw CSV files rather than to the HTML page that displays them.
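To illustrate the difference between the two address forms, here is a minimal sketch that rewrites a GitHub tree-page URL into its raw.githubusercontent.com counterpart. The helper name to_raw_url is made up for this example, and it assumes the URL has the shape github.com/owner/repo/tree/branch/path with a branch name that contains no slash.

```python
from urllib.parse import urlsplit

def to_raw_url(tree_url):
    """Hypothetical helper: map a github.com 'tree' URL to the raw-content base URL."""
    parts = urlsplit(tree_url).path.strip('/').split('/')
    # Expected layout: owner / repo / 'tree' / branch / path...
    owner, repo, marker, branch, *path = parts
    if marker != 'tree':
        raise ValueError('not a GitHub tree URL: ' + tree_url)
    pieces = [owner, repo, branch, *path]
    # Trailing slash so a filename can be appended directly
    return 'https://raw.githubusercontent.com/' + '/'.join(pieces) + '/'

url = 'https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports'
print(to_raw_url(url))
```

For the URL in the question this produces the same raw address as the one given above.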

Edit: I just noticed that you still need the old URL to get the filenames:

import re
import requests
import pandas as pd

url_raw = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/'
url = 'https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports'
x = requests.get(url).text
# Raw string with an escaped dot, so only a literal ".csv" suffix matches
filenames = re.findall(r'\d{1,2}-\d{1,2}-\d{4}\.csv', x)
frame = pd.concat(pd.read_csv(url_raw + y) for y in filenames)
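One thing to watch for: a filename may appear more than once in the page HTML (for instance in both a link target and its link text), in which case re.findall returns duplicates and the same file would be loaded twice. Below is a small sketch of de-duplicating the matches. It runs on a made-up HTML snippet rather than the real page, and the function name extract_filenames is an assumption for illustration.

```python
import re

# MM-DD-YYYY.csv, with the dot escaped so it only matches a literal '.'
CSV_NAME = re.compile(r'\d{1,2}-\d{1,2}-\d{4}\.csv')

def extract_filenames(html):
    """Return unique report filenames in order of first appearance."""
    # dict.fromkeys keeps insertion order and drops repeats
    return list(dict.fromkeys(CSV_NAME.findall(html)))

# Made-up sample, not the real GitHub markup
sample = (
    '<a href="/reports/01-22-2020.csv">01-22-2020.csv</a>'
    '<a href="/reports/01-23-2020.csv">01-23-2020.csv</a>'
    '<span>01-22-2020xcsv</span>'
)
print(extract_filenames(sample))
```

Each name is kept once, and the string without a literal dot before "csv" is ignored.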

Another option is to run the following code:

frame = pd.concat(pd.read_csv(f'{url}/{y}') for y in filenames)

Just as an additional note, you might not get the expected behaviour from pd.concat, as the CSV files at the given URL are inconsistent column-wise (see examples below). You might want to rename or drop some of the columns before calling concat.

01-27-2020.csv
Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
03-01-2020.csv
Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude
04-26-2020.csv
FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key
06-28-2020.csv
FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key,Incidence_Rate,Case-Fatality_Ratio
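As a rough sketch of the renaming step, the older column names can be mapped onto the newer ones before concatenating. The mapping below is an assumption based only on the four headers listed above (for example, that Latitude corresponds to Lat), the helper name normalize is made up, and the two small frames are invented stand-ins for real daily reports.

```python
import pandas as pd

# Old header -> new header; an assumed mapping inferred from the example headers
RENAME = {
    'Province/State': 'Province_State',
    'Country/Region': 'Country_Region',
    'Last Update': 'Last_Update',
    'Latitude': 'Lat',
    'Longitude': 'Long_',
}

def normalize(df):
    """Rename legacy column names so all frames share one schema."""
    return df.rename(columns=RENAME)

# Tiny made-up frames standing in for an early and a later daily report
old = pd.DataFrame({'Province/State': ['A'], 'Country/Region': ['X'],
                    'Last Update': ['1/27/2020'], 'Confirmed': [1]})
new = pd.DataFrame({'Province_State': ['B'], 'Country_Region': ['Y'],
                    'Last_Update': ['2020-04-26'], 'Confirmed': [2], 'Active': [2]})

# Columns missing from a frame (here 'Active' in the old one) are filled with NaN
frame = pd.concat([normalize(old), normalize(new)], ignore_index=True)
print(frame.columns.tolist())
```

With the real files, the same helper could be applied to each frame as it is read, e.g. pd.concat(normalize(pd.read_csv(url_raw + y)) for y in filenames).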

