How to use BeautifulSoup and pandas to scrape data from a page with a date filter into a DataFrame?
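I'm new to Python and I'm trying to scrape data from a website. The problem is that the page has a date filter, and I'm struggling to work out how to extract the data for multiple dates. Are there any good resources on this, or does anyone have suggestions on how to do it? I can't seem to find what I need online.
My code extracts what is displayed for today's date: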
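from io import StringIO
import requests
import pandas as pd
from bs4 import BeautifulSoup
res = requests.get("https://www.inmo.ie/Trolley_Ward_Watch")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(StringIO(str(table)))  # wrap in StringIO: newer pandas deprecates passing literal HTML
print(df[0].to_json(orient='records'))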
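To find out what the date filter actually submits, one option is to list the fields of the page's form before scripting against it. This is a minimal sketch that assumes the filter is the first <form> on the page; describe_form is a hypothetical helper name, and the live page may no longer look the way it did when this question was asked.

```python
import requests
from bs4 import BeautifulSoup


def describe_form(html):
    """Return the first form's action and a {name: value} dict of its named fields."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    if form is None:
        return None, {}
    # Only fields with a name attribute are sent when the form is submitted.
    fields = {
        tag.get("name"): tag.get("value", "")
        for tag in form.find_all(["input", "select", "textarea"])
        if tag.get("name")
    }
    return form.get("action"), fields


if __name__ == "__main__":
    html = requests.get("https://www.inmo.ie/Trolley_Ward_Watch", timeout=30).text
    action, fields = describe_form(html)
    print("action:", action)
    for name, value in fields.items():
        print(name, "=", repr(value))
```

The printed field names are the keys to send in a POST request; the date field is the one whose value changes when you pick a different day in the browser.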
The data is loaded via JavaScript, but you can simulate the AJAX request with the requests library, for example (change the DateTrolley parameter to the date you want):
import requests
import pandas as pd
from io import StringIO
from bs4 import BeautifulSoup

url = 'https://www.inmo.ie/Trolley_Ward_Watch'
# load the page once to read the URL that the date filter form posts to
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
form_url = 'https://www.inmo.ie' + soup.form['action']

data = {'DateTrolley': '01/05/2020'}  # <-- change to e.g. 05/05/2020 to get another date
# post the form the way the page's JavaScript would, then parse the returned table
soup = BeautifulSoup(requests.post(form_url, data=data).content, 'html.parser')
df = pd.read_html(StringIO(str(soup.table)))
print(df)
Prints:
[          Date                                 Hospital   Region  Trolley Total  Ward Total  Total
0   01/05/2020                        Beaumont Hospital  Eastern              0           0      0
1   01/05/2020        Connolly Hospital, Blanchardstown  Eastern              0           0      0
2   01/05/2020        Connolly Hospital, Blanchardstown  Eastern              0           0      0
3   01/05/2020  Mater Misericordiae University Hospital  Eastern              0           0      0
4   01/05/2020                    Naas General Hospital  Eastern              0           0      0
5   01/05/2020                       St James' Hospital  Eastern              2           0      2
6   01/05/2020         St Vincent's University Hospital  Eastern              0           0      0
7   01/05/2020             Tallaght University Hospital  Eastern              1           0      1
8   01/05/2020                  Bantry General Hospital  Country              0           0      0
9   01/05/2020                   Cavan General Hospital  Country              2           0      2
10  01/05/2020                 Cork University Hospital  Country              2           0      2
11  01/05/2020          Letterkenny University Hospital  Country              0           0      0
12  01/05/2020                 Mayo University Hospital  Country              0           0      0
13  01/05/2020          Mercy University Hospital, Cork  Country              0           0      0
14  01/05/2020     Mid Western Regional Hospital, Ennis  Country              0           0      0
15  01/05/2020     Midland Regional Hospital, Mullingar  Country              1           0      1
16  01/05/2020    Midland Regional Hospital, Portlaoise  Country              0           0      0
17  01/05/2020     Midland Regional Hospital, Tullamore  Country              0           0      0
18  01/05/2020                  Nenagh General Hospital  Country              0           1      1
19  01/05/2020   Our Lady of Lourdes Hospital, Drogheda  Country              0           0      0
20  01/05/2020               Our Lady's Hospital, Navan  Country              0           0      0
21  01/05/2020          Portiuncula University Hospital  Country              0           0      0
22  01/05/2020                Sligo University Hospital  Country              0           0      0
23  01/05/2020         South Tipperary General Hospital  Country              0           0      0
24  01/05/2020              St Lukes Hospital, Kilkenny  Country              0           0      0
25  01/05/2020       University College Hospital Galway  Country              0           0      0
26  01/05/2020                University Hospital Kerry  Country              0           0      0
27  01/05/2020            University Hospital Waterford  Country              0           0      0
28  01/05/2020            University Hospital, Limerick  Country              8           0      8
29  01/05/2020                 Wexford General Hospital  Country              1           0      1]
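Since the question asks for several dates, the single POST above can be repeated once per day and the daily tables stacked with pd.concat. This is a sketch under stated assumptions: that the site still accepts the same DateTrolley POST with no extra hidden fields, and that dates are formatted DD/MM/YYYY (reading 01/05/2020 as 1 May 2020 is itself an assumption). date_strings, table_from_html and scrape_range are hypothetical helper names, and urljoin is swapped in for the string concatenation used above.

```python
from datetime import date, timedelta
from io import StringIO
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup

PAGE_URL = "https://www.inmo.ie/Trolley_Ward_Watch"


def date_strings(start, end):
    """Yield each day from start to end inclusive as DD/MM/YYYY (assumed site format)."""
    day = start
    while day <= end:
        yield day.strftime("%d/%m/%Y")
        day += timedelta(days=1)


def table_from_html(html):
    """Parse the first <table> in an HTML page into a DataFrame."""
    soup = BeautifulSoup(html, "html.parser")
    return pd.read_html(StringIO(str(soup.table)))[0]


def scrape_range(start, end):
    """POST the date filter once per day and stack the daily tables into one DataFrame."""
    with requests.Session() as session:
        page = BeautifulSoup(session.get(PAGE_URL, timeout=30).content, "html.parser")
        # Resolve the form's action against the page URL (assumes the first form is the date filter).
        form_url = urljoin(PAGE_URL, page.form["action"])
        frames = []
        for day in date_strings(start, end):
            resp = session.post(form_url, data={"DateTrolley": day}, timeout=30)
            resp.raise_for_status()
            frames.append(table_from_html(resp.content))
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    df = scrape_range(date(2020, 5, 1), date(2020, 5, 7))
    print(df)
```

Reusing one requests.Session keeps the connection (and any cookies) between posts. Because each daily table already carries a Date column, the stacked frame can be filtered or grouped by day afterwards. For long ranges, a short time.sleep between requests is kinder to the server.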