Python BeautifulSoup prettify
I'm trying to web scrape a page to download its data. The code is as follows:
from bs4 import BeautifulSoup
import requests

link='https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release'
url=requests.get(link).text
soup=BeautifulSoup(url,'lxml')
# find the content divs that contain the tables
table=soup.find_all('div',class_='abs-content clearfix text-formatted field field--name-field-abs-text-paragraph-content field--type-text-long field--label-hidden')
for div in table:
    region=div.find_all('tr')
    # print the text of every table row inside the div
    for header in region:
        print(header.text)
I end up with the following result:
Residential property pricesSep Qtr 21 to Dec Qtr 21Dec Qtr 20 to Dec Qtr 21
% change% change
Weighted average of eight capital cities4.723.7
Sydney4.126.7
Melbourne3.920.0
Brisbane9.627.8
Adelaide6.823.9
Perth2.915.7
Hobart6.529.8
Darwin1.513.0
Canberra6.428.8
Total value of the dwelling stock in AustraliaDec Qtr 21
Value of dwelling stock(a) ($b)9,901.6
Mean price of residential dwellings ($'000)920.1
Number of residential dwellings ('000)10,761.9
RPPI (a)HPIADPI
Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21
% change% change% change
Weighted average of eight capital cities4.75.33.2
Sydney4.14.53.3
Melbourne3.94.23.1
Brisbane9.610.83.9
Adelaide6.87.73.1
Perth2.92.92.3
Hobart6.56.74.9
Darwin1.51.32.1
Canberra6.47.04.3
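The values run together because Tag.text joins the text of every cell with no separator between them. A minimal sketch, using a small hypothetical HTML snippet modelled on one row above instead of the live page, shows how get_text() with a separator keeps the cell boundaries:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet modelled on the "Sydney" row of the ABS table above
html = "<table><tr><td>Sydney</td><td>4.1</td><td>26.7</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

print(row.text)                         # cell texts concatenated: Sydney4.126.7
print(row.get_text(" | ", strip=True))  # Sydney | 4.1 | 26.7
```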
Method 1
Use the bs4 module to find the table first:
import requests
from bs4 import BeautifulSoup
res=requests.get("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
soup=BeautifulSoup(res.text,"html.parser")
table=soup.find("table")
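The page holds more than one table (the output in the question shows three), and soup.find("table") returns only the first one. A minimal sketch on a hypothetical two-table snippet, standing in for the ABS page, contrasts find with find_all, which returns every table in document order:

```python
from bs4 import BeautifulSoup

# Hypothetical two-table snippet standing in for the ABS page
html = """
<table><tr><td>Sydney</td><td>4.1</td></tr></table>
<table><tr><td>Value of dwelling stock(a) ($b)</td><td>9,901.6</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("table")       # only the first <table> (None if there is none)
tables = soup.find_all("table")  # every <table>, in document order

print(len(tables))  # 2
for t in tables:
    # print the first cell of each table
    print(t.find("td").get_text(strip=True))
```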
Now find the header cells (th) and collect their text into a list, keeping the first three as column names:
all_cols=table.find_all("th")
cols=[i.get_text(strip=True) for i in all_cols]
first_three=cols[:3]
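Slicing cols[:3] relies on the first three th cells being the top header row; if the row labels in the table body are also th cells, find_all("th") returns those as well. A sketch that reads the first header row directly, assuming the table has a thead element (the markup below is hypothetical, modelled on the first ABS table):

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the first ABS table; assumes a <thead>
html = """
<table>
  <thead>
    <tr><th rowspan="2">Residential property prices</th><th>Sep Qtr 21 to Dec Qtr 21</th><th>Dec Qtr 20 to Dec Qtr 21</th></tr>
    <tr><th>% change</th><th>% change</th></tr>
  </thead>
  <tbody>
    <tr><th>Sydney</th><td>4.1</td><td>26.7</td></tr>
  </tbody>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")

# Every <th> in the table, including the second header row and the row label "Sydney"
all_th = [th.get_text(strip=True) for th in table.find_all("th")]
print(all_th)

# Header cells from the first header row only
first_row = table.find("thead").find("tr")
headers = [th.get_text(strip=True) for th in first_row.find_all("th")]
print(headers)
```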
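Now find the values in each table row, join them with "," and split on it, and append the result to a list. This gives a list of lists that can be passed to a pandas DataFrame to get the data in tabular form: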
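lst=[]
for i in table.find("tbody").find_all("tr"):
    # join the cells of the row with "," and split them back into a list
    lst.append(i.get_text(strip=True,separator=",").split(","))
# drop the last row (assumed here not to be a data row)
filtered=lst[:-1]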
import pandas as pd
df=pd.DataFrame(filtered,columns=first_three)
df
Output:
Residential property prices Sep Qtr 21 to Dec Qtr 21 Dec Qtr 20 to Dec Qtr 21
0 Weighted average of eight capital cities 4.7 23.7
1 Sydney 4.1 26.7
....
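Joining on "," and splitting again breaks any value that itself contains a comma, such as 9,901.6 in the second table of the question's output. A minimal sketch on hypothetical rows modelled on that table reads each th/td cell separately instead, then converts the strings to floats; the column name "Measure" is a hypothetical label:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical rows modelled on the second ABS table; note the comma inside 9,901.6
html = """
<table><tbody>
  <tr><th>Value of dwelling stock(a) ($b)</th><td>9,901.6</td></tr>
  <tr><th>Mean price of residential dwellings ($'000)</th><td>920.1</td></tr>
</tbody></table>
"""
tbody = BeautifulSoup(html, "html.parser").find("tbody")

# The separator/split approach cuts 9,901.6 into two cells
naive = tbody.find("tr").get_text(strip=True, separator=",").split(",")
print(naive)  # ['Value of dwelling stock(a) ($b)', '9', '901.6']

# Reading each <th>/<td> cell separately keeps every value intact
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in tbody.find_all("tr")
]
df = pd.DataFrame(rows, columns=["Measure", "Dec Qtr 21"])

# Cells are strings; remove thousands separators, then convert to float
df["Dec Qtr 21"] = df["Dec Qtr 21"].str.replace(",", "", regex=False).astype(float)
print(df)
```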
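Method 2
The simplest way to get the tables from a URL is the pandas module: read_html fetches every table on the page in one call (it needs an HTML parser such as lxml installed):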
import pandas as pd
# read_html returns a list of DataFrames, one per <table> on the page
lst=pd.read_html("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
df=lst[0]
Output:
Residential property prices Sep Qtr 21 to Dec Qtr 21 Dec Qtr 20 to Dec Qtr 21
Residential property prices % change % change
0 Weighted average of eight capital cities 4.7 23.7
1 Sydney 4.1 26.7
2 Melbourne 3.9 20.0
.....
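The first table appears to have two header rows, so read_html gives it two-level (MultiIndex) columns, which is why the output above has two header lines. A minimal sketch of flattening them into single labels; the frame is built by hand here (values copied from the output) so it runs offline, and the "top (sub)" label format is just one possible choice:

```python
import pandas as pd

# Two-level columns mimicking what read_html produced for the first table (built by hand)
columns = pd.MultiIndex.from_tuples([
    ("Residential property prices", "Residential property prices"),
    ("Sep Qtr 21 to Dec Qtr 21", "% change"),
    ("Dec Qtr 20 to Dec Qtr 21", "% change"),
])
df = pd.DataFrame(
    [["Weighted average of eight capital cities", 4.7, 23.7], ["Sydney", 4.1, 26.7]],
    columns=columns,
)

# Join the two header levels into one label, keeping a single copy when they are identical
df.columns = [top if top == sub else f"{top} ({sub})" for top, sub in df.columns]
print(df.columns.tolist())
```

If only the top level is wanted, df.columns.droplevel(1) on the original MultiIndex keeps just that level.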