
Python BeautifulSoup prettify

I am trying to web scrape a page to download data. The code is as follows:

from bs4 import BeautifulSoup
import requests

link='https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release'
data=[]

# Download the page and parse it with the lxml parser
html=requests.get(link).text
soup=BeautifulSoup(html,'lxml')
# Find the content <div>s (matched on their exact class string)
table=soup.find_all('div',class_='abs-content clearfix text-formatted field field--name-field-abs-text-paragraph-content field--type-text-long field--label-hidden')
# Loop variable renamed so it no longer overwrites the data list above
for div in table:
    region=div.find_all('tr')
    for header in region:
        # Print the text of every table row
        print(header.text)

I end up with the following result:

Residential property pricesSep Qtr 21 to Dec Qtr 21Dec Qtr 20 to Dec Qtr 21
% change% change
Weighted average of eight capital cities4.723.7
Sydney4.126.7
Melbourne3.920.0
Brisbane9.627.8
Adelaide6.823.9
Perth2.915.7
Hobart6.529.8
Darwin1.513.0
Canberra6.428.8
   
Total value of the dwelling stock in AustraliaDec Qtr 21
Value of dwelling stock(a) ($b)9,901.6
Mean price of residential dwellings ($'000)920.1
Number of residential dwellings ('000)10,761.9
 RPPI (a)HPIADPI
Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21
% change% change% change
Weighted average of eight capital cities4.75.33.2
Sydney4.14.53.3
Melbourne3.94.23.1
Brisbane9.610.83.9
Adelaide6.87.73.1
Perth2.92.92.3
Hobart6.56.74.9
Darwin1.51.32.1
Canberra6.47.04.3

How can I split them into separate columns and present the data like the following? Thank you.

[Image: desired table layout]
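The question's code prints header.text for each row, and .text concatenates every string inside the row with nothing in between, which is why the columns run together. A minimal offline sketch, using a hypothetical one-row HTML snippet shaped like the ABS rows, contrasts that with reading each cell on its own:

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table, shaped like the ABS rows in the question.
html = "<table><tr><td>Sydney</td><td>4.1</td><td>26.7</td></tr></table>"
row = BeautifulSoup(html, "html.parser").find("tr")

# .text concatenates every string in the row with nothing in between.
joined = row.text
# Reading each cell separately keeps the column boundaries.
cells = [td.get_text(strip=True) for td in row.find_all("td")]

print(joined)  # Sydney4.126.7
print(cells)   # ['Sydney', '4.1', '26.7']
```

Both answers below rely on keeping those cell boundaries intact.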

Method 1

First, use the bs4 module to find the table:

import requests
from bs4 import BeautifulSoup
# Download the page and parse it with Python's built-in HTML parser
res=requests.get("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
soup=BeautifulSoup(res.text,"html.parser")
# find() returns only the first <table> on the page
table=soup.find("table")
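soup.find("table") stops at the first match, while the question's output shows several tables on the page. A small sketch on a hypothetical two-table snippet shows the difference between find and find_all, in case the other tables are wanted too:

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables (the ABS page in the question has several).
html = """
<table><tr><th>Residential property prices</th></tr></table>
<table><tr><th>Total value of the dwelling stock</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("table")       # only the first <table>, or None if absent
every = soup.find_all("table")   # a list of all <table> elements

print(first.th.get_text(strip=True))  # Residential property prices
print(len(every))                     # 2
```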

Next, find the table headers and collect their text in a list:

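# Collect the text of every header cell (<th>) in the table
all_cols=table.find_all("th")
cols=[i.get_text(strip=True) for i in all_cols]
# The first three header cells become the DataFrame's column names
first_three=cols[:3]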
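find_all("th") returns header cells in document order across all header rows, so cols[:3] works only because the first header row has exactly three cells. A sketch on a hypothetical two-row header (modelled on the two header levels pandas shows in Method 2) reads the first header row directly instead:

```python
from bs4 import BeautifulSoup

# Hypothetical two-row header, similar to the one pandas shows in Method 2.
html = """
<table>
  <thead>
    <tr><th>Residential property prices</th><th>Sep Qtr 21 to Dec Qtr 21</th><th>Dec Qtr 20 to Dec Qtr 21</th></tr>
    <tr><th></th><th>% change</th><th>% change</th></tr>
  </thead>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")

# find_all("th") walks the header cells in document order, across both rows.
cols = [th.get_text(strip=True) for th in table.find_all("th")]
# Reading the first <tr> explicitly avoids relying on a hard-coded [:3] slice.
first_row = [th.get_text(strip=True) for th in table.find("tr").find_all("th")]

print(cols)       # six entries: three from each header row
print(first_row)  # the three column names from the first row
```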

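Now find the table row values: join each row's cell text with "," and split it back apart, appending each resulting list to a list of lists that can be loaded into a pandas DataFrame to give tabular data.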

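import pandas as pd

lst=[]
for i in table.find("tbody").find_all("tr"):
    # Join the row's cell strings with "," and split them back into a list of cells
    lst.append(i.get_text(strip=True,separator=",").split(","))

# Drop the last row, which appears to be blank in this table
filtered=lst[:-1]
df=pd.DataFrame(filtered,columns=first_three)
print(df)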
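Splitting on "," is fine for the first table, but a value such as "9,901.6" from the dwelling-stock table would be cut in two, and empty cells disappear. A sketch that reads one entry per cell instead, then converts the numbers; the HTML snippet and the column names label, col1 and col2 are hypothetical:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical rows; the second mimics "Value of dwelling stock ... 9,901.6",
# where splitting on "," would break the number in two.
html = """
<table><tbody>
  <tr><th>Sydney</th><td>4.1</td><td>26.7</td></tr>
  <tr><th>Value of dwelling stock ($b)</th><td>9,901.6</td><td></td></tr>
</tbody></table>
"""
table = BeautifulSoup(html, "html.parser").find("table")

rows = []
for tr in table.find("tbody").find_all("tr"):
    # One entry per cell, so commas inside a value and empty cells are both kept.
    rows.append([cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])])

df = pd.DataFrame(rows, columns=["label", "col1", "col2"])
# Strip thousands separators, then convert; empty strings become NaN.
for col in ["col1", "col2"]:
    df[col] = pd.to_numeric(df[col].str.replace(",", ""), errors="coerce")
print(df)
```

Converting to numbers also lets the percentage columns be sorted or compared rather than treated as text.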

Output:

Residential property prices Sep Qtr 21 to Dec Qtr 21    Dec Qtr 20 to Dec Qtr 21
0   Weighted average of eight capital cities    4.7 23.7
1   Sydney  4.1 26.7
....

Method 2

The simplest way to get the tables from a URL is to use the pandas module, whose read_html fetches every table on the page:

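# read_html needs an HTML parser such as lxml installed; it returns a list of DataFrames
import pandas as pd
lst=pd.read_html("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
# Take the first table on the page
df=lst[0]
print(df)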

Output:

Residential property prices Sep Qtr 21 to Dec Qtr 21    Dec Qtr 20 to Dec Qtr 21
Residential property prices % change                    % change
0   Weighted average of eight capital cities    4.7     23.7
1   Sydney  4.1 26.7
2   Melbourne   3.9 20.0
.....
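The two header rows in the read_html output above become a two-level (MultiIndex) column index. If single-level names are preferred, one option is to keep only the top level. The frame below is a hypothetical stand-in built by hand so the sketch runs offline:

```python
import pandas as pd

# Hypothetical frame shaped like the read_html result above: two header rows
# become a two-level (MultiIndex) column index.
columns = pd.MultiIndex.from_tuples([
    ("Residential property prices", "Residential property prices"),
    ("Sep Qtr 21 to Dec Qtr 21", "% change"),
    ("Dec Qtr 20 to Dec Qtr 21", "% change"),
])
df = pd.DataFrame(
    [["Weighted average of eight capital cities", 4.7, 23.7], ["Sydney", 4.1, 26.7]],
    columns=columns,
)

# Keep only the top header level, giving single-level column names.
df.columns = df.columns.get_level_values(0)
print(df)
```

Dropping the second level loses the "% change" label, so it may be worth adding it back into the column names if it matters.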
