Python BeautifulSoup prettify
I am trying to scrape a webpage to download its data. The code is as follows:
from bs4 import BeautifulSoup
import requests
link='https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release'
data=[]
url=requests.get(link).text
soup=BeautifulSoup(url,'lxml')
table=soup.find_all('div',class_='abs-content clearfix text-formatted field field--name-field-abs-text-paragraph-content field--type-text-long field--label-hidden')
for data in table:
    region=data.find_all('tr')
    for header in region:
        print(header.text)
And I finally get the result below:
Residential property pricesSep Qtr 21 to Dec Qtr 21Dec Qtr 20 to Dec Qtr 21
% change% change
Weighted average of eight capital cities4.723.7
Sydney4.126.7
Melbourne3.920.0
Brisbane9.627.8
Adelaide6.823.9
Perth2.915.7
Hobart6.529.8
Darwin1.513.0
Canberra6.428.8
Total value of the dwelling stock in AustraliaDec Qtr 21
Value of dwelling stock(a) ($b)9,901.6
Mean price of residential dwellings ($'000)920.1
Number of residential dwellings ('000)10,761.9
RPPI (a)HPIADPI
Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21
% change% change% change
Weighted average of eight capital cities4.75.33.2
Sydney4.14.53.3
Melbourne3.94.23.1
Brisbane9.610.83.9
Adelaide6.87.73.1
Perth2.92.92.3
Hobart6.56.74.9
Darwin1.51.32.1
Canberra6.47.04.3
How can I separate them into different columns when presenting the data? Thank you.
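The values run together because `.text` on a `tr` element joins the text of every cell with no separator. Here is a minimal sketch of the difference, assuming a made-up two-row table (the HTML string is for illustration only and is not the markup of the ABS page):

```python
from bs4 import BeautifulSoup

# Made-up table for illustration only; not the markup of the ABS page.
html = """
<table>
  <tr><th>City</th><th>Quarterly</th><th>Annual</th></tr>
  <tr><td>Sydney</td><td>4.1</td><td>26.7</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
row = soup.find_all("tr")[1]

# .text joins the text of all descendants with no separator.
fused = row.text

# Reading each cell separately keeps the column boundaries.
cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]

print(repr(fused))
print(cells)
```

Both methods below avoid the fused text by working with individual cells or whole tables rather than the text of a full row.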
Method 1
Using the bs4 module, first find the table:
import requests
from bs4 import BeautifulSoup
res=requests.get("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
soup=BeautifulSoup(res.text,"html.parser")
table=soup.find("table")
Now find the table headers and collect their text into a list, keeping the first three as column names:
all_cols=table.find_all("th")
cols=[i.get_text(strip=True) for i in all_cols]
first_three=cols[:3]
Now find the table's row values. Join the cell text of each row with "," and split on it again, then append the result to a list. The resulting list of lists can be passed to a pandas DataFrame, which gives the data its table structure:
lst=[]
for i in table.find("tbody").find_all("tr"):
    lst.append(i.get_text(strip=True,separator=",").split(","))
filtered=lst[:-1]
import pandas as pd
df=pd.DataFrame(filtered,columns=first_three)
df
Output:
                Residential property prices  Sep Qtr 21 to Dec Qtr 21  Dec Qtr 20 to Dec Qtr 21
0  Weighted average of eight capital cities                       4.7                      23.7
1                                    Sydney                       4.1                      26.7
....
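One caveat if this approach is extended to the other tables on the page: some values there contain a comma (for example 9,901.6), so splitting on "," would cut them in two. A minimal sketch of a cell-by-cell alternative, assuming a made-up table (the HTML string is illustrative only, not the markup of the ABS page):

```python
from bs4 import BeautifulSoup

# Made-up table for illustration only; one value contains a thousands separator.
html = """
<table>
  <tbody>
    <tr><td>Value of dwelling stock ($b)</td><td>9,901.6</td></tr>
    <tr><td>Mean price ($'000)</td><td>920.1</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("tbody").find_all("tr")

# Joining with "," and splitting again breaks "9,901.6" into two fields.
by_split = [r.get_text(strip=True, separator=",").split(",") for r in rows]

# One list entry per <td>/<th> keeps each value whole.
by_cell = [[c.get_text(strip=True) for c in r.find_all(["th", "td"])] for r in rows]

print(by_split[0])
print(by_cell[0])
```

The `by_cell` list can be passed to `pd.DataFrame` in the same way as `filtered` above.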
Method 2
The simplest way to get tables from a URL is to use the pandas module, whose read_html function returns every table on the page as a list of DataFrames:
import pandas as pd
lst=pd.read_html("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
df=lst[0]
Output:
                Residential property prices  Sep Qtr 21 to Dec Qtr 21  Dec Qtr 20 to Dec Qtr 21
                Residential property prices                  % change                  % change
0  Weighted average of eight capital cities                       4.7                      23.7
1                                    Sydney                       4.1                      26.7
2                                 Melbourne                       3.9                      20.0
.....
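The two header rows in this output mean the DataFrame has two-level (MultiIndex) columns. If single-level column names are easier to work with, the levels can be joined. A minimal sketch, assuming a hand-built stand-in for `lst[0]` with the same header shape (the DataFrame here is constructed for illustration, not read from the page):

```python
import pandas as pd

# Hypothetical stand-in for lst[0]: two header rows become a two-level column index.
columns = pd.MultiIndex.from_tuples([
    ("Residential property prices", "Residential property prices"),
    ("Sep Qtr 21 to Dec Qtr 21", "% change"),
    ("Dec Qtr 20 to Dec Qtr 21", "% change"),
])
df = pd.DataFrame(
    [["Weighted average of eight capital cities", 4.7, 23.7],
     ["Sydney", 4.1, 26.7]],
    columns=columns,
)

# Join the two header levels, skipping a label that is simply repeated.
df.columns = [
    top if top == bottom else f"{top} ({bottom})"
    for top, bottom in df.columns
]

print(df.columns.tolist())
```

After this step a column can be selected with a plain string such as `df["Sep Qtr 21 to Dec Qtr 21 (% change)"]`.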