I am trying to scrape a webpage to download its data. The code is as follows:
from bs4 import BeautifulSoup
import requests
link='https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release'
data=[]
url=requests.get(link).text
soup=BeautifulSoup(url,'lxml')
table=soup.find_all('div',class_='abs-content clearfix text-formatted field field--name-field-abs-text-paragraph-content field--type-text-long field--label-hidden')
for data in table:
    region=data.find_all('tr')
    for header in region:
        print(header.text)
This prints the following result:
Residential property pricesSep Qtr 21 to Dec Qtr 21Dec Qtr 20 to Dec Qtr 21
% change% change
Weighted average of eight capital cities4.723.7
Sydney4.126.7
Melbourne3.920.0
Brisbane9.627.8
Adelaide6.823.9
Perth2.915.7
Hobart6.529.8
Darwin1.513.0
Canberra6.428.8
Total value of the dwelling stock in AustraliaDec Qtr 21
Value of dwelling stock(a) ($b)9,901.6
Mean price of residential dwellings ($'000)920.1
Number of residential dwellings ('000)10,761.9
RPPI (a)HPIADPI
Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21
% change% change% change
Weighted average of eight capital cities4.75.33.2
Sydney4.14.53.3
Melbourne3.94.23.1
Brisbane9.610.83.9
Adelaide6.87.73.1
Perth2.92.92.3
Hobart6.56.74.9
Darwin1.51.32.1
Canberra6.47.04.3
How can I separate these values into different columns and present the data as a table? Thank you.
Method 1
Using the bs4 module, first find the table:
import requests
from bs4 import BeautifulSoup
res=requests.get("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
soup=BeautifulSoup(res.text,"html.parser")
table=soup.find("table")
Now find the table headers and collect their text in a list. Only the first three are the column names:
all_cols=table.find_all("th")
cols=[i.get_text(strip=True) for i in all_cols]
first_three=cols[:3]
Next, get the text of each table row, split it on the "," separator, and append the result to a list. The resulting list of lists can be passed to a pandas DataFrame, which gives the data a tabular structure:
lst=[]
for i in table.find("tbody").find_all("tr"):
    # join the cell texts of the row with "," and split them back into a list
    lst.append(i.get_text(strip=True,separator=",").split(","))
# drop the last row
filtered=lst[:-1]
import pandas as pd
df=pd.DataFrame(filtered,columns=first_three)
df
Output:
Residential property prices Sep Qtr 21 to Dec Qtr 21 Dec Qtr 20 to Dec Qtr 21
0 Weighted average of eight capital cities 4.7 23.7
1 Sydney 4.1 26.7
....
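One caveat with splitting on "," is that values which themselves contain a comma (for example 9,901.6 in the second table of the question) would be split into two pieces. A minimal sketch of a per-cell alternative is shown here. It runs on a small inline HTML sample rather than the live page, and both the sample markup and the helper name table_to_rows are assumptions for illustration, so the real page may need different selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the live ABS page may be structured differently.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Total value of the dwelling stock in Australia</th><th>Dec Qtr 21</th></tr>
  </thead>
  <tbody>
    <tr><td>Value of dwelling stock(a) ($b)</td><td>9,901.6</td></tr>
    <tr><td>Number of residential dwellings ('000)</td><td>10,761.9</td></tr>
  </tbody>
</table>
"""


def table_to_rows(html):
    """Return (header, rows) with one list entry per table cell."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    # Header cells come from <thead>; body rows are read cell by cell,
    # so commas inside a value never split it.
    header = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find("tbody").find_all("tr")
    ]
    return header, rows


header, rows = table_to_rows(SAMPLE_HTML)
print(header)
print(rows)
```

The returned header and rows can be passed to pd.DataFrame(rows, columns=header) in the same way as filtered and first_three above.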
Method 2
The simplest way to get tables from a URL is to use the pandas module. Its read_html function returns a list of all the tables found on the page:
import pandas as pd
lst=pd.read_html("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
df=lst[0]
Output:
Residential property prices Sep Qtr 21 to Dec Qtr 21 Dec Qtr 20 to Dec Qtr 21
Residential property prices % change % change
0 Weighted average of eight capital cities 4.7 23.7
1 Sydney 4.1 26.7
2 Melbourne 3.9 20.0
.....
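The output above has two header rows, which pandas represents as two-level (MultiIndex) columns. If only the top row is wanted, the columns can be flattened. The sketch below builds a small stand-in frame by hand instead of calling read_html, so the frame contents are an assumption that mirrors the rows shown above and not data fetched from the page.

```python
import pandas as pd

# Hypothetical stand-in for lst[0]: two header levels, as in the output above.
columns = pd.MultiIndex.from_tuples([
    ("Residential property prices", "Residential property prices"),
    ("Sep Qtr 21 to Dec Qtr 21", "% change"),
    ("Dec Qtr 20 to Dec Qtr 21", "% change"),
])
df = pd.DataFrame(
    [
        ["Weighted average of eight capital cities", 4.7, 23.7],
        ["Sydney", 4.1, 26.7],
        ["Melbourne", 3.9, 20.0],
    ],
    columns=columns,
)

# Keep only the first header level, then use the region name as the index.
flat = df.copy()
flat.columns = df.columns.get_level_values(0)
flat = flat.set_index("Residential property prices")
print(flat)
```

Keeping the first level drops the "% change" row. If that unit matters, it could instead be joined into the column names.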