
Python BeautifulSoup prettify

I am trying to scrape a webpage to download its data. The code is as follows:

from bs4 import BeautifulSoup
import requests

link='https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release'
data=[]

url=requests.get(link).text
soup=BeautifulSoup(url,'lxml')
table=soup.find_all('div',class_='abs-content clearfix text-formatted field field--name-field-abs-text-paragraph-content field--type-text-long field--label-hidden')
for data in table:
    region=data.find_all('tr')
    for header in region:
        print(header.text)

I end up with the result below:

Residential property pricesSep Qtr 21 to Dec Qtr 21Dec Qtr 20 to Dec Qtr 21
% change% change
Weighted average of eight capital cities4.723.7
Sydney4.126.7
Melbourne3.920.0
Brisbane9.627.8
Adelaide6.823.9
Perth2.915.7
Hobart6.529.8
Darwin1.513.0
Canberra6.428.8
   
Total value of the dwelling stock in AustraliaDec Qtr 21
Value of dwelling stock(a) ($b)9,901.6
Mean price of residential dwellings ($'000)920.1
Number of residential dwellings ('000)10,761.9
 RPPI (a)HPIADPI
Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21
% change% change% change
Weighted average of eight capital cities4.75.33.2
Sydney4.14.53.3
Melbourne3.94.23.1
Brisbane9.610.83.9
Adelaide6.87.73.1
Perth2.92.92.3
Hobart6.56.74.9
Darwin1.51.32.1
Canberra6.47.04.3

How can I separate the values into different columns and present the data as a table? Thank you.

Method 1

Using the bs4 module, first find the table:

import requests
from bs4 import BeautifulSoup
res=requests.get("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
soup=BeautifulSoup(res.text,"html.parser")
table=soup.find("table")

Now find the table headers and collect their text in a list. Only the first three entries are needed as column names:

all_cols=table.find_all("th")
cols=[i.get_text(strip=True) for i in all_cols]
first_three=cols[:3]

Next, take the text of each body row, split it on commas, and append the resulting list to a list of rows. This list of lists can be passed to a pandas DataFrame, which gives the data its table structure:

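import pandas as pd

lst=[]
for i in table.find("tbody").find_all("tr"):
    # join the cell texts of the row with "," and split them back into a list
    lst.append(i.get_text(strip=True,separator=",").split(","))

# drop the last row before building the DataFrame
filtered=lst[:-1]
df=pd.DataFrame(filtered,columns=first_three)
df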

Output:

Residential property prices Sep Qtr 21 to Dec Qtr 21    Dec Qtr 20 to Dec Qtr 21
0   Weighted average of eight capital cities    4.7 23.7
1   Sydney  4.1 26.7
....
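
One caveat with splitting on commas: a cell such as 9,901.6 (it shows up in the second table in the question) would be split in two. Below is a minimal sketch of reading each cell separately instead, assuming the rows are built from th/td cells. The sample_html string is a made-up stand-in for the page, not the real ABS markup, and only a couple of values are copied from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
sample_html = """
<table>
  <thead>
    <tr><th>Total value of the dwelling stock in Australia</th><th>Dec Qtr 21</th></tr>
  </thead>
  <tbody>
    <tr><td>Value of dwelling stock(a) ($b)</td><td>9,901.6</td></tr>
    <tr><td>Number of residential dwellings ('000)</td><td>10,761.9</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
table = soup.find("table")

# Header texts, one per <th> in the header row
header = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]

# One list per body row and one item per cell, so commas inside a cell survive
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find("tbody").find_all("tr")
]

print(header)
print(rows)
```

The header and rows lists can then go into pd.DataFrame(rows, columns=header) in the same way as above.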

Method 2

The simplest way to get tables from a URL is to use the pandas module: read_html returns every table it finds on the page as a list of DataFrames.

import pandas as pd
lst=pd.read_html("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
df=lst[0]

Output:

Residential property prices Sep Qtr 21 to Dec Qtr 21    Dec Qtr 20 to Dec Qtr 21
Residential property prices % change                    % change
0   Weighted average of eight capital cities    4.7     23.7
1   Sydney  4.1 26.7
2   Melbourne   3.9 20.0
.....
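
The output has two header rows, so the columns appear to come back as a two-level MultiIndex. Below is a minimal sketch of flattening them to a single level, assuming you only need the top row. The small DataFrame is a hand-built stand-in for lst[0], using a few values copied from the output above.

```python
import pandas as pd

# Hand-built stand-in for lst[0]: two header levels, as in the output above
columns = pd.MultiIndex.from_tuples([
    ("Residential property prices", "Residential property prices"),
    ("Sep Qtr 21 to Dec Qtr 21", "% change"),
    ("Dec Qtr 20 to Dec Qtr 21", "% change"),
])
df = pd.DataFrame(
    [
        ["Weighted average of eight capital cities", 4.7, 23.7],
        ["Sydney", 4.1, 26.7],
        ["Melbourne", 3.9, 20.0],
    ],
    columns=columns,
)

# Keep only the top header level so each column has a single plain name
df.columns = df.columns.get_level_values(0)

print(df)
```

After this, a column can be selected with a plain string, e.g. df["Sep Qtr 21 to Dec Qtr 21"].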


 