Python Beautiful Soup Prettify

I am trying to scrape a table from a webpage to download its data. The code is as follows:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url='https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release'
res=requests.get(url)
soup=BeautifulSoup(res.text,'html.parser')
table=soup.select_one('table:has(caption:-soup-contains("Residential property price indexes"))')

df=pd.read_html(str(table))[0]
header=[th.text for th in table.thead.select("th")]
print(*header,sep='\t')
for row in table.tbody.select("tr"):
    tds=[td.text for td in row.select("td")]
    print(*tds,'\t')

This produces the following result:

RPPI (a)    HPI ADPI    Sep Qtr 21 to Dec Qtr 21    Sep Qtr 21 to Dec Qtr 21    Sep Qtr 21 to Dec Qtr 21    % change    % change    % change
4.7 5.3 3.2     
Sydney 4.1 4.5 3.3  
Melbourne 3.9 4.2 3.1   
Brisbane 9.6 10.8 3.9   
Adelaide 6.8 7.7 3.1    
Perth 2.9 2.9 2.3   
Hobart 6.5 6.7 4.9  
Darwin 1.5 1.3 2.1  
Canberra 6.4 7.0 4.3 

How can I split the header into three rows and prettify the data like this: [image]

I would like to fix the header, but I have no idea how, because all the header cells use the same tag on the website. Thank you.

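Let pandas parse the table. read_html turns the three header rows into a three-level column index, so you only need to blank out the placeholder labels it generates for the first column. Try: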

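from io import StringIO

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release"
res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")

# Pick the table whose caption contains the wanted text
table = soup.select_one(
    'table:has(caption:-soup-contains("Residential property price indexes"))'
)

# Wrap the HTML in StringIO: passing a literal HTML string to read_html
# is deprecated in recent pandas versions
df = pd.read_html(StringIO(str(table)))[0]

# The first column has no header text, so pandas labels it "Unnamed: ..."
# on each of the three header levels; replace those labels with ""
df = df.rename(
    columns={
        "Unnamed: 0_level_0": "",
        "Unnamed: 0_level_1": "",
        "Unnamed: 0_level_2": "",
    }
)
print(df)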

Prints:

                                                            RPPI (a)                      HPI                     ADPI
                                            Sep Qtr 21 to Dec Qtr 21 Sep Qtr 21 to Dec Qtr 21 Sep Qtr 21 to Dec Qtr 21
                                                            % change                 % change                 % change
0  Weighted average of eight capital cities                      4.7                      5.3                      3.2
1                                    Sydney                      4.1                      4.5                      3.3
2                                 Melbourne                      3.9                      4.2                      3.1
3                                  Brisbane                      9.6                     10.8                      3.9
4                                  Adelaide                      6.8                      7.7                      3.1
5                                     Perth                      2.9                      2.9                      2.3
6                                    Hobart                      6.5                      6.7                      4.9
7                                    Darwin                      1.5                      1.3                      2.1
8                                  Canberra                      6.4                      7.0                      4.3
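
If you would rather print the header as three tab-separated rows yourself, the levels of the column index can be read one at a time. The following is a minimal sketch, not part of the original answer: it builds a small stand-in frame by hand (two rows copied from the output above, with the same "Unnamed: ..." placeholder labels that the rename above targets) so that it runs without fetching the page. The names `blank` and `header_rows` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the frame that read_html returns for this table;
# the labels and values are copied from the output shown above.
columns = pd.MultiIndex.from_tuples(
    [
        ("Unnamed: 0_level_0", "Unnamed: 0_level_1", "Unnamed: 0_level_2"),
        ("RPPI (a)", "Sep Qtr 21 to Dec Qtr 21", "% change"),
        ("HPI", "Sep Qtr 21 to Dec Qtr 21", "% change"),
        ("ADPI", "Sep Qtr 21 to Dec Qtr 21", "% change"),
    ]
)
df = pd.DataFrame(
    [
        ["Sydney", 4.1, 4.5, 3.3],
        ["Melbourne", 3.9, 4.2, 3.1],
    ],
    columns=columns,
)

# Without a `level` argument, rename applies the mapping to every level
blank = {f"Unnamed: 0_level_{i}": "" for i in range(df.columns.nlevels)}
df = df.rename(columns=blank)

# One list of labels per header row (one per level of the column index)
header_rows = [
    list(df.columns.get_level_values(i)) for i in range(df.columns.nlevels)
]

# Print the three header rows, then the data rows, tab-separated
for row in header_rows:
    print(*row, sep="\t")
for record in df.itertuples(index=False, name=None):
    print(*record, sep="\t")
```

With the real table, the same loop over `df.columns.nlevels` should work on the frame produced by the code above, assuming the page still serves a three-row header.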
