
Python BeautifulSoup prettify

I am trying to scrape a webpage to download the data. The code is as follows:

from bs4 import BeautifulSoup
import requests

link='https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release'
url=requests.get(link).text
soup=BeautifulSoup(url,'lxml')
# Select the content <div>s by their exact class attribute string
table=soup.find_all('div',class_='abs-content clearfix text-formatted field field--name-field-abs-text-paragraph-content field--type-text-long field--label-hidden')
for block in table:
    region=block.find_all('tr')
    for header in region:
        # .text concatenates the text of every cell with no separator
        print(header.text)

This produces the following output:

Residential property pricesSep Qtr 21 to Dec Qtr 21Dec Qtr 20 to Dec Qtr 21
% change% change
Weighted average of eight capital cities4.723.7
Sydney4.126.7
Melbourne3.920.0
Brisbane9.627.8
Adelaide6.823.9
Perth2.915.7
Hobart6.529.8
Darwin1.513.0
Canberra6.428.8
   
Total value of the dwelling stock in AustraliaDec Qtr 21
Value of dwelling stock(a) ($b)9,901.6
Mean price of residential dwellings ($'000)920.1
Number of residential dwellings ('000)10,761.9
 RPPI (a)HPIADPI
Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21Sep Qtr 21 to Dec Qtr 21
% change% change% change
Weighted average of eight capital cities4.75.33.2
Sydney4.14.53.3
Melbourne3.94.23.1
Brisbane9.610.83.9
Adelaide6.87.73.1
Perth2.92.92.3
Hobart6.56.74.9
Darwin1.51.32.1
Canberra6.47.04.3

How can I separate the values into different columns and present the data like this? Thank you. [image: desired table layout, not available]
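The values run together because Tag.text returns the text of all the cells joined with no separator. A minimal sketch of the difference, using a one-row stand-in table (the HTML is hypothetical, modelled on the printed output, not copied from the ABS page):

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table modelled on the "Sydney" line of the output
html = "<table><tr><td>Sydney</td><td>4.1</td><td>26.7</td></tr></table>"
row = BeautifulSoup(html, "html.parser").find("tr")

# .text joins every cell's text with no separator, so the values run together
print(row.text)  # Sydney4.126.7

# Reading each cell separately keeps the values apart
cells = [td.get_text(strip=True) for td in row.find_all("td")]
print(cells)  # ['Sydney', '4.1', '26.7']
```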

Method 1

Using the bs4 module, first find the table:

import requests
from bs4 import BeautifulSoup

res=requests.get("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
soup=BeautifulSoup(res.text,"html.parser")
# find() returns only the first <table> on the page
table=soup.find("table")
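The question's output suggests the page holds several tables, and find() stops at the first one. A small sketch contrasting find() with find_all(), on hypothetical two-table HTML (the real page's markup may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical page with two tables; titles taken from the question's output
html = """
<table><tr><th>Residential property prices</th></tr></table>
<table><tr><th>Total value of the dwelling stock in Australia</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

first = soup.find("table")       # first <table> only, or None if there is none
tables = soup.find_all("table")  # every <table>, in document order

print(len(tables))                        # 2
print(first.th.get_text(strip=True))      # Residential property prices
print(tables[1].th.get_text(strip=True))  # Total value of the dwelling stock in Australia
```

To process the second or third table, index into the find_all() result instead of calling find().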

Next, find the table headers and collect their text into a list:

all_cols=table.find_all("th")
cols=[i.get_text(strip=True) for i in all_cols]
# The header appears to span two rows; the first three <th> cells are the top-row titles
first_three=cols[:3]
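Slicing with [:3] works because find_all("th") returns the header cells of both header rows in document order. A sketch, assuming a two-row header shaped like the printed output (hypothetical markup), that reads only the first header row instead of hard-coding the count:

```python
from bs4 import BeautifulSoup

# Hypothetical header markup: a title row plus a "% change" row
html = """
<table>
  <thead>
    <tr><th>Residential property prices</th><th>Sep Qtr 21 to Dec Qtr 21</th><th>Dec Qtr 20 to Dec Qtr 21</th></tr>
    <tr><th></th><th>% change</th><th>% change</th></tr>
  </thead>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")

# All <th> cells from both header rows, flattened into one list
all_cols = [th.get_text(strip=True) for th in table.find_all("th")]
print(all_cols)

# Only the cells of the first <tr>: no fixed slice needed
first_row = table.find("tr")
header = [th.get_text(strip=True) for th in first_row.find_all("th")]
print(header)
```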

Now loop over the rows in the table body. get_text(strip=True, separator=",") joins each row's cell texts with commas, and splitting on "," turns each row into a list of values. The resulting list of lists can be passed to a pandas DataFrame to give the data a table structure:

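import pandas as pd

lst=[]
for i in table.find("tbody").find_all("tr"):
    # Join the cell texts with "," and split them back into one list per row
    lst.append(i.get_text(strip=True,separator=",").split(","))

# Drop the last row, which appears to hold no city data on this page
filtered=lst[:-1]
df=pd.DataFrame(filtered,columns=first_three)
print(df)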
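Splitting on "," is safe for the first table, but the dwelling-stock table on the same page contains values such as 9,901.6, which a comma split would break in two. A sketch of an alternative that reads each cell separately and skips blank rows instead of slicing off the last one (hypothetical markup modelled on the question's output):

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup modelled on the "Total value of the dwelling stock" table
html = """
<table>
  <tr><th>Total value of the dwelling stock in Australia</th><th>Dec Qtr 21</th></tr>
  <tr><td>Value of dwelling stock(a) ($b)</td><td>9,901.6</td></tr>
  <tr><td>Mean price of residential dwellings ($'000)</td><td>920.1</td></tr>
  <tr><td> </td><td> </td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")

# The comma-split approach breaks "9,901.6" into two values
row = table.find_all("tr")[1]
print(row.get_text(strip=True, separator=",").split(","))
# ['Value of dwelling stock(a) ($b)', '9', '901.6']

rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all(["th", "td"])]
    if any(cells):  # drop rows whose cells are all blank
        rows.append(cells)

header = [th.get_text(strip=True) for th in table.find("tr").find_all("th")]
df = pd.DataFrame(rows, columns=header)
print(df)
```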

Output:

Residential property prices Sep Qtr 21 to Dec Qtr 21    Dec Qtr 20 to Dec Qtr 21
0   Weighted average of eight capital cities    4.7 23.7
1   Sydney  4.1 26.7
....
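Because Method 1 builds the DataFrame from text, every cell is a string. A sketch of converting the numeric columns with pd.to_numeric, using the two rows shown above (errors="coerce" is one choice; it turns unparseable text into NaN rather than raising):

```python
import pandas as pd

# Rows as Method 1 produces them: every cell is a string
cols = ["Residential property prices", "Sep Qtr 21 to Dec Qtr 21", "Dec Qtr 20 to Dec Qtr 21"]
rows = [["Weighted average of eight capital cities", "4.7", "23.7"],
        ["Sydney", "4.1", "26.7"]]
df = pd.DataFrame(rows, columns=cols)

# Convert every column except the label column to floats;
# errors="coerce" turns anything unparseable into NaN instead of raising
num_cols = cols[1:]
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```

Values with a thousands separator (such as 9,901.6) would need the commas removed first, for example with .str.replace(",", ""), or they become NaN.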

Method 2

The simplest way to get tables from a URL is the pandas module: pandas.read_html returns a list with one DataFrame per <table> on the page (it needs lxml, or BeautifulSoup4 with html5lib, installed as the HTML parser):

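import pandas as pd

# read_html parses every <table> on the page into a list of DataFrames
lst=pd.read_html("https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release")
df=lst[0]  # the first table
print(df)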

Output:

Residential property prices Sep Qtr 21 to Dec Qtr 21    Dec Qtr 20 to Dec Qtr 21
Residential property prices % change                    % change
0   Weighted average of eight capital cities    4.7     23.7
1   Sydney  4.1 26.7
2   Melbourne   3.9 20.0
.....
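The output above has two header levels per column (the period and "% change"), i.e. MultiIndex columns. A sketch of flattening them into single labels, using a frame built by hand to mimic that shape rather than fetched from the page:

```python
import pandas as pd

# A frame shaped like the read_html result above: two header levels per column
columns = pd.MultiIndex.from_tuples([
    ("Residential property prices", "Residential property prices"),
    ("Sep Qtr 21 to Dec Qtr 21", "% change"),
    ("Dec Qtr 20 to Dec Qtr 21", "% change"),
])
df = pd.DataFrame([["Weighted average of eight capital cities", 4.7, 23.7],
                   ["Sydney", 4.1, 26.7],
                   ["Melbourne", 3.9, 20.0]], columns=columns)

# Join the two levels into one label, keeping just the top one when they repeat
df.columns = [top if top == bottom else f"{top} ({bottom})" for top, bottom in df.columns]
print(df.columns.tolist())
```

If the second level is not needed at all, df.columns.droplevel(1) is a shorter alternative.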

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.