
Remove tags from Beautiful Soup extract

I'm new to web scraping and am trying to figure out how to remove unwanted tags.

I want to get the monetary policy announcements and their corresponding dates from the Bank of Canada website. My code is as follows:

from bs4 import BeautifulSoup
import urllib.request
r=urllib.request.urlopen('https://www.bankofcanada.ca/content_type/publications/mpr/?post_type%5B0%5D=post&post_type%5B1%5D=page').read()
soup = BeautifulSoup(r)

soup.prettify()
letters = soup.find_all("div", class_="media-body")
lobbying = {}
for element in letters:
    lobbying[element.a.get_text()] = {}
print(lobbying)

The output is attached as a screenshot.

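Expected output:

April 12, 2017: Canada’s economy is expected to grow by 2 1/2 per cent this year and just below 2 per cent in 2018 and 2019.

April 13, 2016: Canada’s economy is projected to grow by 1.7 per cent in 2016 and return to potential next year as complex adjustments continue.

Thanks in advance.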

You need the media-date and media-excerpt tags inside each media div, with the surrounding whitespace stripped:

from bs4 import BeautifulSoup
import urllib.request

r = urllib.request.urlopen(
    'https://www.bankofcanada.ca/content_type/publications/mpr/?post_type%5B0%5D=post&post_type%5B1%5D=page').read()
soup = BeautifulSoup(r, "lxml")

lobbying = {}

# All media/div elements.
for element in soup.select(".media"):
    # select_one pulls 1 match, pull the text from each tag.
    lobbying[element.select_one(".media-date").text] = element.select_one(".media-excerpt").text.strip()
print(lobbying)

This gives you:

   {
    'April 18, 2018': 'The Bank’s new forecast calls for economic growth of 2.0 percent this year, 2.1 per cent in 2019 and 1.8 per cent in 2020.',
    'January 17, 2018': 'Growth in the Canadian economy is projected to slow from 3 per cent in 2017 to 2.2 per cent this year and 1.6 per cent in 2019.',
    'October 25, 2017': 'Projections for Canadian economic growth have been increased to 3.1 per cent this year and 2.1 per cent in 2018, with growth of 1.5 per cent forecast for 2019.',
    'July 12, 2017': 'Growth in the Canadian economy is projected to reach 2.8 per cent this year before slowing to 2.0 per cent next year and 1.6 per cent in 2019.',
    'April 12, 2017': 'Canada’s economy is expected to grow by 2 1/2 per cent this year and just below 2 per cent in 2018 and 2019.',
    'January 18, 2017': 'The Canadian economy is expected to expand by 2.1 per cent this year and in 2018.',
    'October 19, 2016': 'Growth in the Canadian economy is expected to increase from 1.1 per cent this year to about 2.0 per cent in 2017 and 2018.',
    'July 13, 2016': 'Canadian economic growth is projected to accelerate from 1.3 per cent this year to 2.2 per cent in 2017.',
    'April 13, 2016': 'Canada’s economy is projected to grow by 1.7 per cent in 2016 and return to potential next year as complex adjustments continue.',
    'January 20, 2016': 'Growth in Canada’s economy is expected to reach 1.4 per cent this year and accelerate to 2.4 per cent in 2017.'}
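The keys in the dictionary above are plain strings, so they sort alphabetically rather than chronologically. If real dates are needed, one possible approach is to parse them with the standard library. The sketch below copies three entries from the output above; the helper name parse_date and the variable by_date are illustrative and not part of the original answer, and the format string assumes English month names (the default C/English locale).

```python
from datetime import datetime

# Three entries copied from the output above.
lobbying = {
    'January 18, 2017': 'The Canadian economy is expected to expand by 2.1 per cent this year and in 2018.',
    'October 19, 2016': 'Growth in the Canadian economy is expected to increase from 1.1 per cent this year to about 2.0 per cent in 2017 and 2018.',
    'July 13, 2016': 'Canadian economic growth is projected to accelerate from 1.3 per cent this year to 2.2 per cent in 2017.',
}


def parse_date(text):
    """Turn a string such as 'July 13, 2016' into a datetime.date.

    Assumes English month names, i.e. the 'Month day, year' format
    seen in the scraped output.
    """
    return datetime.strptime(text.strip(), "%B %d, %Y").date()


# Re-key the dictionary by date objects so it can be sorted chronologically.
by_date = {parse_date(key): value for key, value in lobbying.items()}

for day in sorted(by_date):
    print(day.isoformat(), "->", by_date[day])
```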

You can also build the dict with a dict comprehension:

lobbying = {el.select_one(".media-date").text: el.select_one(".media-excerpt").text.strip()
            for el in soup.select(".media")}
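Both versions above raise an AttributeError if some .media element lacks a .media-date or .media-excerpt child, because select_one returns None when nothing matches. A minimal sketch of a more defensive variant follows. It runs against a small inline HTML sample so it can be tried without a network connection. The sample markup, the function name extract_announcements, and the choice of the built-in "html.parser" are assumptions made for illustration; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the answer relies on.
# The live Bank of Canada page may be laid out differently.
SAMPLE_HTML = """
<div class="media">
  <div class="media-body">
    <span class="media-date">April 12, 2017</span>
    <h3><a href="#">Monetary Policy Report</a></h3>
    <div class="media-excerpt">
      Canada's economy is expected to grow.
    </div>
  </div>
</div>
<div class="media">
  <div class="media-body">
    <h3><a href="#">Entry without date or excerpt</a></h3>
  </div>
</div>
"""


def extract_announcements(html):
    """Map each date to its excerpt, skipping entries that lack either tag."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for el in soup.select(".media"):
        date_tag = el.select_one(".media-date")
        excerpt_tag = el.select_one(".media-excerpt")
        # select_one returns None when there is no match, so check first.
        if date_tag is None or excerpt_tag is None:
            continue
        result[date_tag.get_text(strip=True)] = excerpt_tag.get_text(strip=True)
    return result


announcements = extract_announcements(SAMPLE_HTML)
print(announcements)
```

The second .media block in the sample has neither tag, so it is skipped instead of raising an error.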

