BeautifulSoup: removing tags

Question

I'm running this code to scrape zip codes from a website using BS4.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://example.com"

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

# grabs each zip code
zip_code = page_soup.findAll("span",{"itemprop":"postalCode"})

print(zip_code)

I end up with this output, which is just a list of spans. Each span contains the zip code that I need.

[<span itemprop="postalCode">03257</span>, <span itemprop="postalCode">34240</span>, <span itemprop="postalCode">84660</span>, <span itemprop="postalCode">07717</span>]

However, I cannot figure out how to remove everything except the zip code between the span tags. The goal is to end up with a list of zip codes only.

Thank you for the help.

Answer 1

To get only the text from a tag, use tag.text:

zip_codes = page_soup.find_all("span", {"itemprop": "postalCode"})
zip_codes = [tag.text for tag in zip_codes]

print(zip_codes) # ['03257', '34240', '84660', '07717']
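
If you want to try this without hitting the site, here is a minimal sketch that runs against an inline HTML string. The sample_html markup is made up for illustration (it only mimics the spans from the question), and get_text(strip=True) is used in place of tag.text so that stray whitespace around a value is dropped as well.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page; only the
# <span itemprop="postalCode"> elements mirror the question's output.
sample_html = """
<div>
  <span itemprop="postalCode">03257</span>
  <span itemprop="postalCode"> 34240 </span>
  <span itemprop="postalCode">84660</span>
  <span itemprop="postalCode">07717</span>
</div>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# find_all returns Tag objects; get_text(strip=True) extracts the text
# inside each tag and trims surrounding whitespace.
spans = page_soup.find_all("span", {"itemprop": "postalCode"})
zip_codes = [span.get_text(strip=True) for span in spans]

print(zip_codes)  # ['03257', '34240', '84660', '07717']
```

The values stay as strings, which is what you want for zip codes: converting them to int would drop the leading zeros in '03257' and '07717'.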

Solution 1
2 2018-02-16 20:06:52
