Remove HTML tags from a website - BeautifulSoup

Question

I am crawling data from a website. The website has markup like this:

<span class="demo-span">
    <b>Tag b:</b> 
    <a href="...">Hello</a> 
     world!
</span>

This is what I tried:

# data: the BeautifulSoup object parsed from the crawled page
new_data = data.find("span", {"class": "demo-span"})
print(new_data.get_text())

Expected output:

Hello world!

But the actual output is:

Tag b: Hello world!
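A minimal reproduction of the behaviour, assuming bs4 is installed and using the markup from the question as input: get_text() joins every text node under the tag, including the one inside <b>, so the label cannot be filtered out by get_text() alone.

```python
from bs4 import BeautifulSoup

html = '''
<span class="demo-span">
    <b>Tag b:</b> 
    <a href="...">Hello</a> 
     world!
</span>'''

soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", {"class": "demo-span"})

# get_text() concatenates every descendant text node, <b> included;
# the " " separator and strip=True only tidy up the whitespace.
print(span.get_text(" ", strip=True))  # Tag b: Hello world!

# The individual (whitespace-stripped) text nodes being joined:
print(list(span.stripped_strings))  # ['Tag b:', 'Hello', 'world!']
```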

Answer 1

You can use decompose() to remove the <b> tag, together with its text, from the tree before calling get_text():

from bs4 import BeautifulSoup

html = '''
<span class="demo-span">
    <b>Tag b:</b> 
    <a href="...">Hello</a> 
     world!
</span>'''

soup = BeautifulSoup(html, 'html.parser')

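new_data = soup.find("span", {"class": "demo-span"})
new_data.b.decompose()  # remove the first <b> tag and its text from the tree
# join the remaining text nodes with a space and strip surrounding whitespace
print(new_data.get_text(' ', strip=True))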
# Hello world!
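If you would rather not modify the parsed tree, one alternative (a different technique from decompose(), sketched here under the assumption that only text inside <b> should be dropped) is to collect the text nodes yourself and skip any node that has a <b> ancestor inside the span. The helper name text_without is hypothetical.

```python
from itertools import takewhile

from bs4 import BeautifulSoup

html = '''
<span class="demo-span">
    <b>Tag b:</b> 
    <a href="...">Hello</a> 
     world!
</span>'''


def text_without(tag, skip_names):
    """Hypothetical helper: join the stripped text nodes under `tag`,
    skipping nodes that sit inside a descendant tag named in `skip_names`."""
    parts = []
    for s in tag.find_all(string=True):
        # Only look at ancestors between the text node and `tag` itself.
        inner_parents = takewhile(lambda p: p is not tag, s.parents)
        if any(p.name in skip_names for p in inner_parents):
            continue
        stripped = s.strip()
        if stripped:  # drop whitespace-only nodes
            parts.append(stripped)
    return " ".join(parts)


soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", {"class": "demo-span"})
print(text_without(span, {"b"}))  # Hello world!
```

Unlike decompose(), this leaves span.b in place, so the label is still available afterward. If you do want to remove the tag but keep a reference to it, extract() returns the removed element instead of destroying it.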

Answer 1: 2 votes, accepted, 2018-06-12 08:00:22