Remove an HTML tag from a website - BeautifulSoup

I am crawling data from a website. The page contains markup like this:

<span class="demo-span">
    <b>Tag b:</b> 
    <a href="...">Hello</a> 
     world!
</span>

This is what I tried:

new_data = data.find("span",{"class":"demo-span"})
print(new_data.get_text())

Expected output:

Hello world!

But the actual output is:

Tag b: Hello world!

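get_text() returns the text of every descendant of the span, including the <b> tag. You can use decompose() to remove that tag from the tree before extracting the text: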

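from bs4 import BeautifulSoup

html = '''
<span class="demo-span">
    <b>Tag b:</b> 
    <a href="...">Hello</a> 
     world!
</span>'''

soup = BeautifulSoup(html, 'html.parser')

new_data = soup.find("span", {"class": "demo-span"})
# Remove the <b> tag and its contents from the tree
new_data.b.decompose()
# Join the remaining text fragments with a space, stripping surrounding whitespace
print(new_data.get_text(' ', strip=True))
# Hello world!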
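
Note that decompose() modifies the parsed tree, so the <b> tag is gone for any later lookups. If you need to keep the tree intact, one possible alternative is to collect the text yourself and skip the unwanted tags. The sketch below is only an illustration: text_without is a hypothetical helper name (not part of BeautifulSoup), and it only looks at the direct children of the span, which is enough for the markup in the question.

```python
from bs4 import BeautifulSoup, Tag

html = '''
<span class="demo-span">
    <b>Tag b:</b>
    <a href="...">Hello</a>
     world!
</span>'''


def text_without(tag, skip_names):
    """Hypothetical helper: join the text of tag's direct children,
    skipping child tags whose name is in skip_names."""
    parts = []
    for child in tag.children:
        if isinstance(child, Tag):
            if child.name in skip_names:
                continue  # leave this tag out, but do not delete it
            text = child.get_text(" ", strip=True)
        else:
            # Plain text node between tags
            text = str(child).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", {"class": "demo-span"})

print(text_without(span, {"b"}))
# Hello world!
```

After this runs, span.b is still present in the tree, so the same soup can be reused for other queries.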
