Remove HTML tag from Website - BeautifulSoup
I am crawling data from a website. This website has code like this:
<span class="demo-span">
<b>Tag b:</b>
<a href="...">Hello</a>
world!
</span>
This is what I tried:
new_data = data.find("span",{"class":"demo-span"})
print(new_data.get_text())
Expected output:
Hello world!
But the actual output is:
Tag b: Hello world!
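You can use decompose() to delete a tag. It removes the tag and everything inside it from the parsed tree, so get_text() no longer sees the text of the <b> element.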
from bs4 import BeautifulSoup

html = '''
<span class="demo-span">
<b>Tag b:</b>
<a href="...">Hello</a>
world!
</span>'''
soup = BeautifulSoup(html, 'html.parser')
new_data = soup.find("span", {"class": "demo-span"})
new_data.b.decompose()
print(new_data.get_text(' ', strip=True))
# Hello world!
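Note that decompose() modifies the parsed tree, so the <b> tag is gone for any later lookups. If you still need it afterwards, one possible alternative is to leave the tree alone and skip the unwanted strings while collecting the text. The following is only a sketch under that assumption; the helper name text_without is made up for this example and is not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = '''
<span class="demo-span">
<b>Tag b:</b>
<a href="...">Hello</a>
world!
</span>'''

soup = BeautifulSoup(html, 'html.parser')
span = soup.find("span", {"class": "demo-span"})


def text_without(tag, skip_name):
    """Hypothetical helper: join the text inside `tag`, skipping any
    string that sits inside a `skip_name` element. The tree is not changed."""
    parts = []
    # find_all(string=True) yields every text node below `tag`
    for string in tag.find_all(string=True):
        # skip text nodes that have a <skip_name> ancestor
        if string.find_parent(skip_name) is not None:
            continue
        stripped = string.strip()
        if stripped:
            parts.append(stripped)
    return ' '.join(parts)


result = text_without(span, "b")
print(result)
# Hello world!
```

After this runs, span.b is still available, which is the main difference from the decompose() version.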