How do I delete an HTML tag that contains only whitespace using BeautifulSoup and Python?
I have been trying to scrape some HTML and extract certain text from it.
The HTML has tags that are empty or that contain only whitespace.
How can I get rid of all those tags from my tree? I am using Beautiful Soup and Python.
You can use the decompose() method to do this. It removes a tag from the tree and destroys it along with its contents.
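from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")  # name the parser explicitly
a_tag = soup.a
soup.i.decompose()  # remove the <i> tag and everything inside it
print(a_tag)
# <a href="http://example.com/">I linked to </a>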
You will still need to loop over the tags, find the ones whose content is empty or only whitespace, and call decompose() on each one to delete it from your tree.
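That loop could look something like the sketch below. The helper name remove_blank_tags and the VOID_TAGS list are my own assumptions, not part of Beautiful Soup; adjust the list to whatever content-less elements (images, line breaks, and so on) you want to keep. It walks the tags in reverse document order so that children are checked before their parents, which lets a parent that only wrapped blank children be removed as well.

```python
from bs4 import BeautifulSoup

# Assumption: elements that legitimately have no text and should be kept.
VOID_TAGS = ["br", "hr", "img", "input"]


def remove_blank_tags(soup):
    """Delete every tag whose text is empty or whitespace only (hypothetical helper)."""
    # find_all(True) returns all tags in document order; reversing the list
    # means nested tags are visited before the tags that contain them.
    for tag in reversed(soup.find_all(True)):
        if tag.name in VOID_TAGS:
            continue  # keep <img>, <br>, etc. even though they have no text
        if tag.find(VOID_TAGS) is not None:
            continue  # keep wrappers that still hold one of those elements
        if not tag.get_text(strip=True):
            tag.decompose()  # nothing but whitespace inside: drop the tag
    return soup


html = "<div><p>Hello</p><p>   </p><span></span><p><b> </b></p><img src='x.png'/></div>"
soup = BeautifulSoup(html, "html.parser")
remove_blank_tags(soup)
print(soup)
```

If you would rather keep a tag and only empty it, clear() can be used in place of decompose().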