How to remove HTML tags that contain only whitespace using BeautifulSoup and Python

Question

I have been trying to scrape some HTML and extract certain text from it.

The HTML has tags that are empty or that contain only whitespace.

How can I remove all of those tags from my tree? I am using Beautiful Soup and Python.

Answer 1

You can use the decompose() method to do this. It removes a tag from the tree and then destroys the tag and its contents.

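from bs4 import BeautifulSoup

markup = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
soup = BeautifulSoup(markup, "html.parser")  # name the parser explicitly
a_tag = soup.a

# decompose() removes the <i> tag from the tree and destroys it
soup.i.decompose()

a_tag
# <a href="http://example.com/">I linked to </a>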

You will still need to loop over the tags yourself, find the ones whose content is empty or whitespace-only, and call decompose() on each of them to delete it from your tree.
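
The loop described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper name remove_empty_tags, the sample markup, and the list of void tags to keep (such as img and br, which never hold text) are all assumptions made for illustration. It walks the tags in reverse document order so that children are checked before their parents, which means a parent that only wrapped empty tags is removed as well.

```python
from bs4 import BeautifulSoup

# Assumption: tags that are legitimately empty and should be kept.
VOID_TAGS = ["br", "hr", "img", "input", "meta", "link"]


def remove_empty_tags(soup):
    """Decompose every tag that holds no text and no void tag."""
    # find_all(True) returns every tag in document order; reversing it
    # visits descendants before their ancestors.
    for tag in reversed(soup.find_all(True)):
        if tag.name in VOID_TAGS:
            continue
        has_text = bool(tag.get_text(strip=True))
        has_void_child = tag.find(VOID_TAGS) is not None
        if not has_text and not has_void_child:
            tag.decompose()
    return soup


# Hypothetical sample markup for demonstration.
markup = (
    '<div><p>Keep me</p><p>   </p><span></span>'
    '<div><b> </b></div><img src="x.png"/></div>'
)
cleaned = remove_empty_tags(BeautifulSoup(markup, "html.parser"))
print(cleaned)
```

In this example the whitespace-only p, the empty span, and the inner div that wraps only a blank b should all disappear, while the p with text and the img remain. If you also want to drop empty void tags, adjust VOID_TAGS accordingly.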

Solution 1
0 2018-03-02 22:53:32
