Python/BeautifulSoup - how do I remove all tags from an element?

Question

How can I simply strip all tags from an element I have found with BeautifulSoup?

Answer 1

With BeautifulStoneSoup gone in bs4, this is even simpler in Python 3:

from bs4 import BeautifulSoup

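# html is the markup string to strip tags from; naming the parser avoids a bs4 warning
soup = BeautifulSoup(html, 'html.parser')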
text = soup.get_text()
print(text)

Answer 2

Why do none of the answers mention the unwrap method? Or, even simpler, the get_text method?

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#unwrap
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
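To show how the two methods differ, here is a minimal sketch. The sample markup and variable names are hypothetical, and 'html.parser' is chosen only so that the example needs no extra parser installed. unwrap() removes a tag but keeps its contents in the tree, while get_text() returns only the text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = '<p>I linked to <a href="http://example.com/"><i>example.com</i></a> today.</p>'
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')

# unwrap() removes a tag but leaves its contents in place.
# find_all(True) returns a list of every nested tag, so it is safe
# to modify the tree while looping over it.
for tag in p.find_all(True):
    tag.unwrap()

unwrapped = str(p)        # the <p> itself is kept, its inner tags are gone
plain_text = p.get_text() # text only, no markup at all
print(unwrapped)
print(plain_text)
```

Use unwrap() when you still want a tree to work with afterwards, and get_text() when a plain string is all you need.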

Answer 3

Use get_text(). It returns all the text in a document, or beneath a tag, as a single Unicode string.

For example, to strip all of the tags from the following markup:

<td><a href="http://www.irit.fr/SC">Signal et Communication</a>
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
</td>

The expected result is:

Signal et Communication
Ingénierie Réseaux et Télécommunications

Here is the source code:

#!/usr/bin/env python3
from bs4 import BeautifulSoup

text = '''
<td><a href="http://www.irit.fr/SC">Signal et Communication</a>
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
</td>
'''
soup = BeautifulSoup(text, 'html.parser')

print(soup.get_text())
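A plain get_text() call also returns the newlines that sit between the tags. As a hedged variation, the sketch below calls get_text() on the found element with its separator and strip arguments, which should yield exactly the two lines shown as the expected result. The variable names cell and lines are illustrative only.

```python
from bs4 import BeautifulSoup

html = '''
<td><a href="http://www.irit.fr/SC">Signal et Communication</a>
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
</td>
'''
soup = BeautifulSoup(html, 'html.parser')

# Work on the element itself rather than on the whole document.
cell = soup.find('td')

# strip=True trims each text fragment and drops whitespace-only ones;
# separator is placed between the fragments that remain.
lines = cell.get_text(separator='\n', strip=True)
print(lines)
```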

Answer 4

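You can use the decompose method in bs4. Note that it removes the child tags together with their contents:

import bs4

soup = bs4.BeautifulSoup('<body><a href="http://example.com/">I linked to <i>example.com</i></a></body>', 'html.parser')

# Copy the children into a list first, because decompose() modifies the tree
for child in list(soup.find('a').children):
    if isinstance(child, bs4.element.Tag):
        child.decompose()

print(soup)

Out: <body><a href="http://example.com/">I linked to </a></body>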
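Because decompose() discards a tag's contents as well, it suits content you never want in the output, such as scripts and stylesheets. The sketch below is one possible use, not part of the original answer: it removes those tags before extracting text. The sample markup is hypothetical. Some bs4 versions would otherwise include script and style contents in get_text(), so removing them explicitly keeps the result predictable.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with content that should not appear as text.
html = ('<div><script>var x = 1;</script>'
        '<style>p { color: red; }</style>'
        '<p>Visible <b>text</b></p></div>')
soup = BeautifulSoup(html, 'html.parser')

# find_all() accepts a list of tag names; decompose() destroys each
# matching tag along with everything inside it.
for tag in soup.find_all(['script', 'style']):
    tag.decompose()

visible = soup.get_text()
print(visible)
```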

Answer 5

Code to simply get the content as text instead of HTML:

The 'html_text' argument is the string you pass in to extract the text from.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_text, 'lxml')
text = soup.get_text()
print(text)

Answer 6

It looks like this is the way to do it. It is that easy.

This one line joins together all the text pieces inside the current element:

''.join(htmlelement.find_all(string=True))
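The join needs every text node under the element, which find_all returns and find does not (find stops at the first match). A minimal sketch follows, assuming bs4 4.4.0 or later for the string= argument (older code used text=). The sample markup and names are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample element with nested markup.
html = '<a href="http://example.com/">I linked to <i>example.com</i></a>'
element = BeautifulSoup(html, 'html.parser').find('a')

# find_all(string=True) collects every text node beneath the element,
# including those inside nested tags.
pieces = element.find_all(string=True)
joined = ''.join(pieces)

print(pieces)
print(joined)
```

For this input the joined string is the same as what element.get_text() returns.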

Answer 7

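Here is the source code. It gets the text of the page at a given URL:

import requests
import bs4

URL = ''  # the address of the page to fetch
page = requests.get(URL)
text = bs4.BeautifulSoup(page.content, 'html.parser').get_text()
print(text)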

7 answers

Answer 1
120 2015-01-27 02:47:02

Answer 2
17 2014-04-29 00:40:34

Answer 3
13 2015-07-20 16:37:08

Answer 4
8 2013-10-17 22:37:41

Answer 5
3 2020-05-18 08:53:36

Answer 6
1 2013-04-25 04:46:12

Answer 7
0 2020-03-10 15:08:30
