Python/BeautifulSoup - how to remove all tags from an element?

Question

How can I simply remove all tags from an element that I find with BeautifulSoup?

Answer 1

With BeautifulStoneSoup gone in bs4, it's even simpler in Python 3:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # html is the markup string to strip
text = soup.get_text()
print(text)

Answer 2

Why has no answer I've seen mentioned the unwrap method? Or, even easier, the get_text method:

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#unwrap
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text
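To make the unwrap suggestion concrete, here is a minimal sketch. The sample markup and variable names are made up for illustration, and 'html.parser' is just one parser choice. It strips every tag inside an element while keeping the text those tags contained:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = '<p>I linked to <a href="http://example.com/"><i>example.com</i></a> today</p>'
soup = BeautifulSoup(html, 'html.parser')
p = soup.find('p')

# find_all(True) returns every descendant tag as a list, so it is safe
# to modify the tree while looping over it.
for tag in p.find_all(True):
    tag.unwrap()  # drop the tag itself, keep whatever it contained

result = str(p)
print(result)
```

The element itself (the p here) stays in the tree; only the tags nested inside it are removed.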

Answer 3

Use get_text(); it returns all the text in a document or beneath a tag as a single Unicode string.

For instance, remove all the tags from the following markup:

<td><a href="http://www.irit.fr/SC">Signal et Communication</a>
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
</td>

The expected result is:

Signal et Communication
Ingénierie Réseaux et Télécommunications

Here is the source code:

#!/usr/bin/env python3
from bs4 import BeautifulSoup

text = '''
<td><a href="http://www.irit.fr/SC">Signal et Communication</a>
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
</td>
'''
soup = BeautifulSoup(text, 'html.parser')

print(soup.get_text())
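Plain get_text() also returns the newlines that sit between the tags, so the printed output has extra blank lines around the two expected ones. A small sketch of one way to tidy that up, using the separator and strip arguments of get_text (the parser choice is an assumption):

```python
from bs4 import BeautifulSoup

html = '''
<td><a href="http://www.irit.fr/SC">Signal et Communication</a>
<br/><a href="http://www.irit.fr/IRT">Ingénierie Réseaux et Télécommunications</a>
</td>
'''
soup = BeautifulSoup(html, 'html.parser')

# strip=True trims each text fragment and drops whitespace-only ones;
# separator is placed between the fragments that remain.
text = soup.get_text(separator='\n', strip=True)
print(text)
```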

Answer 4

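You can use the decompose method in bs4. Note that it removes a tag together with everything inside it, so the text inside <i> is dropped as well:

import bs4

soup = bs4.BeautifulSoup('<body><a href="http://example.com/">I linked to <i>example.com</i></a></body>', 'lxml')

# Copy the children into a list first, because decompose() modifies the tree.
for child in list(soup.find('a').children):
    if isinstance(child, bs4.element.Tag):
        child.decompose()

print(soup)

Out: <html><body><a href="http://example.com/">I linked to </a></body></html>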
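Because decompose throws away a tag's contents, it fits best where the contents are unwanted too. A common case, sketched here with made-up markup, is dropping script and style blocks before extracting text. Whether get_text() would otherwise include their contents depends on the Beautiful Soup version, so removing them explicitly is the conservative route:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = ('<div><script>var x = 1;</script>'
        '<style>p {color: red}</style>'
        '<p>Visible text</p></div>')
soup = BeautifulSoup(html, 'html.parser')

# find_all returns a list, so decomposing inside the loop is safe.
for tag in soup.find_all(['script', 'style']):
    tag.decompose()  # remove the tag and everything inside it

text = soup.get_text()
print(text)
```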

Answer 5

Code to simply get the contents as text instead of HTML:

The html_text variable is the HTML string you want to extract the text from.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_text, 'lxml')
text = soup.get_text()
print(text)

Answer 6

This looks like the way to do it; it's as simple as that.

This line joins together all the text parts within the current element (use find_all here, since find returns only the first text node):

''.join(htmlelement.find_all(text=True))
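If you join the text parts by hand like this, the whitespace between tags comes along too. As a hedged alternative, the stripped_strings generator yields each text part already trimmed and skips the whitespace-only ones. The markup below is an invented example:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = '<ul>\n  <li> first item </li>\n  <li> second <b>item</b> </li>\n</ul>'
soup = BeautifulSoup(html, 'html.parser')
ul = soup.find('ul')

# Each text node is stripped; nodes that were only whitespace are skipped.
parts = list(ul.stripped_strings)
joined = ' '.join(parts)
print(parts)
print(joined)
```

Choosing the join string yourself (a space here) is the main reason to prefer this over a bare get_text() call.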

Answer 7

Here is the source code; it fetches the page at the given URL and prints its text:

import requests
import bs4

URL = ''
page = requests.get(URL)
text = bs4.BeautifulSoup(page.content, 'html.parser').get_text()
print(text)

7 answers

solution1
120 2015-01-27 02:47:02

solution2
17 2014-04-29 00:40:34

solution3
13 2015-07-20 16:37:08

solution4
8 2013-10-17 22:37:41

solution5
3 2020-05-18 08:53:36

solution6
1 2013-04-25 04:46:12

solution7
0 2020-03-10 15:08:30
