BeautifulSoup and remove entire tag

Question

I'm working with BeautifulSoup. Whenever I encounter an <a href="..."> tag, I want the whole tag removed while keeping its text, but that is not what I currently get.

For example, if I have:

<a href="/psf-landing/">
This is a test message
</a>

Currently, I get:

<a>
This is a test message
</a>

So how can I get just:

This is a test message

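Here is my code:

from bs4 import BeautifulSoup, Comment

soup = BeautifulSoup(content_driver, "html.parser")
# Strip HTML comments from the tree
for element in soup(text=lambda text: isinstance(text, Comment)):
    element.extract()
# This only deletes the href attribute; the <a> tag itself stays
for titles in soup.find_all('a'):
    del titles['href']
tree = soup.prettify()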

Answer 1

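Try the .extract() method. Your code only deletes the href attribute, so the tag itself stays in the tree. Note that .extract() removes the tag together with its contents.

for titles in soup.find_all('a'):
    # .get() returns None instead of raising KeyError when href is missing
    if titles.get('href') is not None:
        titles.extract()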
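
To see what .extract() actually does here, this is a minimal sketch on a made-up HTML string (the markup and variable names are assumptions for illustration, not the asker's data). It suggests that the linked tag disappears along with its text, while an <a> without href is left alone:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one <a> with href, one without
html = '<p>Intro <a href="/psf-landing/">This is a test message</a> <a>no link</a></p>'
soup = BeautifulSoup(html, "html.parser")

removed = []
for tag in soup.find_all("a"):
    # Only touch anchors that carry an href attribute
    if tag.get("href") is not None:
        # extract() detaches the tag (and everything inside it) and returns it
        removed.append(tag.extract())

remaining = str(soup)
print(remaining)
print(removed[0].get_text())
```

Because the text goes away with the tag, .extract() fits when the link should vanish completely, not when only the markup around the text should be stripped.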

Answer 2

More detailed examples are available in the DZone NLP examples.

What you need is:

text = soup.get_text(strip=True)

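Here is a complete example:

from bs4 import BeautifulSoup
import urllib.request

# Download the page and parse it (the html5lib parser must be installed separately)
response = urllib.request.urlopen('http://php.net/')
html = response.read()
soup = BeautifulSoup(html, "html5lib")
# Collect all text in the document, stripping surrounding whitespace
text = soup.get_text(strip=True)
print(text)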
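
For a quick check without a network request, the same call can be tried on the snippet from the question. This is only a sketch: it uses the built-in html.parser, and the second sample string is a hypothetical addition to show the separator argument. Keep in mind that get_text() drops all markup in the document, not just the <a> tags.

```python
from bs4 import BeautifulSoup

# Snippet taken from the question
html = '''
<a href="/psf-landing/">
This is a test message
</a>'''

soup = BeautifulSoup(html, "html.parser")
text = soup.get_text(strip=True)
print(text)
# This is a test message

# Hypothetical second sample: with strip=True alone, text pieces are joined
# without any separator, so passing one keeps words apart.
soup2 = BeautifulSoup("<p>Hello</p><p>World</p>", "html.parser")
joined = soup2.get_text(strip=True)
spaced = soup2.get_text(" ", strip=True)
print(joined)
print(spaced)
```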

Answer 3

You are looking for the unwrap() method. Have a look at the following snippet:

from bs4 import BeautifulSoup

html = '''
<a href="/psf-landing/">
This is a test message
</a>'''

soup = BeautifulSoup(html, 'html.parser')
for el in soup.find_all('a', href=True):
    el.unwrap()

print(soup)
# This is a test message

Using href=True will match only the tags that have href as an attribute.
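
The effect of the href=True filter can be illustrated with a small sketch on made-up markup that mixes both kinds of anchors (the sample string and variable names are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical sample: one anchor with href, one without
html = '<p><a href="/psf-landing/">linked text</a> and <a name="top">anchor text</a></p>'

# With href=True, only the anchor that has an href is unwrapped
soup = BeautifulSoup(html, "html.parser")
for el in soup.find_all("a", href=True):
    el.unwrap()
only_links = str(soup)
print(only_links)

# Without the filter, every <a> tag is unwrapped
soup = BeautifulSoup(html, "html.parser")
for el in soup.find_all("a"):
    el.unwrap()
all_anchors = str(soup)
print(all_anchors)
```

In both cases the text inside the tags is kept in place, which is the difference from extract(), where the text is removed together with the tag.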
