I'm working with BeautifulSoup. When I encounter an <a href="..."> tag, I want the tag itself to be removed entirely, but that is not what happens.
For example, if I have:
<a href="/psf-landing/">
This is a test message
</a>
Currently, I get:
<a>
This is a test message
</a>
So how can I get just:
This is a test message
Here is my code:
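from bs4 import BeautifulSoup, Comment

soup = BeautifulSoup(content_driver, "html.parser")
# Remove HTML comments
for element in soup(text=lambda text: isinstance(text, Comment)):
    element.extract()
# Remove the href attribute from every <a> tag (the tag itself stays)
for titles in soup.findAll('a'):
    del titles['href']
tree = soup.prettify()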
Try the .extract() method. In your code, you're only deleting an attribute:
for titles in soup.findAll('a'):
    # .get() avoids a KeyError on <a> tags that have no href
    if titles.get('href') is not None:
        titles.extract()
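To make the difference concrete, here is a minimal sketch comparing attribute deletion with .extract(). The sample HTML string and the variable names are assumptions made up for illustration. Note that .extract() detaches the tag together with its contents, so the link text disappears as well.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question's page.
html = '<p>Intro <a href="/psf-landing/">This is a test message</a> outro</p>'

# Deleting the attribute keeps the <a> tag and its text.
soup_del = BeautifulSoup(html, "html.parser")
for link in soup_del.find_all("a"):
    del link["href"]
after_del = str(soup_del)

# extract() removes the tag and everything inside it from the tree.
soup_extract = BeautifulSoup(html, "html.parser")
for link in soup_extract.find_all("a", href=True):
    link.extract()
after_extract = str(soup_extract)

print(after_del)
print(after_extract)
```

If the text inside the link has to survive, .extract() alone is probably not what you want.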
You can find detailed examples here: Dzone NLP examples
What you need is:
text = soup.get_text(strip=True)
Here is a complete example:
from bs4 import BeautifulSoup
import urllib.request

response = urllib.request.urlopen('http://php.net/')
html = response.read()
soup = BeautifulSoup(html, "html5lib")  # requires the html5lib package
text = soup.get_text(strip=True)
print(text)
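For a version that runs without network access, here is a small sketch applying get_text() to the snippet from the question. The second part uses a made-up fragment to show one caveat: with strip=True and the default separator, adjacent text pieces are joined without a space, so passing separator=" " may be preferable.

```python
from bs4 import BeautifulSoup

html = '''
<a href="/psf-landing/">
This is a test message
</a>'''

soup = BeautifulSoup(html, "html.parser")
# get_text() drops all markup; strip=True trims whitespace around each piece.
text = soup.get_text(strip=True)
print(text)

# Hypothetical fragment illustrating the separator caveat.
snippet = BeautifulSoup("<p>Hello <b>world</b></p>", "html.parser")
joined = snippet.get_text(strip=True)
spaced = snippet.get_text(separator=" ", strip=True)
print(joined)
print(spaced)
```

Keep in mind that get_text() returns a plain string for the whole document, so every tag is discarded, not only the links.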
You are looking for the unwrap() method. Have a look at the following snippet:
from bs4 import BeautifulSoup

html = '''
<a href="/psf-landing/">
This is a test message
</a>'''
soup = BeautifulSoup(html, 'html.parser')
# unwrap() removes the tag but keeps its contents in place
for el in soup.find_all('a', href=True):
    el.unwrap()
print(soup)
# This is a test message
Using href=True will match only the tags that have href as an attribute.
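As a quick check of the href=True filter, here is a sketch with two anchors, only one of which carries an href. The sample markup is an assumption for illustration. Only the anchor with href is unwrapped, and the other one is left untouched.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <a> has an href, the second does not.
html = '<p><a href="/psf-landing/">linked text</a> and <a name="top">anchor text</a></p>'
soup = BeautifulSoup(html, "html.parser")

# href=True selects only <a> tags that have an href attribute.
for el in soup.find_all("a", href=True):
    el.unwrap()

result = str(soup)
print(result)
```

To unwrap every <a> tag regardless of its attributes, dropping the href=True argument should be enough.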