I have a soup in Python like this:
<p>
<span style="text-decoration: underline; color: #3366ff;">
Title:
</span>
Info
</p>
<p>
<span style="color: #3366ff;">
<span style="text-decoration: underline;">
Title2:
</span>
</span>
Info2
</p>
I'd like to get it to look like this:
<p>
Title:
Info
</p>
<p>
Title2:
Info2
</p>
Is there a way to do this with bs4?
You'll want to use BeautifulSoup's unwrap() for this. It removes a tag but leaves its contents in place:
import bs4

# htm1 holds the HTML string from the question
soup1 = bs4.BeautifulSoup(htm1, 'html.parser')
for match in soup1.find_all('span'):
    match.unwrap()
print(soup1)
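Here is a self-contained sketch of the same idea, run against the markup from the question. The names `html_doc` and `strip_spans` are made up for this example, and `html.parser` is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Markup from the question, condensed onto two lines (hypothetical variable name).
html_doc = (
    '<p><span style="text-decoration: underline; color: #3366ff;">Title:</span> Info</p>'
    '<p><span style="color: #3366ff;"><span style="text-decoration: underline;">'
    'Title2:</span></span> Info2</p>'
)


def strip_spans(markup):
    """Parse markup and unwrap every <span>, keeping the text inside it."""
    soup = BeautifulSoup(markup, "html.parser")
    # find_all returns a list up front, so unwrapping while looping is safe,
    # including for the nested spans in the second paragraph.
    for span in soup.find_all("span"):
        span.unwrap()
    return soup


result = strip_spans(html_doc)
print(result)
```

The nested case works because unwrapping the outer span moves the inner span up into the paragraph, where it is unwrapped in turn.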
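You can also use replace_with to remove span tags. Note that replacing a span with an empty string also throws away the text inside it, so to keep the text, replace each span with its own text:
from bs4 import BeautifulSoup

# html holds the HTML string from the question
soup = BeautifulSoup(html, 'html.parser')
for span_tag in soup.find_all('span'):
    # replace_with('') would delete "Title:" along with the tag
    span_tag.replace_with(span_tag.get_text())
print(soup)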
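One thing to watch with replace_with is what you replace the span with. The sketch below compares an empty string with the span's own text on one paragraph from the question; the variable names `snippet`, `dropped` and `kept` are made up for this example.

```python
from bs4 import BeautifulSoup

snippet = '<p><span style="color: #3366ff;">Title:</span> Info</p>'

# Replacing the span with an empty string discards the span's text too.
dropped = BeautifulSoup(snippet, "html.parser")
for span in dropped.find_all("span"):
    span.replace_with("")

# Replacing the span with its own text removes only the tag.
kept = BeautifulSoup(snippet, "html.parser")
for span in kept.find_all("span"):
    span.replace_with(span.get_text())

print(repr(dropped.get_text()))
print(repr(kept.get_text()))
```

Only the second variant keeps "Title:", which is what the question asks for.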
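I wrote this function, in case it helps. It works on a plain string rather than a soup, and removes the first tag it finds, twice (typically an opening tag and then its closing tag):
def deleteBalise(string):
    # "balise" is French for "tag"
    for i in range(2):
        # find the index of the first '<'
        rankBegin = 0
        for carac in string:
            if carac == '<':
                break
            rankBegin += 1
        # find the index of the first '>'
        rankEnd = 0
        for carac in string:
            if carac == '>':
                break
            rankEnd += 1
        # cut out the tag text; replace() removes every occurrence of it
        stringToReplace = string[rankBegin:rankEnd+1]
        string = string.replace(stringToReplace, '')
    return string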
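If you want the same string-level removal for every span tag in one pass, a regular expression is one option. This is a different technique from the function above, and only a rough sketch: the name `strip_span_tags` is made up, and the pattern assumes no attribute value contains a `>` character. For real HTML, a parser is the safer choice.

```python
import re

# Matches an opening <span ...> tag or a closing </span> tag.
SPAN_TAG = re.compile(r"</?span\b[^>]*>")


def strip_span_tags(text):
    """Remove every span tag from text, leaving the enclosed text untouched."""
    return SPAN_TAG.sub("", text)


sample = (
    '<p><span style="color: #3366ff;"><span style="text-decoration: underline;">'
    'Title2:</span></span> Info2</p>'
)
cleaned = strip_span_tags(sample)
print(cleaned)
```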