How do I extract the text
I love Python
from the following HTML?
I <img src="image.png" alt="love"> Python
Getting the string and splitting it won't work: the text is controlled by the user and might itself contain < and > characters.
There are a few different ways to achieve that. One way is to find all img elements and replace each of them with a text node containing the alt value of that img element:
In [1]: from bs4 import BeautifulSoup
In [2]: data = """<div class="commentthread_comment_text">I <img src="image.png" alt="love"> Python</div>"""
In [3]: soup = BeautifulSoup(data, "html.parser")
In [4]: div = soup.find('div', {'class': 'commentthread_comment_text'})
In [5]: for img in div('img'):
   ...:     img.replace_with(img['alt'])
   ...:
In [6]: div.get_text()
Out[6]: 'I love Python'
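Since there are a few ways to do this, here is a rough sketch of another one that uses only the standard library's html.parser instead of BeautifulSoup. This is a different technique from the session above, not a drop-in equivalent. The names AltTextExtractor and text_with_alt are made up for this illustration, and the sketch assumes that an img without an alt attribute should contribute an empty string.

```python
from html.parser import HTMLParser


class AltTextExtractor(HTMLParser):
    """Collect text content, substituting each <img> with its alt text."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; before handle_data
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # attrs is a list of (name, value) pairs; alt may be missing or None
            alt = dict(attrs).get("alt")
            self.parts.append(alt or "")  # assumption: no alt means empty text

    def handle_data(self, data):
        self.parts.append(data)


def text_with_alt(html):
    """Return the text of an HTML fragment with images replaced by their alt."""
    parser = AltTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(text_with_alt('I <img src="image.png" alt="love"> Python'))
```

Unlike img['alt'] in the BeautifulSoup version, which raises KeyError when the attribute is absent, this sketch silently skips images that have no alt. Whether that is the behaviour you want depends on your data.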