I need to get the text directly inside an element, ignoring the text of its child elements. I have used the following code:
from bs4 import BeautifulSoup
text = """<a aria-expanded="false" aria-owns="faqGen5" href="#">aaa <span class="nobreak">bbb</span> ccc?</a>"""
obj = BeautifulSoup(text, 'html.parser')
obj.find(text=True)
Expected output
aaa ccc?
Current output
aaa
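The reason only "aaa" comes back is that find() returns just the first matching string in document order, while find_all() returns every string, including the one nested inside <span>. A small sketch illustrating this (it uses the built-in "html.parser" to avoid the lxml dependency, and string=True, which is the newer name for the text=True argument since Beautiful Soup 4.4.0):

```python
from bs4 import BeautifulSoup

html = """<a aria-expanded="false" aria-owns="faqGen5" href="#">aaa <span class="nobreak">bbb</span> ccc?</a>"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first string it meets while walking all descendants
first = soup.find(string=True)
print(repr(str(first)))  # 'aaa '

# find_all() collects every string, so the child's "bbb" is included too
all_strings = [str(s) for s in soup.find_all(string=True)]
print(all_strings)  # ['aaa ', 'bbb', ' ccc?']
```

Neither call gives "aaa ccc?" on its own, which is why the strings that belong directly to <a> have to be picked out separately.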
If you look at the .contents of a tag, you'll see that the text you want consists of NavigableString objects, while the child element is a Tag.
from bs4 import BeautifulSoup, NavigableString
html = """<a aria-expanded="false" aria-owns="faqGen5" href="#">aaa <span class="nobreak">bbb</span> ccc?</a>"""
soup = BeautifulSoup(html, 'lxml')
for content in soup.find('a').contents:
    print(content, type(content))
# aaa <class 'bs4.element.NavigableString'>
# <span class="nobreak">bbb</span> <class 'bs4.element.Tag'>
# ccc? <class 'bs4.element.NavigableString'>
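A different technique, not the one used in this answer, is to let Beautiful Soup do the filtering: find_all(string=True, recursive=False) searches only the tag's direct children and returns only strings, so the "bbb" inside <span> is never visited. A minimal sketch, again assuming "html.parser":

```python
from bs4 import BeautifulSoup

html = """<a aria-expanded="false" aria-owns="faqGen5" href="#">aaa <span class="nobreak">bbb</span> ccc?</a>"""
soup = BeautifulSoup(html, "html.parser")
a_tag = soup.find("a")

# recursive=False restricts the search to a_tag's direct children;
# string=True keeps only the text nodes among them
direct = a_tag.find_all(string=True, recursive=False)

# strip each piece and rejoin with one space so "aaa " + " ccc?" doesn't double the gap
text = " ".join(s.strip() for s in direct)
print(text)  # aaa ccc?
```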
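Now you simply need to keep the children that are NavigableString instances and join them. Strip each one and join with a single space; joining the raw strings with '' would give "aaa  ccc?" with two spaces, because the pieces are "aaa " and " ccc?".
text = ' '.join(x.strip() for x in soup.find('a').contents if isinstance(x, NavigableString))
print(text)
# aaa ccc?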
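One caveat with the isinstance(x, NavigableString) check: Comment is a subclass of NavigableString, so an HTML comment inside the tag would end up in the result. A sketch of a reusable helper that skips comments and whitespace-only strings (own_text is a hypothetical name, not part of Beautiful Soup):

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def own_text(tag, sep=" "):
    """Hypothetical helper: join the text nodes that are direct children of tag."""
    parts = []
    for child in tag.contents:
        # Comment subclasses NavigableString, so exclude it explicitly
        if isinstance(child, NavigableString) and not isinstance(child, Comment):
            stripped = child.strip()
            if stripped:  # skip whitespace-only strings between tags
                parts.append(stripped)
    return sep.join(parts)


html = """<a href="#">aaa <!-- hidden --><span class="nobreak">bbb</span> ccc?</a>"""
soup = BeautifulSoup(html, "html.parser")
print(own_text(soup.find("a")))  # aaa ccc?
```

Without the Comment check, the same input would return "aaa hidden ccc?".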