BeautifulSoup suddenly can't find a tag by its text.
I have an HTML document in which this tag appears:
<span class="date">Telefon: <b>+421 902 808 344</b></span>
BS4 can't find this tag:
telephone = soup.find('span',{'text':re.compile('.*Telefon.*')})
print telephone
>>> None
I've tried several variations, such as
find('span', text='Telefon: ')
or
find('span', text=re.compile('Telefon: .*'))
but nothing works. I have already tried switching the parser from html.parser to lxml.
What could be wrong?
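BeautifulSoup regards the string Telefon: as a bs4.element.NavigableString inside the span tag, not as a property of the span itself. Passing {'text': ...} as the second argument makes BeautifulSoup look for an HTML attribute named text, and find('span', text=...) compares the pattern against the tag's .string, which is None here because the span contains both a text node and a <b> tag. So you could find it by searching inside each span: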
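import bs4
import re

# Name the parser explicitly so the result does not depend on what is installed.
soup = bs4.BeautifulSoup('<span class="date">Telefon: <b>+421 902 808 344</b></span>', 'html.parser')
for span in soup.find_all('span', {'class': "date"}):
    # Match the text node inside the span rather than the span itself.
    if span.find(text=re.compile('Telefon:')):
        for text in span.stripped_strings:
            print(text)
# Telefon:
# +421 902 808 344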
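An alternative worth considering is to search for the text node directly and then step up to its parent tag. The sketch below assumes Beautiful Soup 4.4 or later, where the `string` argument is available (older releases spell it `text`); the variable names are only illustrative.

```python
import re
from bs4 import BeautifulSoup

html = '<span class="date">Telefon: <b>+421 902 808 344</b></span>'
soup = BeautifulSoup(html, 'html.parser')

span = soup.find('span', class_='date')
# The span has two children (a text node and a <b> tag), so .string is None.
# That is why a text filter applied to the span itself never matches.
print(span.string)

# Search for the text node itself, then step up to the enclosing tag.
label = soup.find(string=re.compile('Telefon'))
telephone_span = label.parent
number = telephone_span.b.get_text(strip=True)
print(telephone_span.name)
print(number)
```

This avoids looping over every span, at the cost of assuming the label text appears only where you expect it.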
Or, you could use lxml directly:
import lxml.html as LH

root = LH.fromstring('<span class="date">Telefon: <b>+421 902 808 344</b></span>')
for span in root.xpath('//span[@class="date" and contains(text(), "Telefon:")]'):
    print(span.text_content())
# Telefon: +421 902 808 344
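One caveat with the XPath form: in XPath 1.0, `contains(text(), ...)` only inspects the first text node of the span. If the label might follow another child element, a predicate on each text node is more forgiving. The markup below is a hypothetical variant made up for illustration, not the page from the question.

```python
import lxml.html as LH

# Hypothetical markup: the label is the second text node of the span.
html = '<div><span class="date">Kontakt <br>Telefon: <b>+421 902 808 344</b></span></div>'
root = LH.fromstring(html)

# Checks only the first text node ("Kontakt "), so nothing is found.
first_only = root.xpath('//span[@class="date" and contains(text(), "Telefon:")]')

# Checks every direct text node of the span.
any_node = root.xpath('//span[@class="date" and text()[contains(., "Telefon:")]]')

print(len(first_only), len(any_node))
for span in any_node:
    print(span.text_content())
```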