BeautifulSoup can't find tag by text

Question

BeautifulSoup suddenly can't find a tag by its text.

I have an HTML document in which this tag appears:

<span class="date">Telefon: <b>+421 902 808 344</b></span>

BS4 can't find this tag:

telephone = soup.find('span',{'text':re.compile('.*Telefon.*')})
print telephone

>>> None

I've tried several variations, such as

find('span', text='Telefon: ') or find('span', text=re.compile('Telefon: .*'))

but nothing works. I've already tried switching the parser from html.parser to lxml.

What may be wrong?

Answer 1

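BeautifulSoup treats the string Telefon: as a bs4.element.NavigableString inside the span tag, sitting next to the <b> tag. Because the span has more than one child, its .string is None, so a text= filter has nothing to match on the span itself. Passing {'text': ...} as the second argument is a different problem: that dict filters on HTML attributes, and the span has no attribute named text. You can instead search for the string inside each span: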

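import re

import bs4

html = '<span class="date">Telefon: <b>+421 902 808 344</b></span>'
# Name the parser explicitly to avoid bs4's "no parser specified" warning.
soup = bs4.BeautifulSoup(html, 'html.parser')
for span in soup.find_all('span', {'class': 'date'}):
    # Look for a text node inside the span that matches the label.
    if span.find(text=re.compile('Telefon:')):
        # stripped_strings yields each text node with surrounding whitespace trimmed.
        for text in span.stripped_strings:
            print(text)
# Telefon:
# +421 902 808 344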
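
To see why the lookups in the question return None, it may help to inspect the span directly. The following is a minimal sketch using the same HTML snippet; the variable names (children, attr_match) are only illustrative.

```python
import re

import bs4

html = '<span class="date">Telefon: <b>+421 902 808 344</b></span>'
soup = bs4.BeautifulSoup(html, 'html.parser')
span = soup.find('span', {'class': 'date'})

# The span has two children: the NavigableString 'Telefon: ' and the <b> tag.
children = list(span.children)
print(len(children))    # 2

# With more than one child, .string is None, so a text filter
# applied to the span itself has nothing to compare against.
print(span.string)      # None

# A dict as the second argument filters on attributes; the span has
# no attribute called "text", so this search finds nothing.
attr_match = soup.find('span', {'text': re.compile('Telefon')})
print(attr_match)       # None

# get_text() concatenates every descendant string instead.
print(span.get_text())  # Telefon: +421 902 808 344
```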

Or, you could use lxml directly:

import lxml.html as LH

root = LH.fromstring('<span class="date">Telefon: <b>+421 902 808 344</b></span>')

for span in root.xpath('//span[@class="date" and contains(text(), "Telefon:")]'):
    print(span.text_content())
    # Telefon: +421 902 808 344
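
If you would rather stay within BeautifulSoup and get the span in a single call, two other approaches should also work. This is a sketch, not the only way to do it: is_phone_span is a hypothetical helper name, and the string= keyword assumes bs4 4.4.0 or later (older releases call it text=).

```python
import re

import bs4

html = '<span class="date">Telefon: <b>+421 902 808 344</b></span>'
soup = bs4.BeautifulSoup(html, 'html.parser')


def is_phone_span(tag):
    # Hypothetical filter: match on the span's combined text, so nested
    # tags such as <b> do not get in the way.
    return tag.name == 'span' and 'Telefon' in tag.get_text()


# find() accepts a function that is called with each tag in turn.
span = soup.find(is_phone_span)
number = span.b.get_text(strip=True)
print(number)             # +421 902 808 344

# Alternative: match the text node itself, then step up to its parent tag.
label = soup.find(string=re.compile('Telefon'))
print(label.parent.name)  # span
```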

solution1
4 ACCEPTED 2015-05-12 16:46:11
