BeautifulSoup cannot find a tag by its text

Question

BeautifulSoup suddenly cannot find a tag by its text.

I have an HTML document that contains this tag:

<span class="date">Telefon: <b>+421 902 808 344</b></span>

BS4 cannot find this tag:

telephone = soup.find('span',{'text':re.compile('.*Telefon.*')})
print telephone

>>> None

I have tried many variations, such as

find('span', text='Telefon: ') or find('span', text=re.compile('Telefon: .*'))

but nothing works. I have already tried switching the parser from html.parser to lxml.

What could be wrong?

Answer 1

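BeautifulSoup treats the string Telefon: as a bs4.element.NavigableString inside the span tag. Because the span also contains a <b> tag, it has more than one child, so its .string is None and a text= filter applied to the span itself never matches. (In the first attempt, the dictionary {'text': ...} is additionally interpreted as a filter on an HTML attribute named text, which the span does not have.) So you can find it like this: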

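import re

import bs4

# Name the parser explicitly so bs4 does not warn or choose a different one per machine.
soup = bs4.BeautifulSoup('<span class="date">Telefon: <b>+421 902 808 344</b></span>', 'html.parser')
for span in soup.find_all('span', {'class': 'date'}):
    # Look for a matching text node inside the span rather than testing the span's own .string.
    if span.find(text=re.compile('Telefon:')):
        for text in span.stripped_strings:
            print(text)
# Telefon:
# +421 902 808 344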
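
To make the reason for the failure visible, here is a minimal sketch (not part of the original answer) that inspects the span from the question. It assumes bs4 4.4.0 or later, where the filter argument is spelled string= (older releases only accept text=). The variable names are illustrative only.

```python
import re

from bs4 import BeautifulSoup

html = '<span class="date">Telefon: <b>+421 902 808 344</b></span>'
soup = BeautifulSoup(html, 'html.parser')
span = soup.find('span', class_='date')

# The span has two children (a text node and a <b> tag), so .string is None.
# This is why find('span', text=...) returns None for this markup.
span_string = span.string

# Search for the text node itself, then step up to the tag that contains it.
text_node = soup.find(string=re.compile('Telefon'))
parent_tag = text_node.parent

# get_text() concatenates every nested string, including the one inside <b>.
full_text = span.get_text()

print(span_string)
print(parent_tag.name)
print(full_text)
```

Going from the matched text node to its .parent is a handy alternative when you do not know the class of the enclosing tag in advance.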

Alternatively, you can use lxml directly:

import lxml.html as LH

root = LH.fromstring('<span class="date">Telefon: <b>+421 902 808 344</b></span>')

for span in root.xpath('//span[@class="date" and contains(text(), "Telefon:")]'):
    print(span.text_content())
    # Telefon: +421 902 808 344
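
If only the phone number is needed, another option (again a sketch, not something the original answer proposes) is to pass a function to find() and match on the tag's full text. The helper name is_telefon_span is hypothetical, and the code assumes the number always sits in a <b> child as in the question's markup.

```python
from bs4 import BeautifulSoup

html = '<span class="date">Telefon: <b>+421 902 808 344</b></span>'
soup = BeautifulSoup(html, 'html.parser')


def is_telefon_span(tag):
    """Hypothetical helper: true for a <span> whose combined text mentions 'Telefon:'."""
    return tag.name == 'span' and 'Telefon:' in tag.get_text()


# find() accepts a callable and returns the first tag for which it returns True.
span = soup.find(is_telefon_span)

# Assumption: the number is wrapped in the span's first <b> child.
number = span.b.get_text(strip=True)
print(number)
```

Matching on get_text() sidesteps the .string limitation, at the cost of also matching any outer span whose nested content happens to contain the same text.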

Answer 1 (4, accepted, 2015-05-12 16:46:11)
