Using BeautifulSoup's findAll to search an HTML element's innerText and get the same result as searching by attribute?

Question

For example, if I search by an element's attribute, such as its id:

soup.findAll('span',{'id':re.compile("^score_")})

I get back a list of the whole span elements that match (which is what I want).

But if I try to search by the innerText of the HTML element, like this:

soup.findAll('a',text = re.compile("discuss|comment"))

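I only get back the innerText part of the matching elements, rather than the whole element with its tags and attributes as above.

Is it possible to do this without finding the match and then getting its parent?

Thanks.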

Answer 1

You are not getting plain text back. You get a NavigableString containing the text. That object has methods for navigating to its parent, and so on.

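# BeautifulSoup 3 import path, Python 2 syntax
from BeautifulSoup import BeautifulSoup
import re

soup = BeautifulSoup('<html><p>foo</p></html>')

# text= matches strings, so each result is a NavigableString, not a Tag
r = soup.findAll('p', text=re.compile('foo'))

# .parent returns the Tag that contains the matched string
print r[0].parent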

Prints:

<p>foo</p>

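For readers on Beautiful Soup 4, the same idea can be sketched roughly as follows. This is only an illustration under assumptions: it assumes the `bs4` package at version 4.4.0 or later (where the `string` argument is available), and the sample markup and the names `strings`, `links_via_parent`, and `links_direct` are made up for the example. It shows both routes: searching strings and stepping up to `.parent`, and combining a tag name with `string=`, which in Beautiful Soup 4 should return the tags themselves.

```python
import re

from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 (bs4) is installed

# Hypothetical sample markup, invented for illustration only.
html = (
    '<a href="/item?id=1">discuss</a>'
    '<a href="/item?id=2">3 comments</a>'
    '<a href="/news">home</a>'
)
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("discuss|comment")

# Searching by string alone yields NavigableString objects;
# .parent steps back up to the tag that encloses each string.
strings = soup.find_all(string=pattern)
links_via_parent = [s.parent for s in strings]

# Combining a tag name with string= yields the matching tags directly,
# so attributes such as href are available without going through .parent.
links_direct = soup.find_all("a", string=pattern)

for link in links_direct:
    print(link["href"], link.get_text())
```

Both lists should hold the same two `<a>` tags here. The `.parent` route is closest to the answer above, while the direct form avoids the extra step when every match sits inside the tag you name.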

Solution 1
6 Accepted 2010-04-05 19:14:33