BeautifulSoup string search

Question

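I have been googling and reading other questions about how to search for a string in a BeautifulSoup object.

From what I found, the following should detect the string, but it does not: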

strings = soup.find_all(string='Results of Operations and Financial Condition')

However, the following does detect the string:

tags = soup.find_all('div', {'class': 'info'})
for tag in tags:
    if re.search('Results of Operations and Financial Condition', tag.text):
        ''' Do Something'''

Why does one work and the other not?

Answer 1

You probably want to use:

strings = soup.find_all(string=lambda x: 'Results of Operations and Financial Condition' in x)

This happens because find_all looks for an exact match of the string you pass in. I guess you have some other text next to 'Results of Operations and Financial Condition' in the same text node.
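One way to check that guess is to print the raw text nodes inside the matching div. The sketch below uses hypothetical markup (the `html` string and the "Item 2.02" prefix are assumptions, not the asker's real page) and relies on `tag.strings`, which yields every text node under a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page in the question.
html = """
<div class="info">
  Item 2.02 Results of Operations and Financial Condition
</div>
"""
soup = BeautifulSoup(html, "html.parser")
phrase = "Results of Operations and Financial Condition"

nodes = []
for tag in soup.find_all("div", {"class": "info"}):
    for node in tag.strings:
        # repr() makes surrounding whitespace and extra words visible.
        print(repr(node))
        nodes.append(node)

# The node contains the phrase but is not equal to it,
# so an exact-match search on the phrase finds nothing.
print(soup.find_all(string=phrase))
```

If the printed node has leading whitespace, a trailing newline, or extra words, that would explain why the exact match returns an empty list.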

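If you look at the documentation, you can see that the string argument also accepts a function, and it seems the following two lines are equivalent: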

soup.find_all(string='Results of Operations and Financial Condition')
soup.find_all(string=lambda x: x == 'Results of Operations and Financial Condition')
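A minimal sketch of that equivalence, run on a small hypothetical document (the two `<p>` elements are made up for illustration). It compares the plain-string form, the equality lambda, and the substring lambda side by side.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one node is exactly the phrase, the other embeds it.
html = (
    "<p>Results of Operations and Financial Condition</p>"
    "<p>Item 2.02 Results of Operations and Financial Condition.</p>"
)
soup = BeautifulSoup(html, "html.parser")
phrase = "Results of Operations and Financial Condition"

exact = soup.find_all(string=phrase)                         # exact match
by_equality = soup.find_all(string=lambda s: s == phrase)    # same test as a function
by_substring = soup.find_all(string=lambda s: phrase in s)   # containment test

print(exact == by_equality)  # True
print(len(exact))            # 1
print(len(by_substring))     # 2
```

Only the substring version picks up the second paragraph, where the phrase is surrounded by other text.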

Answer 2

For this code:

import re
import urllib.request
import bs4

page = urllib.request.urlopen('https://en.wikipedia.org/wiki/Alloxylon_pinnatum')
sp = bs4.BeautifulSoup(page, 'html.parser')
print(sp.find_all(string=re.compile('The pinkish-red compound flowerheads')))  # a regex searches within text nodes
print(sp.find_all(string='The pinkish-red compound flowerheads, known as'))  # no trailing space
print(sp.find_all(string='The pinkish-red compound flowerheads, known as '))  # note the space at the end of the string

the result is:

['The pinkish-red compound flowerheads, known as ']
[]
['The pinkish-red compound flowerheads, known as ']

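It looks like the string argument matches against the exact, complete value of an HTML text node, not against whether the node contains the string. However, you can use a regular expression to search for text nodes that contain a string, as the code above shows.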
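The regex approach can be tried offline with a sketch like the one below. The markup is hypothetical, and wrapping the phrase in `re.escape` is an added precaution (not part of the original answer) so that any regex metacharacters in the phrase are treated literally. It also illustrates why the `tag.text` loop from the question can succeed where a string search fails: `get_text()` joins all text nodes under a tag, while `string=` tests each text node on its own.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the first div holds the phrase in a single text node,
# the second splits it across a child <b> tag.
html = (
    '<div class="info">See Results of Operations and Financial Condition below</div>'
    '<div class="info">Results of <b>Operations</b> and Financial Condition</div>'
)
soup = BeautifulSoup(html, "html.parser")
phrase = "Results of Operations and Financial Condition"

# re.escape keeps any regex metacharacters in the phrase literal.
pattern = re.compile(re.escape(phrase))

# Matches individual text nodes that contain the phrase.
node_hits = soup.find_all(string=pattern)

# get_text() concatenates every text node under the tag before searching.
tag_hits = [
    tag
    for tag in soup.find_all("div", {"class": "info"})
    if pattern.search(tag.get_text())
]

print(len(node_hits))  # 1
print(len(tag_hits))   # 2
```

The second div is found only by the tag-level search, because no single text node inside it contains the whole phrase.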

2 solutions

Solution 1
2 2020-03-15 17:20:28

Solution 2
1 accepted 2020-03-15 17:16:05
