BeautifulSoup String Search
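I have been googling and reading other questions about how to search for a string in a BeautifulSoup object.
Based on what I found, the following should detect the string, but it does not: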
strings = soup.find_all(string='Results of Operations and Financial Condition')
However, the following does detect the string:
tags = soup.find_all('div', {'class': 'info'})
for tag in tags:
    if re.search('Results of Operations and Financial Condition', tag.text):
        ''' Do Something'''
Why does one work and the other not?
You probably want to use:
strings = soup.find_all(string=lambda x: 'Results of Operations and Financial Condition' in x)
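This happens because the implementation of find_all looks for a text node that exactly matches the string you search for. I suspect there is some other text next to 'Results of Operations and Financial Condition' in the same node.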
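If you look at the documentation, you can see that the string parameter also accepts a function, and it seems the following two lines are equivalent: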
soup.find_all(string='Results of Operations and Financial Condition')
soup.find_all(string=lambda x: x == 'Results of Operations and Financial Condition')
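To make the difference concrete, here is a minimal sketch on a made-up HTML fragment (the markup and variable names are assumptions for illustration, not the asker's actual page). It compares the exact-match form, the equality lambda, and the substring lambda:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first div has extra text around the phrase,
# the second div contains the phrase and nothing else.
html = """
<div class="info">Item 2.02 Results of Operations and Financial Condition.</div>
<div class="info">Results of Operations and Financial Condition</div>
"""
soup = BeautifulSoup(html, "html.parser")
phrase = "Results of Operations and Financial Condition"

# A plain string only matches a text node whose whole value equals it.
exact = soup.find_all(string=phrase)

# The equality lambda behaves the same way on this input.
equal_fn = soup.find_all(string=lambda s: s == phrase)

# The substring lambda also picks up the node with surrounding text.
contains = soup.find_all(string=lambda s: phrase in s)

print(len(exact), len(equal_fn), len(contains))
```

On this fragment only the substring lambda finds both nodes, which is consistent with the explanation that the original search failed because of extra text in the node.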
For this code:
import re
import urllib.request
import bs4
page = urllib.request.urlopen('https://en.wikipedia.org/wiki/Alloxylon_pinnatum')
sp = bs4.BeautifulSoup(page, 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
print(sp.find_all(string=re.compile('The pinkish-red compound flowerheads')))  # use a compiled regex to search within text nodes
print(sp.find_all(string='The pinkish-red compound flowerheads, known as'))
print(sp.find_all(string='The pinkish-red compound flowerheads, known as '))  # notice the space at the end of the string
The result is:
['The pinkish-red compound flowerheads, known as ']
[]
['The pinkish-red compound flowerheads, known as ']
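It looks like the string parameter performs an exact, full-string match: it does not test whether an HTML text node contains the string, only whether the text node's entire value equals it. However, you can use a regular expression to find text nodes that contain a given string, as the code above shows.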
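A related point that may also explain why the tag.text loop in the question succeeds: string matching looks at individual text nodes, while tag.text joins all text inside a tag. The following is a small sketch on invented markup (the HTML and variable names are assumptions for illustration) that uses a compiled regex for a "contains" search and then falls back to checking each tag's combined text:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: in the second div the phrase is split across
# several text nodes by a nested <b> tag.
html = (
    '<div class="info">Item 2.02 Results of Operations and Financial Condition</div>'
    '<div class="info">Results of <b>Operations</b> and Financial Condition</div>'
)
soup = BeautifulSoup(html, "html.parser")
phrase = "Results of Operations and Financial Condition"

# re.escape keeps any regex metacharacters in the phrase literal.
pattern = re.compile(re.escape(phrase))

# Regex search over text nodes: only the first div has a single node
# that contains the whole phrase.
nodes = soup.find_all(string=pattern)
parents = [node.parent.name for node in nodes]

# Checking the combined text of each tag finds both divs.
by_text = [
    tag for tag in soup.find_all("div", {"class": "info"})
    if phrase in tag.get_text()
]

print(len(nodes), parents, len(by_text))
```

If you need the enclosing element rather than the text itself, node.parent gives the tag that directly contains the matched text node.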