Find all tags containing a string in BeautifulSoup

Question

In BeautifulSoup, I can use find_all(string='example') to find every NavigableString that matches a string or regular expression.
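For reference, here is a minimal sketch of that behaviour. The HTML and variable names are made up for illustration. A plain string passed to string= has to equal the whole text node, while a compiled regular expression matches any text node in which the pattern is found.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, used only to illustrate the two matching modes.
html = "<div><p>example</p><p>another example here</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Plain string: only a text node exactly equal to 'example' is returned.
exact = soup.find_all(string="example")

# Compiled regex: any text node where the pattern is found is returned.
partial = soup.find_all(string=re.compile("example"))

print(exact)
print(partial)
```

Both calls return NavigableString objects rather than tags, and each text node is tested separately, which is why text that spans several nodes is never matched this way.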

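Is there a way to do this using get_text() instead of string, so that the search matches a string even when it spans multiple nodes? That is, I want to do something like find_all(get_text()='Python BeautifulSoup'), which would match against the whole inner text content.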

For example, take this snippet:

<body>
  <div>
    Python
    <br>
    BeautifulSoup
  </div>
</body>

If I wanted to find "Python BeautifulSoup" and have it return both the body and div tags, how could I do that?

Answer 1

You can use CSS selectors with the pseudo-class :-soup-contains-own(), which matches elements whose own direct text contains the given string:

soup.select_one(':-soup-contains-own("BeautifulSoup")')

Or, to get only the element's text:

soup.select_one(':-soup-contains-own("BeautifulSoup")').get_text(' ', strip=True)

Example

from bs4 import BeautifulSoup

html = '''
<body>
  <div>
    Python
    <br>
    BeautifulSoup
  </div>
</body>
'''
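# Name the parser explicitly; omitting it triggers a warning and the result can vary by installed parser.
soup = BeautifulSoup(html, 'html.parser')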

soup.select(':-soup-contains-own("BeautifulSoup")')

Output

[<div>
 Python
 <br/>
 BeautifulSoup
</div>]
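Note that :-soup-contains-own() only returns the div here, whereas the question asks for body as well. The sketch below compares it with the related :-soup-contains() selector, which also looks at descendant text. It assumes a Soup Sieve version recent enough to provide both selectors, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<body>
  <div>
    Python
    <br>
    BeautifulSoup
  </div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Own text only: just the tag that directly holds the text node.
own = [tag.name for tag in soup.select(':-soup-contains-own("BeautifulSoup")')]

# Own plus descendant text: ancestors of the text node match too.
any_depth = [tag.name for tag in soup.select(':-soup-contains("BeautifulSoup")')]

# The selector compares against the raw text, so the newlines and indentation
# between the two words stop a phrase with a single space from matching.
spanning = soup.select(':-soup-contains("Python BeautifulSoup")')

print(own)
print(any_depth)
print(spanning)
```

So the CSS selectors are convenient for a single word or a phrase inside one text node, but a phrase split across nodes by markup and whitespace needs the text to be normalized first.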

Answer 2

You can pass a lambda to .find_all:

from bs4 import BeautifulSoup

html_doc = '''\
<body>
  <div>
    Python
    <br>
    BeautifulSoup
  </div>
</body>'''

soup = BeautifulSoup(html_doc, 'html.parser')

for tag in soup.find_all(lambda tag: 'Python BeautifulSoup' in tag.get_text(strip=True, separator=' ')):
    print(tag.name)

Prints:

body
div
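If this kind of search is needed in several places, the lambda can be wrapped in a small helper. The following is a sketch only: find_tags_containing is a hypothetical name, not part of BeautifulSoup, and it assumes that collapsing every run of whitespace into a single space is the normalization you want.

```python
from bs4 import BeautifulSoup


def find_tags_containing(soup, text):
    """Return every tag whose whitespace-normalized text contains `text`.

    Hypothetical helper for illustration; not part of the bs4 API.
    """
    def matches(tag):
        # Join all descendant strings with a space, then collapse runs of
        # whitespace (newlines, indentation) into single spaces.
        normalized = " ".join(tag.get_text(" ").split())
        return text in normalized

    return soup.find_all(matches)


html_doc = """\
<body>
  <div>
    Python
    <br>
    BeautifulSoup
  </div>
</body>"""

soup = BeautifulSoup(html_doc, "html.parser")
names = [tag.name for tag in find_tags_containing(soup, "Python BeautifulSoup")]
print(names)
```

Because every ancestor of the matching text also contains it, the result includes the enclosing tags (body here) as well as the div, which is what the question asked for.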

Solution 1
2 2023-01-31 12:50:38

Solution 2
1 accepted 2023-01-31 18:20:47
