How do I search for tags containing a given string in BS4?

Question

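In BeautifulSoup4, how do I search for tags whose text contains a specific string? For example, when searching for "skyrim", I want to print the contents (such as the game title) of every tag that contains the string "skyrim".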

I tried using

    if 'skyrim' in tag.string:

but it never prints anything.

Full definition:

def search(self):
    steam_results = self.soup.find_all('span', class_='title')

    itr = 1
    for tag in steam_results:
        if self.title in tag.string:  # <--- Not working
            print(str(itr) + ': ' + tag.string + '\n')
            itr = itr + 1

Sample of steam_results:

>>> steam_results
[<span class="title">The Elder Scrolls V: Skyrim Special Edition</span>,
 <span class="title">Skyrim Script Extender (SKSE)</span>, 
 <span class="title">Enderal</span>, ...]

Expected result:

The Elder Scrolls V: Skyrim Special Edition
Skyrim Script Extender (SKSE)

Actual result: nothing is printed.

Answer 1

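The problem is the substring check, because it is case-sensitive. If you check for skyrim, the result is empty because no title contains skyrim; the titles contain Skyrim instead. So lowercase the text before comparing, like this: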

steam_results = soup.find_all('span', class_='title')
for steam in steam_results:
    if 'skyrim' in steam.getText().lower():
        print(steam.getText())

Output:

The Elder Scrolls V: Skyrim Special Edition
The Elder Scrolls V: Skyrim VR
Skyrim Script Extender (SKSE)
The Elder Scrolls V: Skyrim Special Edition - Creation Club
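
The same check can be wrapped in a small helper. The sketch below is only an illustration: the function name `find_titles` and the sample HTML are made up for this example, and it assumes the titles sit in `span` elements with class `title`, as in the question. It reads the text with `get_text()` rather than `.string`, since `.string` is `None` when a tag has more than one child, and it uses `casefold()` on both sides for the case-insensitive comparison.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question's steam_results (made up for illustration).
SAMPLE_HTML = """
<span class="title">The Elder Scrolls V: Skyrim Special Edition</span>
<span class="title">Skyrim Script Extender (SKSE)</span>
<span class="title">Enderal</span>
"""


def find_titles(soup, needle):
    """Return the text of every span.title whose text contains needle, ignoring case."""
    needle = needle.casefold()
    matches = []
    for tag in soup.find_all('span', class_='title'):
        text = tag.get_text()  # always a str, unlike tag.string
        if needle in text.casefold():
            matches.append(text)
    return matches


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
titles = find_titles(soup, 'skyrim')
for number, title in enumerate(titles, start=1):
    print(f'{number}: {title}')
```

Returning a list instead of printing inside the loop keeps the search separate from the output, so the numbering can be done with `enumerate` rather than a manual counter.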

Answer 2

You can use soup.find_all(string=re.compile("your_string_here")) to get the matching text, then use .parent to get the tag.

from bs4 import BeautifulSoup
import re
html="""
<p id="1">Hi there</p>
<p id="2">hello<p>
<p id="2">hello there<p>
"""
soup=BeautifulSoup(html,'html.parser')
print([tag.parent for tag in soup.find_all(string=re.compile("there"))])

Output:

[<p id="1">Hi there</p>, <p id="2">hello there<p>\n</p></p>]
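
The regex approach can also cover the case-insensitive search from the question. The following is a minimal sketch, assuming markup shaped like the question's `steam_results` (the sample HTML here is made up): `re.IGNORECASE` makes the match ignore case, and `re.escape` keeps any special characters in the search term from being read as regex syntax.

```python
import re

from bs4 import BeautifulSoup

# Sample markup modeled on the question's steam_results (made up for illustration).
html = """
<span class="title">The Elder Scrolls V: Skyrim Special Edition</span>
<span class="title">Skyrim Script Extender (SKSE)</span>
<span class="title">Enderal</span>
"""

soup = BeautifulSoup(html, 'html.parser')

# re.escape treats the search term literally; re.IGNORECASE ignores case.
pattern = re.compile(re.escape('skyrim'), re.IGNORECASE)

# find_all(string=...) returns the matching text nodes; .parent is the enclosing tag.
tags = [text_node.parent for text_node in soup.find_all(string=pattern)]
for tag in tags:
    print(tag.get_text())
```

Note that this matches any text node in the document, not only the `span.title` elements, so on a real page it may be worth filtering the resulting tags by name or class.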
