Python BeautifulSoup: retrieving text from a div tag

Question

I am new to web scraping. I am using Beautiful Soup to scrape the Google Play Store, but I am stuck on retrieving text from a div tag. The div tag looks like this:

a = <div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>

I want to retrieve the text that starts with "Thanks for your feedback". I used the following code to retrieve the text:

response = a.find('div',{'class':'LVQB0b'}).get_text()

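However, the command above also returns unwanted text, namely "Education.com" and the date. I am not sure how to retrieve text from a div tag that has no class name, as in the example above. Looking forward to your guidance.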

Answer 1

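Use find(text=True, recursive=False). With recursive=False only the direct children of the tag are searched, so this returns the first text node that sits directly inside the div and skips text nested in child tags. In Beautiful Soup 4.4.0 and later the same argument can also be spelled string=True.

Example: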

from bs4 import BeautifulSoup

s = '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''    
html = BeautifulSoup(s, 'html.parser')
print(html.find('div',{'class':'LVQB0b'}).find(text=True, recursive=False))

Output:

Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!
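find() returns only the first matching text node. If the reply were split into several direct text nodes (for example by a <br/> tag), a variant using find_all could collect all of them. The following is a minimal sketch under that assumption: the helper name direct_text and the shortened HTML with a <br/> are hypothetical and not taken from the question.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical variant of the question's HTML with the reply split by <br/>.
html = (
    '<div class="LVQB0b"><div class="QoPmEb"></div>'
    '<div><span class="X43Kjb">Education.com</span>'
    '<span class="p2TkOb">August 15, 2019</span></div>'
    'Thanks for your feedback.<br/>Please restart the app.</div>'
)


def direct_text(tag):
    """Join the text nodes that are direct children of tag, skipping nested tags."""
    parts = tag.find_all(string=True, recursive=False)
    return " ".join(p.strip() for p in parts if p.strip())


soup = BeautifulSoup(html, "html.parser")
review = soup.find("div", {"class": "LVQB0b"})
print(direct_text(review))
```

The text inside the two span elements is not a direct child of the outer div, so it is left out of the result.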

Answer 2

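The unwanted text belongs to the span elements inside the <div class="LVQB0b"> element, so get_text() includes it. You can find those elements and remove their text from the result: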

response = a.find('div',{'class':'LVQB0b'}).get_text()
unwanted = a.select('.LVQB0b span')
for el in unwanted:
    response = response.replace(el.get_text(), '')
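For reference, a self-contained version of this approach might look like the sketch below. It assumes that a in the snippet above is the parsed BeautifulSoup object, uses a shortened reply text, and passes a count of 1 to str.replace so that only the first occurrence of each span's text is removed; that last detail is an addition, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened version of the question's HTML; the reply text is abbreviated.
html = (
    '<div class="LVQB0b"><div class="QoPmEb"></div>'
    '<div><span class="X43Kjb">Education.com</span>'
    '<span class="p2TkOb">August 15, 2019</span></div>'
    'Thanks for your feedback. Please let us know at support@education.com '
    'if you have any further trouble. Thanks!</div>'
)

soup = BeautifulSoup(html, "html.parser")
review = soup.find("div", {"class": "LVQB0b"})

# Start from the full text, then strip out what the nested spans contributed.
response = review.get_text()
for el in review.select("span"):
    # count=1 limits the removal to the first occurrence of the span's text
    response = response.replace(el.get_text(), "", 1)
response = response.strip()
print(response)
```

Because the removal is plain string replacement, it relies on the span text appearing before the reply in the output of get_text(), which holds for the structure shown in the question.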

Answer 3

As an alternative, you can use next_sibling or find_next_sibling(text=True):

from bs4 import BeautifulSoup

html= '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('div',class_='QoPmEb').find_next('div').next_sibling)

from bs4 import BeautifulSoup

html= '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('div',class_='QoPmEb').find_next('div').find_next_sibling(text=True))
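Sibling navigation depends on the exact page structure: if a div is missing, find returns None, and next_sibling may be a tag rather than text. A guarded variant could look like the sketch below. The function name reply_after_header is hypothetical, the reply text is shortened, and find_next_sibling('div') is used here in place of the answer's find_next('div') so that only siblings are considered.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened version of the question's HTML; the reply text is abbreviated.
html = (
    '<div class="LVQB0b"><div class="QoPmEb"></div>'
    '<div><span class="X43Kjb">Education.com</span>'
    '<span class="p2TkOb">August 15, 2019</span></div>'
    'Thanks for your feedback. Please restart the app.</div>'
)


def reply_after_header(soup):
    """Return the text node that follows the name/date div, or None if absent."""
    marker = soup.find("div", class_="QoPmEb")
    if marker is None:
        return None
    header = marker.find_next_sibling("div")  # the div holding name and date
    if header is None:
        return None
    node = header.next_sibling
    # next_sibling can be a Tag or None; accept only plain text nodes
    if isinstance(node, NavigableString):
        return str(node).strip()
    return None


soup = BeautifulSoup(html, "html.parser")
print(reply_after_header(soup))
```

Returning None instead of raising AttributeError makes it easier to skip entries that do not have the expected layout when looping over many of them.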

3 answers

Answer 1: 4 (accepted), 2020-01-07 09:31:46
Answer 2: 2, 2020-01-07 09:33:08
Answer 3: 1, 2020-01-07 10:43:46
