Extract text from within div tag using BeautifulSoup 4 in Python
I am new to web scraping. I am using Beautiful Soup to scrape the Google Play Store. However, I am stuck on retrieving text from a div tag. The div tag looks like this:
a = `<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>`
I want to retrieve the text starting from "Thanks for your feedback". I used the following code to retrieve the text:
response = a.find('div',{'class':'LVQB0b'}).get_text()
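However, the above command also returns unwanted text, namely "Education.com" and the date. I am not sure how to retrieve text from a div tag that has no class name, as in the example above. Awaiting your guidance.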
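Use find(text=True, recursive=False). With recursive=False the search is limited to the direct children of the div, so text inside the nested tags is skipped.
Example: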
from bs4 import BeautifulSoup
s = '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
html = BeautifulSoup(s, 'html.parser')
print(html.find('div',{'class':'LVQB0b'}).find(text=True, recursive=False))
Output:
Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!
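Note that find returns only the first direct text node. If the outer div held several text nodes separated by tags, a variation along the following lines could collect all of them. This is a minimal sketch, not part of the original answer: the shortened markup (including the <br>) is hypothetical, and it uses the string argument, which newer Beautiful Soup releases prefer over text.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup with the same shape as the review block,
# plus a <br> so the outer div has two direct text nodes.
s = ('<div class="LVQB0b"><div><span>Education.com</span></div>'
     'Thanks for your feedback.<br>Please restart the app.</div>')

soup = BeautifulSoup(s, 'html.parser')
outer = soup.find('div', class_='LVQB0b')

# string=True matches text nodes; recursive=False restricts the search
# to direct children, so the span text is never visited.
parts = outer.find_all(string=True, recursive=False)

# Join the non-empty pieces with a single space.
own_text = ' '.join(p.strip() for p in parts if p.strip())
print(own_text)
```

The nested "Education.com" span is ignored because it is a grandchild of the outer div, not a direct child.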
The unwanted text is part of the <div class="LVQB0b"> element. You can locate the nested elements that contain it and remove their text from the result:
response = a.find('div', {'class': 'LVQB0b'}).get_text()
# Collect the nested spans whose text should be dropped
unwanted = a.select('.LVQB0b span')
for el in unwanted:
    response = response.replace(el.get_text(), '')
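String replacement can also strip matching text from the part you want to keep if the same words happen to appear there. One possible alternative, sketched here under the assumption that modifying the parsed tree is acceptable, is to delete the nested tags with decompose() before calling get_text(). The markup is a shortened copy of the example, and the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the review markup from the question.
s = '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback.</div>'''

soup = BeautifulSoup(s, 'html.parser')
outer = soup.find('div', class_='LVQB0b')

# find_all(recursive=False) returns a list of the direct child tags,
# so it is safe to remove them from the tree while looping.
for child in outer.find_all(recursive=False):
    child.decompose()

# Only the div's own text is left in the tree now.
response = outer.get_text(strip=True)
print(response)
```

Because decompose() is destructive, parse the HTML again (or work on a copy) if the author name and date are needed later.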
Alternatively, you can use next_sibling or find_next_sibling(text=True):
from bs4 import BeautifulSoup
html= '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('div',class_='QoPmEb').find_next('div').next_sibling)
from bs4 import BeautifulSoup
html= '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback. We are sorry to hear you're having trouble with the app. This is a known issue and our team has fixed it. Please restart the app and let us know at support@education.com if you have any further trouble. Thanks!</div>'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.find('div',class_='QoPmEb').find_next('div').find_next_sibling(text=True))
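Both sibling-based snippets assume the markup always has this exact shape. If a div is missing, find returns None and the chained call raises AttributeError. A defensive variant might look like the sketch below; the helper name trailing_text is hypothetical, and it uses find_next_sibling('div') rather than find_next('div') so that only true siblings are considered.

```python
from bs4 import BeautifulSoup, NavigableString

def trailing_text(soup, marker_class):
    """Return the text node that follows the div after the marker div, or None."""
    marker = soup.find('div', class_=marker_class)
    if marker is None:
        return None
    # The div holding the author and date is the marker's next sibling div.
    header = marker.find_next_sibling('div')
    if header is None:
        return None
    node = header.next_sibling
    # next_sibling may be None or another tag; accept only plain text.
    if isinstance(node, NavigableString):
        return str(node).strip()
    return None

html = '''<div class="LVQB0b"><div class="QoPmEb"></div><div><span class="X43Kjb">Education.com</span><span class="p2TkOb">August 15, 2019</span></div>Thanks for your feedback.</div>'''
soup = BeautifulSoup(html, 'html.parser')

reply = trailing_text(soup, 'QoPmEb')
missing = trailing_text(soup, 'no-such-class')
print(reply)
print(missing)
```

Returning None instead of raising lets a scraping loop skip reviews that have no developer reply.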