Python & BeautifulSoup web scraping - select a paragraph with a specific child tag
I'm trying to isolate "text of interest" from the last paragraph in the following markup:
```html
<div class='div_name_class'>
<p>
<span class='class_name_1' title='title1'>val1</span>
<span class='class_name_1' title='title2'>val2</span>
</p>
<p><span class='class_name_2'><em>text of no interest</em></span>text of interest</p>
```
So far I have tried:
```python
print(soup.find('span', attrs={'class': 'class_name_2'}).parent.text)
print(soup.find('em').parent.parent.text)
```
But both return: "text of no interesttext of interest"
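Both attempts land on the same `<p>`, and `.text` (equivalent to `get_text()`) joins every string descendant of that tag, including the one nested inside `<em>`. A minimal sketch illustrating this, assuming bs4 with the built-in `html.parser` and only the last `<p>` from the snippet:

```python
from bs4 import BeautifulSoup

html = "<p><span class='class_name_2'><em>text of no interest</em></span>text of interest</p>"
soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

# .text concatenates all strings inside the tag, at any depth, with no separator
print(p.text)  # text of no interesttext of interest

# .strings yields the same pieces separately, in document order
parts = [str(s) for s in p.strings]
print(parts)  # ['text of no interest', 'text of interest']

# A separator makes the boundary between the two strings visible
print(p.get_text("|"))  # text of no interest|text of interest
```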
I know I could split "text of interest" out of that result, but that doesn't seem like a very good solution.
Thanks for any suggestions.
You can use `extract` to remove the `em` tag, like this:
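```python
from bs4 import BeautifulSoup

html = """<div class='div_name_class'>
<p>
<span class='class_name_1' title='title1'>val1</span>
<span class='class_name_1' title='title2'>val2</span>
</p>
<p><span class='class_name_2'><em>text of no interest</em></span>text of interest</p>"""

# Name a parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')

# The <p> that contains the class_name_2 span
p = soup.find('span', attrs={'class': 'class_name_2'}).parent

# Remove the <em> (and its text) from the tree; extract() returns the removed tag
p.span.em.extract()

print(p.text)
```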
This prints:
text of interest
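If you would rather not modify the tree, the string can also be reached directly. Below is a sketch of two non-destructive alternatives (not part of the original answer), again assuming `html.parser` and the same last `<p>`: `next_sibling` returns the node right after the `<span>`, and `find_all(string=True, recursive=False)` returns only strings that are direct children of the `<p>`, so the text inside `<em>` is skipped.

```python
from bs4 import BeautifulSoup

html = "<p><span class='class_name_2'><em>text of no interest</em></span>text of interest</p>"
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", attrs={"class": "class_name_2"})
p = span.parent

# Option 1: the node immediately following the <span> inside the <p>
after_span = span.next_sibling
print(after_span)  # text of interest

# Option 2: strings that are direct children of <p>; recursive=False
# means the string nested in <span><em> is never visited.
# Whitespace-only strings are dropped so pretty-printed HTML also works.
direct = [s.strip() for s in p.find_all(string=True, recursive=False) if s.strip()]
print(direct)  # ['text of interest']

# Unlike extract(), neither option changes the tree
print(p.text)  # text of no interesttext of interest
```

`next_sibling` relies on there being no whitespace between `</span>` and the text; with pretty-printed HTML it may return a whitespace string instead, which is why the `find_all` variant filters empty results. In older Beautiful Soup releases the `string` argument was named `text`.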