Python - BeautifulSoup - Unable to extract Span Value
I have an XML document with multiple div and span classes, and I'm struggling to extract a text value.
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I want</span>
So far I have written this:
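from bs4 import BeautifulSoup

# driver is assumed to be a Selenium WebDriver that has already loaded the page
html = driver.page_source
soup = BeautifulSoup(html, "lxml")
spans = soup.find_all('span', attrs={'class': 'html-tag'})[29]
print(spans.text)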
Unfortunately, this only prints the "This is a Heading that I dont want" value, for example:
This is the heading I dont want
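The number [29] in the code is the position at which the text I need will always appear.
I'm not sure how to retrieve the span value I need.
Could you please help? Thanks.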
You can search for <div class="line"> and then select the second <span> inside it.
For example:
from bs4 import BeautifulSoup

txt = '''
# line 1
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I dont want</span>
</div>
# line 2
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I dont want</span>
</div>
# line 3
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I want</span> <--- this is the one I want
</div>'''
soup = BeautifulSoup(txt, 'html.parser')
s = soup.select('div.line')[2].select('span')[1] # select 3rd line 2nd span
print(s.text)
Prints:
This is the text I want
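An alternative that stays closer to the indexing in the question is to keep locating the span with class "html-tag" by position and then step to the span that follows it. The following is only a minimal sketch under assumptions: the markup, the helper name text_after_heading, and the index used are made up for illustration, and it assumes the wanted span is a sibling of the heading span inside the same div.

```python
from bs4 import BeautifulSoup

# Illustrative markup only; the real page will differ.
html = """
<div class="line">
  <span class="html-tag">"Heading A"</span>
  <span>Text A</span>
</div>
<div class="line">
  <span class="html-tag">"Heading B"</span>
  <span>Text B</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def text_after_heading(soup, index):
    """Return the text of the <span> that follows the index-th heading span.

    Hypothetical helper: returns None if the index is out of range or the
    heading has no following <span> sibling.
    """
    headings = soup.find_all("span", class_="html-tag")
    if index >= len(headings):
        return None
    # find_next_sibling skips whitespace text nodes and returns the next <span> tag
    sibling = headings[index].find_next_sibling("span")
    return sibling.get_text(strip=True) if sibling else None


result = text_after_heading(soup, 1)
print(result)
```

With the page from the question, the same idea would be text_after_heading(soup, 29), provided the heading at position 29 really is followed by the span you need.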