Python - BeautifulSoup - Unable to extract Span Value
I have an XML document with multiple div and span classes, and I'm struggling to extract a text value.
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I want</span>
So far I have written this:
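from bs4 import BeautifulSoup

# driver: an already-initialised browser driver (e.g. a Selenium WebDriver)
html = driver.page_source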
soup = BeautifulSoup(html, "lxml")
spans = soup.find_all('span', attrs={'class': 'html-tag'})[29]
print(spans.text)
Unfortunately, this only prints the "This is a Heading that I dont want" value, e.g.:
This is the heading I dont want
The index [29] in the code is the position where the text I need will always appear.
I'm unsure how to retrieve the span value I need.
Could you please assist? Thanks.
You can search for <div class="line"> and then select the second <span> inside it.
For example:
from bs4 import BeautifulSoup

txt = '''
# line 1
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I dont want</span>
</div>
# line 2
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I dont want</span>
</div>
# line 3
<div class="line">
<span class="html-tag">
"This is a Heading that I dont want"
</span>
<span>This is the text I want</span> <!-- this is the one I want -->
</div>'''
soup = BeautifulSoup(txt, 'html.parser')
s = soup.select('div.line')[2].select('span')[1] # select 3rd line 2nd span
print(s.text)
Prints:
This is the text I want
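If you would rather keep indexing the html-tag spans as in the question, a possible alternative is to step from the heading span to the span that follows it. The sketch below is only an illustration under assumptions: the markup is a shortened stand-in for the real page, and text_after_heading is a hypothetical helper name, not part of BeautifulSoup. It relies on find_next_sibling, which skips the whitespace between tags, and also shows a CSS selector that collects the second span of every line.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real page (assumed structure, two lines only).
html = """
<div class="line">
  <span class="html-tag">"Heading A"</span>
  <span>Text A</span>
</div>
<div class="line">
  <span class="html-tag">"Heading B"</span>
  <span>Text B</span>
</div>
"""


def text_after_heading(soup, index):
    """Return the text of the span that follows the index-th html-tag span.

    Hypothetical helper: returns None when the index is out of range
    or when the heading has no following span.
    """
    headings = soup.find_all("span", class_="html-tag")
    if index >= len(headings):
        return None
    value = headings[index].find_next_sibling("span")
    return value.get_text(strip=True) if value is not None else None


soup = BeautifulSoup(html, "html.parser")

# Same idea as the question's [29], but with index 1 for this small sample.
print(text_after_heading(soup, 1))

# Alternative: the second span inside every div.line, via a CSS selector.
values = [s.get_text(strip=True)
          for s in soup.select("div.line > span:nth-of-type(2)")]
print(values)
```

With the real page you would pass 29 instead of 1. The range check avoids an IndexError if the page ever contains fewer heading spans than expected.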