Python Selenium: how to get text from a div after a span
I want to select the text inside a div that comes after a span.
The source looks like this:
<div id="citation">
<cite>Journal</cite>
", "
<span class="year">2014</span>
", "
<span class="volume">100</span>
" (4), pp 100-200"
</div>
I only want the " (4), pp 100-200".
I know how to get the text of the entire div, or of each span, but how do I grab only the last piece of text? This XPath does not work:
ISSUE_XPATH = "//*[@id=\"citation\"]/text()[3]"
It fails with this error message:
selenium.common.exceptions.InvalidSelectorException: Message: {"errorMessage":"The result of the xpath expression \"//*[@id=\"citation\"]/text()[3]\" is: [object Text]. It should be an element."
Unfortunately, //*[@id="citation"]/text()[3] is not going to work in Selenium: its locators can only target actual elements, not text nodes.
What I would do in this case is additionally use the BeautifulSoup HTML parser, which helps locate the specific text sibling that follows the span element with class="volume":
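from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# `driver` is an already-running Selenium WebDriver with the page loaded.
citation = driver.find_element(By.ID, "citation")
html = citation.get_attribute("outerHTML")

# Parse the div's HTML and take the text node right after <span class="volume">.
soup = BeautifulSoup(html, "html.parser")
desired_text = soup.find("span", class_="volume").next_sibling
print(desired_text)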
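If adding a second parser is not desirable, another option (not part of the answer above, just a sketch) is to let the browser read the text node through JavaScript, since `execute_script` can return a string even though locators cannot return text nodes. The helper name `last_text_node` and the constant `LAST_TEXT_NODE_JS` are made up for illustration, and the sketch assumes the wanted text is the last non-blank text node directly under the element.

```python
# JavaScript executed in the browser: walk the element's child nodes backwards
# and return the content of the last non-blank text node, or null if none.
LAST_TEXT_NODE_JS = """
var nodes = arguments[0].childNodes;
for (var i = nodes.length - 1; i >= 0; i--) {
    var node = nodes[i];
    if (node.nodeType === Node.TEXT_NODE && node.textContent.trim() !== "") {
        return node.textContent;
    }
}
return null;
"""


def last_text_node(driver, element):
    """Return the stripped text of the last non-blank text node under element.

    Hypothetical helper, not a Selenium API. `driver` is any object with an
    `execute_script(script, *args)` method, such as a Selenium WebDriver.
    """
    text = driver.execute_script(LAST_TEXT_NODE_JS, element)
    return text.strip() if text is not None else None
```

With a live session this would be called as `last_text_node(driver, driver.find_element(By.ID, "citation"))`, where `By` comes from `selenium.webdriver.common.by`. Because the result is stripped, surrounding whitespace from the page source is dropped.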