
Python Selenium: how to get text from a div after a span

I want to select the text inside a div that comes after a span.

The source looks like this:

<div id="citation">
    <cite>Journal</cite>
    ", "
    <span class="year">2014</span>
    ", "
    <span class="volume">100</span>
    " (4), pp 100-200"
</div>

I only want the " (4), pp 100-200".

I know how to get the text of the entire div, or of each span, but how do I grab only the last text node? This XPath does not work:

ISSUE_XPATH = "//*[@id=\"citation\"]/text()[3]"

It fails with this error message:

selenium.common.exceptions.InvalidSelectorException: Message: {"errorMessage":"The result of the xpath expression \"//*[@id=\"citation\"]/text()[3]\" is: [object Text]. It should be an element."

Unfortunately, //*[@id="citation"]/text()[3] is not going to work in Selenium: its locators can only target actual elements, not text nodes.
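
One workaround that stays inside Selenium is to let the browser read the text node with JavaScript and hand back a plain string instead of an element. The following is a minimal sketch and not part of the original answer: the helper name text_after_volume is hypothetical, and it assumes the caller passes a live WebDriver plus the WebElement for the citation div.

```python
def text_after_volume(driver, container):
    """Return the text node that follows span.volume inside `container`.

    Assumptions: `driver` is an already-started Selenium WebDriver and
    `container` is the WebElement for <div id="citation">.
    """
    script = (
        "var span = arguments[0].querySelector('span.volume');"
        "var node = span ? span.nextSibling : null;"
        "return node ? node.textContent : null;"
    )
    # execute_script returns a plain string here, so Selenium never has
    # to wrap a text node as an element.
    text = driver.execute_script(script, container)
    return text.strip() if text is not None else None
```

It could be called as text_after_volume(driver, driver.find_element(By.ID, "citation")), with By imported from selenium.webdriver.common.by.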

What I would do in this case is additionally use the BeautifulSoup HTML parser, which can locate the text sibling that follows the span element with class="volume":

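from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By

# `driver` is assumed to be an already-initialized WebDriver with the page loaded.
# find_element(By.ID, ...) replaces find_element_by_id, which was removed in Selenium 4.
citation = driver.find_element(By.ID, "citation")
html = citation.get_attribute("outerHTML")

soup = BeautifulSoup(html, "html.parser")
# The text node right after <span class="volume"> is its next sibling.
desired_text = soup.find("span", class_="volume").next_sibling
print(desired_text.strip())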
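
The sibling-text idea can also be tried without a browser. The sketch below is an illustration only and swaps BeautifulSoup for the standard library's xml.etree.ElementTree. It assumes a static, well-formed copy of the markup from the question (real pages are often not valid XML, in which case this parser will raise an error). In ElementTree, the text that follows an element is stored in that element's tail attribute.

```python
import xml.etree.ElementTree as ET

# Static copy of the markup from the question (assumed to be well-formed XML).
html = """<div id="citation">
    <cite>Journal</cite>, <span class="year">2014</span>, <span class="volume">100</span> (4), pp 100-200
</div>"""

root = ET.fromstring(html)

# Locate the span by its class attribute; the text node after it is its tail.
volume = root.find("./span[@class='volume']")
desired_text = volume.tail.strip()
print(desired_text)
```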
