
Selenium - How to get the text from an element but retaining child element source

Using Python 3 and Selenium 4.8.0.

Suppose I have

<p>
    I love <i>pizza</i>.
</p>

Having done

elem = driver.find_element(By.TAG_NAME, "p")

elem.text will contain "I love pizza."

What I want, however, is to retain the information about which text is italicized, so that I can automatically generate a .tex file containing, e.g.,

I love \textit{pizza}.

In simple cases, one option would be to find the child <i> element and use string replace methods, but this leads to obvious problems if the child text also appears elsewhere in elem, e.g. <p>I love <i>love</i> pizza.</p>.

How might I get around this?

Update: Ultimately I want the LaTeX (like the one in the question), but all I really need help with is getting to some intermediate step such as ["I love ", "pizza", "."], where I know that it alternates between italicized and not. Even just getting the text back as something like "I love <i>pizza</i>." would be great.

To get I love <i>pizza</i>. with the child markup intact, read the element's innerHTML instead of its text attribute:

print(driver.find_element(By.TAG_NAME, "p").get_attribute("innerHTML"))
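From the innerHTML string, one possible way to reach the intermediate step and the LaTeX output is to walk the markup with the standard library's html.parser. This is only a sketch, not part of the original answer: the names ItalicSegmenter, split_italics and to_latex are hypothetical, treating <em> like <i> is an assumption, and LaTeX special characters (%, &, _ and so on) are not escaped.

```python
from html.parser import HTMLParser


class ItalicSegmenter(HTMLParser):
    """Collect (text, is_italic) pairs from an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0      # how many <i>/<em> elements are currently open
        self.segments = []  # list of (text, is_italic) tuples

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):  # assumption: <em> counts as italic too
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        italic = self.depth > 0
        if self.segments and self.segments[-1][1] == italic:
            # Merge with the previous segment when the style did not change.
            prev_text, _ = self.segments[-1]
            self.segments[-1] = (prev_text + data, italic)
        else:
            self.segments.append((data, italic))


def split_italics(inner_html):
    """Return the fragment as a list of (text, is_italic) tuples."""
    parser = ItalicSegmenter()
    parser.feed(inner_html)
    parser.close()
    return parser.segments


def to_latex(inner_html):
    """Wrap italic runs in \\textit{...} and collapse runs of whitespace."""
    parts = []
    for text, italic in split_italics(inner_html):
        parts.append(r"\textit{" + text + "}" if italic else text)
    return " ".join("".join(parts).split())


print(split_italics("I love <i>pizza</i>."))
print(to_latex("\n    I love <i>love</i> pizza.\n"))
```

With Selenium, the argument would be the string returned by get_attribute("innerHTML"). Because the parser tracks where each <i> opens and closes, repeated words such as the second "love" are no longer ambiguous.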
