Selenium - How to get the text of an element while keeping the child element source

Question

Using Python 3 and Selenium 4.8.0.

Suppose I have

<p>
    I love <i>pizza</i>.
</p>

Having done

elem = driver.find_element(By.TAG_NAME, "p")

elem.text will contain "I love pizza."

What I want, however, is to somehow retain the information about which text is italicized, so that I can automatically generate a .tex file containing, e.g.

I love \textit{pizza}.

In simple cases, one option would be to find the child <i> element and use string replace methods, but this leads to obvious problems if the child text also appears elsewhere in elem, e.g. I love love pizza.
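
To make the pitfall concrete, here is a minimal sketch. It assumes, purely for illustration, that only the second "love" is italicized (the example does not say which one is), and it uses plain strings in place of elem.text and the child's .text. The helper name wrap is hypothetical.

```python
# Assumption for illustration: the <i> element wraps only the SECOND "love".
full_text = "I love love pizza."  # stands in for elem.text
child_text = "love"               # stands in for the <i> child's .text


def wrap(s):
    """Wrap a string in a LaTeX \\textit{} command."""
    return r"\textit{" + s + "}"


# Replacing every occurrence italicizes both words.
replace_all = full_text.replace(child_text, wrap(child_text))

# Replacing only the first occurrence italicizes the wrong word.
replace_first = full_text.replace(child_text, wrap(child_text), 1)

print(replace_all)
print(replace_first)
```

Neither variant can tell the two occurrences apart, because the plain text no longer records where the child element sat.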

How might I get around this?

Update: Ultimately I want the LaTeX (like the one in the question), but all I really need help with is getting to some intermediate step such as ["I love ", "pizza", "."], where I know that it alternates between italicized and not. Even just getting the text back as something like "I love <i>pizza</i>." would be great.

Answer 1

To extract the text with its markup preserved (I love <i>pizza</i>.), read the element's innerHTML rather than its text attribute, as follows:

print(driver.find_element(By.TAG_NAME, "p").get_attribute("innerHTML"))
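
From there, one possible way to reach the intermediate list and the LaTeX line asked for in the question is to parse the innerHTML string with Python's standard html.parser module. This is only a sketch and not part of the original answer: the names ItalicSegmenter, split_italics and to_latex are hypothetical, the sample string stands in for what get_attribute("innerHTML") would return for the <p> above, <em> is treated like <i> by assumption, and LaTeX special characters are not escaped.

```python
from html.parser import HTMLParser


class ItalicSegmenter(HTMLParser):
    """Collect (text, is_italic) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._italic_depth = 0  # > 0 while inside <i> or <em>

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self._italic_depth += 1

    def handle_endtag(self, tag):
        if tag in ("i", "em") and self._italic_depth > 0:
            self._italic_depth -= 1

    def handle_data(self, data):
        if data:
            self.segments.append((data, self._italic_depth > 0))


def split_italics(inner_html):
    """Return the fragment's text as a list of (text, is_italic) tuples."""
    parser = ItalicSegmenter()
    parser.feed(inner_html)
    parser.close()  # flush any text still buffered by the parser
    return parser.segments


def to_latex(inner_html):
    """Wrap italic segments in \\textit{} and collapse whitespace runs."""
    parts = [
        r"\textit{" + text + "}" if italic else text
        for text, italic in split_italics(inner_html)
    ]
    return " ".join("".join(parts).split())


# Assumed value of elem.get_attribute("innerHTML") for the <p> in the question.
sample = "\n    I love <i>pizza</i>.\n"
print(split_italics(sample))
print(to_latex(sample))
```

Because the parser tracks where each tag opens and closes, a repeated word such as the one in "I love <i>love</i> pizza." is only wrapped at the position that was actually italicized.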

Solution 1
0 Accepted 2023-01-24 07:28:55
