How to get part of an element's text using Selenium

Question

I have this HTML:

<div id="msg">

  <b>text1</b>
  <br>
  text2 <b>text3</b> text4

  <ul class="list">
    <li>...</li>
    <li>...</li>
    <li>...</li>
  </ul>

  text5

</div>

Using XPath, I want to extract from div[@id = 'msg'] only the text that comes before the ul.

For example: driver.findElement(By.xpath("xpath")).getText() -> text1 text2 text3 text4

Is this possible, or should I use different logic?

Answer 1

As per @kjhughes in this discussion, XPath is for selection, not manipulation. You can select nodes as they exist in an XML document, but you cannot transform those nodes.

In your case, if your XML document includes this node:

<div id="msg">
  <b>text1</b>
  <br>
  text2 <b>text3</b> text4
  <ul class="list">
    <li>...</li>
    <li>...</li>
    <li>...</li>
  </ul>
  text5
</div>

You can select the <div> node through //div[@id='msg'], but the selected node is returned exactly as it appears in the source XML, that is, still containing the <ul> child with class list.

If you want to manipulate or transform a node selected via XPath (for example, to exclude its child elements), you'll have to use the hosting language (XSLT, JavaScript, Python, Java, C#, etc.) to manipulate the selection.
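
As a rough illustration of that split between selection and host-language manipulation, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath instead of Selenium. It assumes the markup has been made well-formed XML (note <br/> rather than <br>), and the class and method names are hypothetical. The XPath only selects the text nodes that sit before the <ul>; the joining and whitespace cleanup happen in Java. Selenium's By.xpath can return elements only, not text nodes, so this expression is not a drop-in locator.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TextBeforeList {

    // Hypothetical helper: returns the text of div#msg that precedes its <ul> child.
    // Assumes the input is well-formed XML.
    static String textBeforeList(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // Selection: children of the div that have a <ul> after them,
            // then every text node at or below those children.
            NodeList textNodes = (NodeList) xpath.evaluate(
                    "//div[@id='msg']/node()[following-sibling::ul]/descendant-or-self::text()",
                    doc, XPathConstants.NODESET);

            // Manipulation: concatenate and normalize whitespace in Java.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < textNodes.getLength(); i++) {
                sb.append(textNodes.item(i).getNodeValue());
            }
            return sb.toString().replaceAll("\\s+", " ").trim();
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not parse or query the XML", e);
        }
    }
}
```

Calling TextBeforeList.textBeforeList(...) on the node above (with <br/> closed) should give the four text pieces joined by single spaces.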

Solution

To extract the texts individually you can use the following solution:

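import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebElement;

// "driver" is your existing WebDriver instance. Java string literals need double quotes.
WebElement myElement = driver.findElement(By.xpath("//div[@id='msg']"));
String text1 = myElement.findElement(By.xpath("./b")).getAttribute("innerHTML");
// childNodes counts whitespace text nodes and <br> too: index 3 is the <br>, so text2..text4 are at 4..6.
String text2 = ((JavascriptExecutor) driver).executeScript("return arguments[0].childNodes[4].textContent;", myElement).toString().trim();
String text3 = ((JavascriptExecutor) driver).executeScript("return arguments[0].childNodes[5].textContent;", myElement).toString().trim();
String text4 = ((JavascriptExecutor) driver).executeScript("return arguments[0].childNodes[6].textContent;", myElement).toString().trim();
String text5 = ((JavascriptExecutor) driver).executeScript("return arguments[0].lastChild.textContent;", myElement).toString().trim();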

Answer 2

Just want to share another idea.

You can get the outerHTML, cut it off at the "ul" tag, and then remove the HTML tags from what remains. After that you can adjust the string as needed.

Using JavaScript, I can get almost exactly the text you are looking for. I've pasted it below for reference; you can do the same in Java.

oHTML = document.querySelector("div#msg").outerHTML
oHTML.substring(0,oHTML.search('<ul')).replace(/<.*>/,'').replace(/<\/?[^>]+(>|$)/g, "").replace(/\n/g, " ").trim()

You can run this in the browser console to see the output. Below is the JavaScript output.

text1      text2 text3 text4
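
For anyone who wants the Java side, the same idea might look roughly like the sketch below. The class and method names are hypothetical, and it assumes the element's outerHTML is already available as a String. It is plain string processing, so it shares the limits of the JavaScript version: it does not decode HTML entities and can be confused by a ">" inside an attribute value.

```java
class PartialText {

    // Hypothetical helper: keeps the markup up to the first "<tagName",
    // strips the remaining tags, and collapses whitespace.
    static String textBeforeTag(String outerHtml, String tagName) {
        int cut = outerHtml.indexOf("<" + tagName);
        // If the tag is absent, fall back to the whole string.
        String head = cut >= 0 ? outerHtml.substring(0, cut) : outerHtml;
        return head.replaceAll("<[^>]*>", "")   // remove opening and closing tags
                   .replaceAll("\\s+", " ")     // collapse newlines and runs of spaces
                   .trim();
    }
}
```

In a Selenium test the input could come from something like myElement.getAttribute("outerHTML"), followed by PartialText.textBeforeTag(html, "ul"). Unlike the console output above, this version also collapses the run of spaces between text1 and text2.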
