
How to use Selenium to get text from an element, not including its sub-elements

HTML

<div id='one'>
    <button id='two'>I am a button</button>
    <button id='three'>I am a button</button>
    I am a div
</div>

Code

driver.findElement(By.id('one')).getText();

I've seen this question pop up a few times in the last maybe year or so and I've wanted to try writing this function... so here you go. It takes the parent element and removes each child's textContent until what remains is the textNode. I've tested this on your HTML and it works.

/**
 * Takes a parent element and strips out the textContent of all child elements and returns textNode content only
 * 
 * @param e
 *            the parent element
 * @return the text from the child textNodes
 */
public static String getTextNode(WebElement e)
{
    String text = e.getText().trim();
    List<WebElement> children = e.findElements(By.xpath("./*"));
    for (WebElement child : children)
    {
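        // replaceFirst takes a regex, so quote the child text to match it literally
        text = text.replaceFirst(java.util.regex.Pattern.quote(child.getText()), "").trim();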
    }
    return text;
}

and you call it like this:

System.out.println(getTextNode(driver.findElement(By.id("one"))));
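
The stripping step can be tried without a browser. The sketch below is not part of the answer above: the class `TextNodeStripper` and the method `stripChildTexts` are hypothetical names, and the strings passed in stand in for what `getText()` would return for the parent and its children. It quotes each child text because `replaceFirst` interprets its first argument as a regular expression, so a child text such as "I am a button (1)" would otherwise not match itself.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TextNodeStripper {
    // Hypothetical helper: removes the first occurrence of each child text
    // from the parent text and returns what is left.
    static String stripChildTexts(String parentText, List<String> childTexts) {
        String text = parentText.trim();
        for (String childText : childTexts) {
            if (childText.isEmpty()) {
                continue; // a child without visible text has nothing to remove
            }
            // Pattern.quote makes the child text a literal pattern
            text = text.replaceFirst(Pattern.quote(childText), "").trim();
        }
        return text;
    }

    public static void main(String[] args) {
        // Stand-ins for parent.getText() and each child.getText()
        String parent = "I am a button (1)\nI am a button (1)\nI am a div";
        List<String> children = Arrays.asList("I am a button (1)", "I am a button (1)");
        System.out.println(stripChildTexts(parent, children)); // prints: I am a div
    }
}
```

Note that this approach still removes the first textual match, so if the parent's own text happens to equal a child's text and comes first, the wrong occurrence is stripped.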

Warning: the initial solution (far below) won't work
I opened an enhancement request, 2840, against Selenium WebDriver and another one against the W3C WebDriver specification. The more votes they get, the sooner they'll get enough attention (one can hope). Until then, the solution suggested by @shivansh in the other answer (executing JavaScript via Selenium) remains the only alternative. Here's the Java adaptation of that solution (it collects all text nodes, discards those that are whitespace only, and separates the rest with \t):

WebElement e=driver.findElement(By.xpath("//*[@id='one']"));
if(driver instanceof JavascriptExecutor) {
  String jswalker=
      "var tw = document.createTreeWalker("
     +   "arguments[0],"
     +   "NodeFilter.SHOW_TEXT,"
     +   "{ acceptNode: function(node) { return NodeFilter.FILTER_ACCEPT;} },"
     +    "false"
     + ");"
     + "var ret=null;"
     + "while(tw.nextNode()){"
     +   "var t=tw.currentNode.wholeText.trim();"
     +   "if(t.length>0){" // skip over all-white text values
     +      "ret=(ret ? ret+'\t'+t : t);" // if many, tab-separate them
     +   "}"
     + "}"
     + "return ret;" // will return null if no non-empty text nodes are found
  ;
  Object val=((JavascriptExecutor) driver).executeScript(jswalker, e);
  // ---- Pass the context node here ------------------------------^
  String textNodesTabSeparated=(null!=val ? val.toString() : null);
  // ----^ --- this is the result you want
}
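
If the individual text nodes are needed rather than one tab-separated string, the result can be split afterwards. This is a small sketch with hypothetical names (`TextNodeResult`, `toList`); it only post-processes the value returned by `executeScript` and assumes no text node contains a tab character of its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TextNodeResult {
    // Hypothetical helper: turns the raw script result (null or a
    // tab-separated string) into a list of text-node values.
    static List<String> toList(Object val) {
        if (val == null) {
            return Collections.emptyList(); // script found no non-empty text nodes
        }
        List<String> parts = new ArrayList<>();
        for (String part : val.toString().split("\t")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by executeScript(jswalker, e)
        Object val = "I am a button\tI am a button\tI am a div";
        System.out.println(toList(val));
    }
}
```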

References:

TreeWalker - supported by all browsers

Selenium Javascript Executor


Initial suggested solution - not working - see enhancement request: 2840

driver.findElement(By.id("one")).findElement(By.xpath("./text()")).getText();

In a single search

driver.findElement(By.xpath("//*[@id='one']/text()")).getText();

Neither call works, because findElement can only return an element and text() selects a text node.

See the child::text() selector in the XPath spec, under Location Paths.
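
To see what the `text()` step selects outside Selenium, here is a sketch that evaluates the same expression with the JDK's own `javax.xml.xpath` API against the question's markup. The class and method names (`ChildTextXPathDemo`, `directText`) are hypothetical, and the markup is assumed to be well-formed XML, which real HTML pages often are not.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildTextXPathDemo {
    // Hypothetical helper: evaluates an XPath expression that selects text
    // nodes and joins the non-blank ones with a single space.
    static String directText(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                String t = nodes.item(i).getNodeValue().trim();
                if (!t.isEmpty()) {
                    if (sb.length() > 0) {
                        sb.append(' ');
                    }
                    sb.append(t);
                }
            }
            return sb.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse or evaluate", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<div id='one'>"
                + "<button id='two'>I am a button</button>"
                + "<button id='three'>I am a button</button>"
                + " I am a div"
                + "</div>";
        System.out.println(directText(xml, "//*[@id='one']/text()")); // prints: I am a div
    }
}
```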

I use a function like the one below:

private static final String ALL_DIRECT_TEXT_CONTENT =
        "var element = arguments[0], text = '';\n" +
                "for (var i = 0; i < element.childNodes.length; ++i) {\n" +
                "  var node = element.childNodes[i];\n" +
                "  if (node.nodeType == Node.TEXT_NODE" +
                " && node.textContent.trim() != '')\n" +
                "    text += node.textContent.trim();\n" +
                "}\n" +
                "return text;";

public String getText(WebDriver driver, WebElement element) {
    return (String) ((JavascriptExecutor) driver).executeScript(ALL_DIRECT_TEXT_CONTENT, element);
}

This is similar to the solutions already given, but instead of using JavaScript or setting text to "", I remove the child elements in the XML and then get the text.

Problem:

I need the text from the root element without its children, where the children can be any number of levels deep and the text in the root can be the same as the text in other elements.

The solution treats the WebElement as XML and replaces the children with voids so that only the root remains.

The result is then parsed. In my cases this seems to be working.

I have only verified this code in a Groovy environment. I have no idea whether it will work in Java without modifications; essentially you would need to replace the Groovy XML libraries with Java libraries.

As for the code itself, I have two parameters:

  • WebElement el
  • boolean strict

When strict is true, only the root's own text is taken into account. If strict is false, whitelisted markup tags are left in place. The whitelist includes p, b, i, strong, em, mark, small, del, ins, sub, sup.

The logic is:

  1. Manage whitelisted tags
  2. Get element as string (XML)
  3. Parse to an XML object
  4. Set all child nodes to void
  5. Parse and get text

Up until now this seems to be working out.

You can find the code here: GitHub Code
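
The steps above can be sketched in plain Java with the JDK's DOM parser. This is not the linked Groovy code, only a rough approximation of the described logic: the names `RootTextExtractor`, `rootText` and `prune` are hypothetical, the markup string is assumed to be well-formed XML (for instance, something obtained from the element and cleaned up beforehand), and child elements are removed outright rather than "set to void".

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RootTextExtractor {
    // Inline markup tags kept when strict is false (the whitelist from the text)
    private static final Set<String> WHITELIST = new HashSet<>(Arrays.asList(
            "p", "b", "i", "strong", "em", "mark", "small", "del", "ins", "sub", "sup"));

    // Hypothetical helper: returns the root's text with child elements removed.
    static String rootText(String xml, boolean strict) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            prune(root, strict);
            // Collapse whitespace left behind by the removed children
            return root.getTextContent().replaceAll("\\s+", " ").trim();
        } catch (Exception ex) {
            throw new IllegalStateException("Markup is not well-formed XML", ex);
        }
    }

    // Removes child elements; whitelisted ones survive in non-strict mode
    // and are pruned recursively.
    private static void prune(Node parent, boolean strict) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // read before a possible removal
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                String name = child.getNodeName().toLowerCase(Locale.ROOT);
                if (!strict && WHITELIST.contains(name)) {
                    prune(child, strict);
                } else {
                    parent.removeChild(child);
                }
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        String xml = "<div id='one'><button>I am a button</button> I am a <b>bold</b> div</div>";
        System.out.println(rootText(xml, true));  // prints: I am a div
        System.out.println(rootText(xml, false)); // prints: I am a bold div
    }
}
```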

var outerElement = driver.FindElement(By.XPath("a"));
var outerElementTextWithNoSubText = outerElement.Text.Replace(outerElement.FindElement(By.XPath("./*")).Text, "");
