
How can I extract the text content of only the root element? (Java, com.gargoylesoftware.htmlunit.html)

I can't find a way to extract the text content of only the root element using com.gargoylesoftware.htmlunit.html. Here is an example:

<td>
  W 03:10 PM-04:25 PM
  <strong>
     <br>
     Hybrid (50%+ in-person)
  </strong>
</td>

I want to extract the text content of the root element ("td" in this case), but the method below also extracts the text content of the child element, which I don't want:

private void extractTextContent(HtmlElement htmlElement) {
    String content = htmlElement.getTextContent();
    System.out.println(content);
}

output:

W 03:10 PM-04:25 PMHybrid (50%+ in-person)

desired output:

W 03:10 PM-04:25 PM

I've also tried another method, "asText()", but that doesn't give me the desired output either. I couldn't find anyone with the same question about com.gargoylesoftware.htmlunit.html. Is there a method that extracts the text content of only the root element?

EDIT: Thank you for the answer. I used the same idea of removing the child node to get my desired output. Here is the Java syntax:

private void extractTextContent(HtmlElement htmlElement) {
    // Detach the last child element (<strong> in the example) so its text is excluded.
    // Note: this modifies the DOM of the loaded page.
    DomNode child = htmlElement.getLastElementChild();
    if (child != null) {
        child.remove();
    }
    // Only the element's own text is left; trim the surrounding whitespace.
    String content = htmlElement.getTextContent().trim();
    System.out.println(content);
}

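You can try removing the child nodes before fetching the text content.

private void extractTextContent(HtmlElement htmlElement) {
    // Detach the last child element (<strong> in the example) so its text is excluded.
    // Note: this modifies the DOM of the loaded page.
    DomNode child = htmlElement.getLastElementChild();
    if (child != null) {
        child.remove();
    }
    // Only the element's own text is left; trim the surrounding whitespace.
    String content = htmlElement.getTextContent().trim();
    System.out.println(content);
}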

I have edited my answer to use the Java syntax provided by @XYZ.
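
If modifying the page's DOM is undesirable, a non-destructive variant of the same idea is to walk the element's direct children and keep only the text nodes. The following is a minimal sketch, not part of the original answer. It is written against the standard org.w3c.dom.Node interface and assumes that an HtmlUnit HtmlElement can be passed to it directly, since HtmlUnit's DomNode implements that interface. The class name OwnTextExample and the helpers ownText and parseRoot are hypothetical, and the demo parses the snippet from the question with the JDK's XML parser (with <br> written as <br/> so that it is well-formed) rather than with HtmlUnit.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class OwnTextExample {

    /** Concatenates only the text nodes that are direct children of the given node. */
    static String ownText(Node element) {
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            // Skip child elements such as <strong>; keep the element's own text only.
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    /** Demo helper (hypothetical): parses a well-formed XML snippet and returns its root element. */
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse snippet", e);
        }
    }

    public static void main(String[] args) {
        String snippet = "<td>\n  W 03:10 PM-04:25 PM\n  <strong>\n     <br/>\n"
                + "     Hybrid (50%+ in-person)\n  </strong>\n</td>";
        // prints: W 03:10 PM-04:25 PM
        System.out.println(ownText(parseRoot(snippet)));
    }
}
```

Under that assumption, calling ownText(htmlElement) inside extractTextContent would leave the <strong> child in place, which matters if the same page object is queried again later.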
