How to extract text content only from the root element - java, com.gargoylesoftware.htmlunit.html

Question

I can't find any way to extract the text content of only the root element using com.gargoylesoftware.htmlunit.html. Here is an example:

<td>
  W 03:10 PM-04:25 PM
  <strong>
     <br>
     Hybrid (50%+ in-person)
  </strong>
</td>

I want to extract the text content of the root element ("td" in this case), but the call also returns the text content of the child element, which is the part I don't want:

private void extractTextContent(HtmlElement htmlElement) {
    String content = htmlElement.getTextContent();
    System.out.println(content);
}

Output:

W 03:10 PM-04:25 PMHybrid (50%+ in-person)

Desired output:

W 03:10 PM-04:25 PM

I've also tried the asText() method, but it doesn't give me the desired output either. I couldn't find anyone with the same question about com.gargoylesoftware.htmlunit.html. Is there a method that extracts the text content of only the root element?

EDIT: Thank you for the answer. I used the same idea of deleting the child node to get my desired output. Here is the Java code:

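import com.gargoylesoftware.htmlunit.html.DomNode;
import com.gargoylesoftware.htmlunit.html.HtmlElement;

private String extractTextContent(HtmlElement htmlElement) {
    // Remove the last child element (<strong> in the example) so its text is dropped.
    // Note: this modifies the DOM of the loaded page.
    DomNode child = htmlElement.getLastElementChild();
    if (child != null) {
        child.remove();
    }
    // Only the text directly under <td> remains; trim the indentation whitespace.
    String content = htmlElement.getTextContent().trim();
    System.out.println(content);
    return content;
}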

Answer 1

You can try removing the child nodes before fetching textContent.

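import com.gargoylesoftware.htmlunit.html.DomNode;
import com.gargoylesoftware.htmlunit.html.HtmlElement;

private String extractTextContent(HtmlElement htmlElement) {
    // Remove the last child element (<strong> in the example) so its text is dropped.
    // Note: this modifies the DOM of the loaded page.
    DomNode child = htmlElement.getLastElementChild();
    if (child != null) {
        child.remove();
    }
    // Only the text directly under <td> remains; trim the indentation whitespace.
    String content = htmlElement.getTextContent().trim();
    System.out.println(content);
    return content;
}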

I have edited my answer with the Java syntax provided by @XYZ.
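A variant of the same idea that leaves the page untouched: instead of deleting children, read only the text nodes that are direct children of the root element. This is a sketch, not part of the original answer. It relies on HtmlUnit's DomNode implementing org.w3c.dom.Node (as it does in the 2.x releases I know of), so ownText(htmlElement) should accept an HtmlElement directly. The class name OwnText and the parse helper are hypothetical, and the demo parses the example with the JDK's XML parser instead of HtmlUnit so that it runs without extra dependencies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OwnText {

    /**
     * Returns the text of the direct text-node children of {@code node},
     * ignoring any text that lives inside child elements such as <strong>.
     * Accepts any org.w3c.dom.Node (assumed to include HtmlUnit's DomNode).
     */
    public static String ownText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Keep only immediate text nodes; skip elements, comments, etc.
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        // The source markup is indented, so trim the surrounding whitespace.
        return sb.toString().trim();
    }

    /** Hypothetical demo helper: parses a well-formed snippet with the JDK parser. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // <br/> is self-closed because the JDK parser requires well-formed XML.
        String html = "<td>\n  W 03:10 PM-04:25 PM\n  <strong>\n     <br/>\n"
                + "     Hybrid (50%+ in-person)\n  </strong>\n</td>";
        System.out.println(ownText(parse(html))); // prints: W 03:10 PM-04:25 PM
    }
}
```

Text runs separated by child elements are concatenated without a separator (for <td>a<b>x</b>b</td> the result is "ab"), so append a space in the loop if the cells you scrape need one.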

Answer 1: 1 vote, accepted, 2020-03-26 07:52:14
