How to get only the text of a specific tag with Jsoup in Java?

Question

I have a tag like this in my HTML:

<p class="outter">
  <strong class="inner">not needed message</strong>
  NEEDED MESSAGE
</p>

I'm trying to extract "NEEDED MESSAGE",

but if I do something like this:

String results = document.select("p.outter").text();
System.out.println(results);

it prints:

not needed messageNEEDED MESSAGE

So the question is:

How can I get the text of a specific tag without the text from its inner tags?

Answer 1

One solution is to select only the TextNode children of the element. A small snippet follows.

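// Needs org.jsoup.Jsoup, org.jsoup.select.Elements and, from org.jsoup.nodes,
// Document, Element, Node and TextNode.
String html = "<p class=\"outter\">\n"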
        + "  <strong class=\"inner\">not needed message</strong>\n"
        + "  NEEDED MESSAGE\n"
        + "</p>";
Document doc = Jsoup.parse(html);
Elements elements = doc.select("p.outter");
for (Element element : elements) {
    // as mentioned by luksch
    System.out.println("ownText = " + element.ownText());

    // or manually based on the node type
    for (Node node : element.childNodes()) {
        if (node instanceof TextNode) {
            System.out.println("node = " + node);
        }
    }
}

Output (from the manual TextNode loop):

node =  
node =  NEEDED MESSAGE

So you need to filter the output according to your requirements, e.g. skip the empty (whitespace-only) text nodes.
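The filtering step could look roughly like the sketch below. It assumes jsoup is on the classpath; the class name NonBlankTextNodes and the helper directTexts are made up for illustration. It relies on Element.textNodes(), which returns only the direct TextNode children, and on TextNode.isBlank() to drop whitespace-only nodes.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

class NonBlankTextNodes {

    // Hypothetical helper: collects the trimmed text of every non-blank
    // TextNode that is a direct child of an element matched by cssQuery.
    static List<String> directTexts(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            // textNodes() skips child elements such as <strong>
            for (TextNode node : element.textNodes()) {
                if (!node.isBlank()) { // skip whitespace-only nodes
                    texts.add(node.text().trim());
                }
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<p class=\"outter\">\n"
                + "  <strong class=\"inner\">not needed message</strong>\n"
                + "  NEEDED MESSAGE\n"
                + "</p>";
        System.out.println(directTexts(html, "p.outter"));
    }
}
```

Returning a list instead of printing keeps each text fragment separate, which may help when an element has several text runs around its child tags.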

Answer 2

You can use ownText() after selecting the paragraph. Example:

package com.stackoverflow.answer;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.jsoup.nodes.Element;

public class HtmlParserExample {

    public static void main(String[] args) {
        String html = "<p class=\"outter\"><strong class=\"inner\">not needed message</strong>NEEDED MESSAGE</p>";
        Document doc = Jsoup.parse(html);
        Elements paragraphs = doc.select("p");
        for (Element p : paragraphs)
            System.out.println(p.ownText());
    }

}

Answer 3

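Use Jsoup's ownText() method:

// select() returns Elements, which has no ownText(); take the first match.
// Note that first() returns null when nothing matches.
String results = document.select("p.outter").first().ownText();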
System.out.println(results);
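Since select() returns a collection of matches, a slightly more defensive variant may be useful when the selector might match nothing. The sketch below assumes jsoup is on the classpath; the class FirstOwnText and the helper firstOwnText are hypothetical names, not part of Jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FirstOwnText {

    // Hypothetical helper: own text of the first element matching cssQuery,
    // or an empty string when the selector matches nothing.
    static String firstOwnText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element first = doc.select(cssQuery).first(); // null when no match
        return first == null ? "" : first.ownText();
    }

    public static void main(String[] args) {
        String html = "<p class=\"outter\">"
                + "<strong class=\"inner\">not needed message</strong>"
                + "NEEDED MESSAGE</p>";
        System.out.println(firstOwnText(html, "p.outter"));
    }
}
```

Returning an empty string for a missing element is just one possible choice; throwing or returning an Optional would work equally well, depending on the caller.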

3 answers

Answer 1
1 2015-10-19 12:28:01

Answer 2
1 accepted 2015-10-19 12:30:07

Answer 3
1 2015-10-19 12:31:12
