
How to get the message of a specific tag only, with Jsoup in Java?


I have a tag like this in my HTML:

<p class="outter">
  <strong class="inner">not needed message</strong>
  NEEDED MESSAGE
</p>

I'm trying to extract "NEEDED MESSAGE",

but if I do something like this:

String results = document.select("p.outter").text();
System.out.println(results);

it prints:

not needed messageNEEDED MESSAGE

So the question is:

How can I get the text of a specific tag without the text of its inner tags?

One solution could be to select only the TextNode children. Find a small snippet below.

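import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

String html = "<p class=\"outter\">\n"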
        + "  <strong class=\"inner\">not needed message</strong>\n"
        + "  NEEDED MESSAGE\n"
        + "</p>";
Document doc = Jsoup.parse(html);
Elements elements = doc.select("p.outter");
for (Element element : elements) {
    // as mentioned by luksch
    System.out.println("ownText = " + element.ownText());

    // or manually based on the node type
    for (Node node : element.childNodes()) {
        if (node instanceof TextNode) {
            System.out.println("node = " + node);
        }
    }
}

Output (of the TextNode loop):

node =  
node =  NEEDED MESSAGE 

So you need to filter the output to suit your requirements, e.g. skip the blank ones.
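A minimal sketch of that filtering, assuming a Jsoup version that provides Element.textNodes() (the element's direct TextNode children) and TextNode.isBlank(). The class and method names (OwnTextNodesExample, nonBlankOwnTexts) are hypothetical and only serve the illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class OwnTextNodesExample {

    // Hypothetical helper: collects the trimmed text of the element's direct
    // TextNode children, skipping nodes that contain only whitespace.
    static List<String> nonBlankOwnTexts(Element element) {
        List<String> texts = new ArrayList<>();
        for (TextNode node : element.textNodes()) {
            if (!node.isBlank()) {
                // text() collapses whitespace runs; trim() drops the leading/trailing space
                texts.add(node.text().trim());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<p class=\"outter\">\n"
                + "  <strong class=\"inner\">not needed message</strong>\n"
                + "  NEEDED MESSAGE\n"
                + "</p>";
        Document doc = Jsoup.parse(html);
        for (Element p : doc.select("p.outter")) {
            for (String text : nonBlankOwnTexts(p)) {
                System.out.println("node = " + text);
            }
        }
    }
}
```

The text of the <strong> child is never visited, because textNodes() only returns the paragraph's own text nodes, and the whitespace-only node between <p> and <strong> is dropped by the isBlank() check.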

You can use ownText() after selecting the paragraph. Example:

package com.stackoverflow.answer;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.jsoup.nodes.Element;

public class HtmlParserExample {

    public static void main(String[] args) {
        String html = "<p class=\"outter\"><strong class=\"inner\">not needed message</strong>NEEDED MESSAGE</p>";
        Document doc = Jsoup.parse(html);
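        Elements paragraphs = doc.select("p");
        for (Element p : paragraphs) {
            // ownText() returns only the text of the paragraph's direct text nodes,
            // so the content of the inner <strong> is left out
            System.out.println(p.ownText());
        }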
    }

}

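Use Jsoup's ownText() method. Note that select() returns an Elements list, while ownText() is a method of a single Element, so take the first match:

String results = document.select("p.outter").first().ownText();
System.out.println(results);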
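A small self-contained comparison of text() and ownText() on the question's markup, as a sketch. It also guards against a selector that matches nothing, since Elements.first() returns null in that case. The class and helper names (OwnTextVsTextExample, ownTextOfFirst) are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OwnTextVsTextExample {

    // Hypothetical helper: own text of the first element matching the selector,
    // or null when nothing matches (Elements.first() returns null for an empty list).
    static String ownTextOfFirst(Document doc, String cssQuery) {
        Element first = doc.select(cssQuery).first();
        return first == null ? null : first.ownText();
    }

    public static void main(String[] args) {
        String html = "<p class=\"outter\"><strong class=\"inner\">not needed message</strong>NEEDED MESSAGE</p>";
        Document doc = Jsoup.parse(html);
        Element p = doc.select("p.outter").first();

        // text() includes the text of descendants such as <strong>
        System.out.println("text()    = " + p.text());
        // ownText() includes only the text nodes that are direct children of <p>
        System.out.println("ownText() = " + p.ownText());
        // a selector with no match yields null instead of throwing
        System.out.println("missing   = " + ownTextOfFirst(doc, "p.missing"));
    }
}
```

Returning null for a missing match is just one design choice; a caller could equally return an empty string or throw, depending on whether a missing paragraph is an error in their page.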

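Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.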

© 2020-2024 STACKOOM.COM