How to get the text of a specific tag only (without its inner tags) using Jsoup in Java?
I have a tag like this in my HTML:
<p class="outter">
<strong class="inner">not needed message</strong>
NEEDED MESSAGE
</p>
I'm trying to extract "NEEDED MESSAGE",
but if I do something like this:
String results = document.select("p.outter").text();
System.out.println(results);
it prints:
not needed messageNEEDED MESSAGE
So the question is:
How can I get the text of a specific tag without the text from its inner tags?
One solution could be to select only the TextNode children of the element. Find a small snippet below.
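import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public class OwnTextExample {
    public static void main(String[] args) {
        String html = "<p class=\"outter\">\n"
                + " <strong class=\"inner\">not needed message</strong>\n"
                + " NEEDED MESSAGE\n"
                + "</p>";
        Document doc = Jsoup.parse(html);
        Elements elements = doc.select("p.outter");
        for (Element element : elements) {
            // as mentioned by luksch
            System.out.println("ownText = " + element.ownText());
            // or manually, keeping only the direct TextNode children
            for (Node node : element.childNodes()) {
                if (node instanceof TextNode) {
                    System.out.println("node = " + node);
                }
            }
        }
    }
}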
Output:
ownText = NEEDED MESSAGE
node =
node = NEEDED MESSAGE
So you need to filter the output based on your requirements, e.g. skip the empty (whitespace-only) text nodes.
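Following that advice, here is a minimal sketch that collects only the non-blank direct text children and trims them. It assumes a Jsoup version that provides Element.textNodes() and TextNode.isBlank(); the class name OwnTextFilter and the helper nonBlankOwnTexts are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class OwnTextFilter {

    // Hypothetical helper: returns the trimmed text of each direct TextNode
    // child of the element, skipping nodes that contain only whitespace.
    static List<String> nonBlankOwnTexts(Element element) {
        List<String> result = new ArrayList<>();
        for (TextNode node : element.textNodes()) { // direct children only, not descendants
            if (!node.isBlank()) {
                result.add(node.text().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<p class=\"outter\">\n"
                + " <strong class=\"inner\">not needed message</strong>\n"
                + " NEEDED MESSAGE\n"
                + "</p>";
        Document doc = Jsoup.parse(html);
        for (Element p : doc.select("p.outter")) {
            System.out.println(nonBlankOwnTexts(p)); // prints [NEEDED MESSAGE]
        }
    }
}
```

Returning a list rather than one joined string keeps separate text fragments apart when a paragraph has several text nodes between child tags.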
You can use ownText() after selecting the paragraph. Example:
package com.stackoverflow.answer;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.jsoup.nodes.Element;
public class HtmlParserExample {
public static void main(String[] args) {
String html = "<p class=\"outter\"><strong class=\"inner\">not needed message</strong>NEEDED MESSAGE</p>";
Document doc = Jsoup.parse(html);
Elements paragraphs = doc.select("p");
for (Element p : paragraphs)
System.out.println(p.ownText());
}
}
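To make the difference from the question's original attempt explicit, here is a small sketch (the class name TextVsOwnText is hypothetical) that prints text() and ownText() for the same paragraph. text() gathers the text of the element and all of its descendants, while ownText() only uses the paragraph's direct text children.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TextVsOwnText {

    static final String HTML =
            "<p class=\"outter\"><strong class=\"inner\">not needed message</strong>NEEDED MESSAGE</p>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // first() returns null when nothing matches; this HTML always has one match
        Element p = doc.select("p.outter").first();

        // text(): includes the descendants, so "not needed message" from <strong> is part of it
        System.out.println("text()    = " + p.text());

        // ownText(): only the text nodes that are direct children of <p>
        System.out.println("ownText() = " + p.ownText()); // ownText() = NEEDED MESSAGE
    }
}
```

Note that ownText() joins all direct text nodes into a single string, so for markup like <p>A<b>x</b>B</p> both pieces come back combined; the per-node approach from the first answer keeps them separate.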