
Using JSoup to parse text between two different tags

I have the following HTML:

<h3 class="number">
<span class="navigation">
6:55 <a href="/results/result.html" class="under"><b>&raquo;</b></a>
</span>**This is the text I need to parse!**</h3>

I can use the following code to extract the text from the h3 tag.

Element h3 = doc.select("h3").get(0);

Unfortunately, that gives me everything in that tag:

6:55 &raquo; This is the text I need to parse!

Can I use Jsoup to parse the text between different tags? Is there a best practice for doing this (regex?)

(regex?)

No. As the answers to this question explain, you can't parse HTML with a regular expression.

Try this:

Element h3 = doc.select("h3").get(0);
String h3Text = h3.text();
String spanText = h3.select("span").get(0).text();
// trim() drops the space left behind where the span text used to be
String textBetweenSpanEndAndH3End = h3Text.replace(spanText, "").trim();
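
One caveat with this approach: String.replace removes every occurrence of the span text, so it would also strip a matching substring from the part you want to keep. A minimal sketch of a stricter variant follows, assuming the span text always sits at the start of the h3 text. SpanPrefix and removePrefix are hypothetical names (not part of Jsoup), and the strings in main stand in for what h3.text() and the span's text() would return.

```java
final class SpanPrefix {

    // Removes `prefix` from the start of `full` only; later occurrences are kept.
    // Hypothetical helper for illustration, not a Jsoup API.
    static String removePrefix(String full, String prefix) {
        if (full.startsWith(prefix)) {
            return full.substring(prefix.length()).trim();
        }
        return full.trim(); // prefix not at the start: leave the text alone
    }

    public static void main(String[] args) {
        // Stand-ins for h3.text() and h3.select("span").get(0).text();
        // \u00BB is the » character from the sample HTML.
        String h3Text = "6:55 \u00BB This is the text I need to parse!";
        String spanText = "6:55 \u00BB";
        System.out.println(removePrefix(h3Text, spanText));
    }
}
```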

No, JSoup wasn't made for this. It's supposed to parse something hierarchical. Searching for text that sits between an end tag and a start tag, or the other way around, doesn't make sense to JSoup. That's what regular expressions are for.

But you should of course narrow it down as much as you can with JSoup first, before you fire a regex at the string.
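
As a rough sketch of that two-step idea: assume JSoup has already narrowed things down to the h3 text shown in the question, and that the unwanted prefix always has the shape "time, then »". That pattern is an assumption read off the single sample, and TrailingText and afterMarker are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrailingText {

    // Assumed prefix shape: H:MM or HH:MM, optional whitespace,
    // the » character (U+00BB), optional whitespace.
    private static final Pattern PREFIX =
            Pattern.compile("^\\s*\\d{1,2}:\\d{2}\\s*\\u00BB\\s*");

    // Returns the text after the assumed prefix, or the trimmed input
    // unchanged when the prefix is not present.
    static String afterMarker(String h3Text) {
        Matcher m = PREFIX.matcher(h3Text);
        return m.find() ? h3Text.substring(m.end()).trim() : h3Text.trim();
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by h3.text()
        String h3Text = "6:55 \u00BB This is the text I need to parse!";
        System.out.println(afterMarker(h3Text));
    }
}
```

The regex only ever sees a short, already-extracted string, so it never has to cope with HTML structure.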

Just use ownText():

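import org.jsoup.Jsoup;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

class OwnTextTest { // wrapper class name chosen for illustration
    @Test
    void innerTextCase() {
        String sample = "<h3 class=\"number\">\n" +
                "<span class=\"navigation\">\n" +
                "6:55 <a href=\"/results/result.html\" class=\"under\"><b>&raquo;</b></a>\n" +
                "</span>**This is the text I need to parse!**</h3>\n";
        // ownText() returns only the text directly inside <h3>, skipping child elements such as <span>
        Assertions.assertEquals("**This is the text I need to parse!**",
                Jsoup.parse(sample).select("h3").first().ownText());
    }
}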
