Using JSoup to parse text between two different tags

Question

I have the following HTML...

<h3 class="number">
<span class="navigation">
6:55 <a href="/results/result.html" class="under"><b>&raquo;</b></a>
</span>**This is the text I need to parse!**</h3>

I can use the following code to extract the text from the h3 tag.

Element h3 = doc.select("h3").get(0);

Unfortunately, that gives me everything in that tag.

6:55 &raquo; This is the text I need to parse!

Can I use Jsoup to extract the text between two different tags? Is there a best practice for doing this (regex?)

Answer 1

(regex?)

No. As you can read in the answers to this question, you can't parse HTML using a regular expression.

Try this:

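// Full text of the h3, including the text of the nested span
Element h3 = doc.select("h3").get(0);
String h3Text = h3.text();
// Text of the span alone
String spanText = h3.select("span").get(0).text();
// Remove the span's text; trim() drops the space left where it was
String textBetweenSpanEndAndH3End = h3Text.replace(spanText, "").trim();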
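
One caveat with the snippet above: String.replace removes every occurrence of the span's text, so it would also cut that text out of the remainder if it happened to appear there again. Because the span comes first inside the h3, a slightly safer variant is to strip it only as a prefix. The following is a minimal sketch, not part of the original answer; the class and method names are hypothetical, and the two strings are hard-coded stand-ins for the results of h3.text() and h3.select("span").get(0).text() so that the example runs without Jsoup.

```java
class SpanPrefixStripper {
    /**
     * Removes spanText from the front of h3Text when it is a prefix.
     * If it is not a prefix, h3Text is returned trimmed but otherwise unchanged.
     */
    static String stripLeadingSpanText(String h3Text, String spanText) {
        String full = h3Text.trim();
        String prefix = spanText.trim();
        if (!prefix.isEmpty() && full.startsWith(prefix)) {
            // Cut only the leading occurrence, then drop the separating space
            return full.substring(prefix.length()).trim();
        }
        return full;
    }

    public static void main(String[] args) {
        // \u00bb is the right-pointing double angle quotation mark (&raquo;)
        String h3Text = "6:55 \u00bb This is the text I need to parse!";
        String spanText = "6:55 \u00bb";
        System.out.println(stripLeadingSpanText(h3Text, spanText));
    }
}
```

This only helps when the unwanted element really is the first child of the h3; for other layouts the prefix check simply leaves the text as it is.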

Answer 2

No, JSoup wasn't made for this. It is meant to parse hierarchical structures. Searching for text that sits between an end tag and a start tag, or the other way around, doesn't fit JSoup's model. That's what regular expressions are for.

But you should, of course, narrow the input down as far as you can with JSoup first, before you apply a regex to the resulting string.
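
The regex step described here could look roughly like the sketch below. It assumes the inner HTML of the h3 has already been isolated with JSoup (for example via Element.html()) and is hard-coded here so that the example has no dependencies; the class name, method name, and pattern are illustrative choices, not something given in this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingTextExtractor {
    // Captures whatever follows the last closing </span> in the fragment.
    // DOTALL lets '.' cross line breaks; the greedy prefix pushes the
    // match to the last </span> rather than the first.
    private static final Pattern AFTER_LAST_SPAN =
            Pattern.compile(".*</span>(.*)", Pattern.DOTALL);

    /** Returns the trimmed text after the last </span>, or "" if there is none. */
    static String textAfterLastSpan(String innerHtml) {
        Matcher m = AFTER_LAST_SPAN.matcher(innerHtml);
        return m.matches() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // Stand-in for the already narrowed-down inner HTML of the h3
        String innerHtml = "<span class=\"navigation\">\n"
                + "6:55 <a href=\"/results/result.html\" class=\"under\"><b>&raquo;</b></a>\n"
                + "</span>**This is the text I need to parse!**";
        System.out.println(textAfterLastSpan(innerHtml));
    }
}
```

The pattern is tied to this exact shape (a single span followed by plain text), which is why narrowing the input first matters: run against a whole page, it would capture everything after the last span in the document.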

Answer 3

Just use ownText(), which returns only the text owned directly by the element, not the text of its children.

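import org.jsoup.Jsoup;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

// Class name is illustrative; the test method is from the original answer
class OwnTextTest {
    @Test
    void innerTextCase() {
        String sample = "<h3 class=\"number\">\n" +
                "<span class=\"navigation\">\n" +
                "6:55 <a href=\"/results/result.html\" class=\"under\"><b>&raquo;</b></a>\n" +
                "</span>**This is the text I need to parse!**</h3>\n";
        // ownText() skips the span's text and returns only the h3's own text
        Assertions.assertEquals("**This is the text I need to parse!**",
                Jsoup.parse(sample).select("h3").first().ownText());
    }
}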

3 answers

solution1
3 ACCEPTED 2013-08-19 16:52:53

solution2
0 2013-08-19 16:53:20

solution3
0 2022-12-16 04:41:06
