Jsoup: get all heading tags

Question

I'm trying to parse an HTML document with Jsoup to get all heading tags. In addition, I need to group the heading tags as [h1], [h2], etc.

     hh = doc.select("h[0-6]");

but this gives me an empty array.

Answer 1

Your selector is read as an h tag with an attribute named "0-6", not as a regex. But you can combine multiple selectors instead: hh = doc.select("h1, h2, h3, h4, h5, h6");

Grouping: do you need one group with all h tags plus a group for each of h1, h2, ..., or only a group for each of h1, h2, ...?

Here's an example of how you can do this:

// Group of all h-Tags
Elements hTags = doc.select("h1, h2, h3, h4, h5, h6");

// Group of all h1-Tags
Elements h1Tags = hTags.select("h1");
// Group of all h2-Tags
Elements h2Tags = hTags.select("h2");
// ... etc.

If you only want a group for each of h1, h2, ..., you can drop the first selector and replace hTags with doc in the others.
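If the per-level groups are needed in one structure rather than in six separate variables, a single pass over the combined selection can build a map keyed by tag name. This is only a minimal sketch: the class name HeadingGroups, the method groupHeadings and the sample HTML are made up for illustration, and it assumes Jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HeadingGroups {

    // Hypothetical helper: maps each heading tag name ("h1".."h6")
    // to the text of the headings of that level, in document order.
    static Map<String, List<String>> groupHeadings(String html) {
        Document doc = Jsoup.parse(html);
        // TreeMap keeps the keys sorted, so h1 comes before h2, and so on.
        Map<String, List<String>> groups = new TreeMap<>();
        for (Element heading : doc.select("h1, h2, h3, h4, h5, h6")) {
            groups.computeIfAbsent(heading.tagName(), k -> new ArrayList<>())
                  .add(heading.text());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample input, invented for this example.
        String html = "<h1>Title</h1><h2>Intro</h2><p>text</p><h2>Details</h2><h3>Note</h3>";
        // prints {h1=[Title], h2=[Intro, Details], h3=[Note]}
        System.out.println(groupHeadings(html));
    }
}
```

Levels that do not occur in the document simply have no key in the map, so callers may want getOrDefault when looking up a level.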

Answer 2

Use doc.select("h1,h2,h3,h4,h5,h6") to get all heading tags. Use a single-tag selector such as doc.select("h1") to get each level separately. See the various things you can do with a select statement at http://preciselyconcise.com/apis_and_installations/jsoup/j_selector.php

Answer 3

Here is a Scala version of the answer that uses Ammonite's syntax to specify the Maven coordinates for Jsoup:

import $ivy.`org.jsoup:jsoup:1.11.3`
val html = scala.io.Source.fromURL("https://scalacourses.com").mkString
val doc = org.jsoup.Jsoup.parse(html)
doc.select("h1, h2, h3, h4, h5, h6").eachText()

3 answers

solution1
21 ACCEPTED 2012-10-21 14:10:57

solution2
2 2014-02-09 11:03:55

solution3
0 2019-05-07 08:55:53
