Jsoup: Get all heading tags

Question

I'm trying to parse an HTML document with Jsoup to get all heading tags. I also need to group the heading tags as [h1], [h2], etc.

     hh = doc.select("h[0-6]");

But this gives me an empty result.

Answer 1

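Your selector means an h tag that has an attribute named "0-6"; it is not a regular expression. You can, however, combine multiple selectors: hh = doc.select("h1, h2, h3, h4, h5, h6"); (HTML has no h0 element, so headings run from h1 to h6.)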
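The difference between the two selectors can be checked with a small sketch like the one below. It assumes jsoup is on the classpath (for example the 1.11.3 release used in Answer 3, or newer, since it relies on `Elements.eachText()`); the class name, method names, and sample HTML are made up for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical demo class; the sample HTML is invented for this sketch.
final class HeadingSelectorDemo {
    static final String HTML =
        "<h1>Title</h1><p>intro</p><h2>Part A</h2><h3>Detail</h3><h2>Part B</h2>";

    // "h[0-6]" is read as: an element named <h> that also has an attribute named "0-6".
    // Ordinary HTML has neither, so nothing matches.
    static int countWithBrokenSelector(String html) {
        return Jsoup.parse(html).select("h[0-6]").size();
    }

    // A comma-separated selector list matches elements that satisfy any of the parts.
    static List<String> headingTexts(String html) {
        Document doc = Jsoup.parse(html);
        Elements headings = doc.select("h1, h2, h3, h4, h5, h6");
        return headings.eachText(); // text of each matched heading
    }

    public static void main(String[] args) {
        System.out.println(countWithBrokenSelector(HTML)); // 0
        System.out.println(headingTexts(HTML));
    }
}
```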

Grouping: do you need one group holding all heading tags plus a group for each of h1, h2, ..., or only one group per tag (h1, h2, ...)?

Here is an example of how to do this:

// Group of all h-Tags
Elements hTags = doc.select("h1, h2, h3, h4, h5, h6");

// Group of all h1-Tags
Elements h1Tags = hTags.select("h1");
// Group of all h2-Tags
Elements h2Tags = hTags.select("h2");
// ... etc.

If you only want one group per tag (h1, h2, ...), you can drop the first selector and replace hTags with doc in the others.
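A variant of the grouping idea, sketched under the assumption that jsoup is on the classpath: instead of running one `select` per tag, it selects all headings once and buckets them by `Element.tagName()` (lower case with jsoup's default HTML parser). The class and method names are hypothetical, and the map holds heading text rather than `Element` objects only to keep the output readable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper: groups heading text by tag name (h1 .. h6).
final class HeadingGroups {
    static Map<String, List<String>> groupByLevel(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, List<String>> groups = new LinkedHashMap<>();
        // Pre-create h1..h6 so levels missing from the page map to an empty list.
        for (int level = 1; level <= 6; level++) {
            groups.put("h" + level, new ArrayList<>());
        }
        // One selector pass, then put each heading into the bucket for its tag name.
        for (Element h : doc.select("h1, h2, h3, h4, h5, h6")) {
            groups.get(h.tagName()).add(h.text());
        }
        return groups;
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><h2>Part A</h2><h3>Detail</h3><h2>Part B</h2>";
        System.out.println(groupByLevel(html));
        // {h1=[Title], h2=[Part A, Part B], h3=[Detail], h4=[], h5=[], h6=[]}
    }
}
```

Keeping `Element` objects in the lists instead of `h.text()` would preserve access to attributes and children if the groups are processed further.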

Answer 2

Use doc.select("h1, h2, h3, h4, h5, h6") to get all heading tags. Use doc.select("h1") to get each tag separately. See http://preciselyconcise.com/apis_and_installations/jsoup/j_selector.php for the various things you can do with select statements.
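Selecting each tag separately can be done in a loop over the six levels. The sketch below assumes jsoup on the classpath; the class and method names are hypothetical, and levels with no non-empty headings are simply left out of the result.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper: one doc.select("hN") call per heading level.
final class HeadingLevels {
    static Map<Integer, List<String>> textsByLevel(String html) {
        Document doc = Jsoup.parse(html);
        Map<Integer, List<String>> result = new TreeMap<>(); // sorted by level
        for (int level = 1; level <= 6; level++) {
            // eachText() skips elements whose text is empty
            List<String> texts = doc.select("h" + level).eachText();
            if (!texts.isEmpty()) {
                result.put(level, texts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><h2>Part A</h2><h3>Detail</h3><h2>Part B</h2>";
        System.out.println(textsByLevel(html)); // {1=[Title], 2=[Part A, Part B], 3=[Detail]}
    }
}
```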

Answer 3

Here is a Scala version of the answer, using Ammonite's syntax to specify Jsoup's Maven coordinates:

import $ivy.`org.jsoup:jsoup:1.11.3`
val html = scala.io.Source.fromURL("https://scalacourses.com").mkString
val doc = org.jsoup.Jsoup.parse(html)
// HTML headings run from h1 to h6; eachText() returns a java.util.List of each heading's text
doc.select("h1, h2, h3, h4, h5, h6").eachText()

3 answers

Answer 1: score 21, accepted, 2012-10-21 14:10:57
Answer 2: score 2, 2014-02-09 11:03:55
Answer 3: score 0, 2019-05-07 08:55:53