Jsoup cannot extract formatting attributes from an HTML table

Question

<tr>

<th align="LEFT" bgcolor="GREY"> <span class="smallfont">Higher-order Theorems</span>

</th><th bgcolor="PINK"> <em><a href="http://www.tptp.org/CASC/J9/SystemDescriptions.html#Satallax---3.2">Satallax</a><br><span class="xxsmallfont">3.2</span></em>

</th><th bgcolor="SKYBLUE"> <a href="http://www.tptp.org/CASC/J9/SystemDescriptions.html#Satallax---3.3">Satallax</a><br><span class="xxsmallfont">3.3</span>

</th><th bgcolor="LIME"> <a href="http://www.tptp.org/CASC/J9/SystemDescriptions.html#Leo-III---1.3">Leo‑III</a><br><span class="xxsmallfont">1.3</span>

</th><th bgcolor="YELLOW"> <a href="http://www.tptp.org/CASC/J9/SystemDescriptions.html#LEO-II---1.7.0">LEO‑II</a><br><span class="xxsmallfont">1.7.0</span>

</th></tr>

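So, say I want to extract the bgcolor, the align, and the content of the span class. For the first cell that would be GREY, LEFT, and Higher-order Theorems.

If I want to extract at least the bgcolor, but ideally all three, how would I do that?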

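So I tried to extract just the bgcolor. I have tried doc.select("tr:contains([bgcolor]"), doc.select(th, [bgcolor]), doc.select([bgcolor]), doc.select(tr:containsdata(bgcolor)), and doc.select([style]), all of which either returned no output or raised a parse error. I can extract the content of the span class just fine, but extracting the bgcolor and align is the problem.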

Answer 1

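You only need to parse the HTML you want to scrape into a Jsoup Document, select the tags you are interested in, and call attr on each selected Element to read the value of that attribute for each tag. To also retrieve the text contained in the span tags, select the span nested inside each th and call .text() on it.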

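    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // html is a String holding your HTML. If you parse only the row shown in
    // the question, wrap it in <table>...</table> first; as far as I know the
    // HTML parser drops <tr>/<th> tags that appear outside a table.
    Document document = Jsoup.parse(html);
    System.out.println(document);
    // Every header cell that is a direct child of a row.
    Elements elements = document.select("tr > th");

    for (Element element : elements) {
        // attr returns an empty string when the attribute is missing.
        String align = element.attr("align");
        String color = element.attr("bgcolor");
        String spanText = element.select("span").text();

        System.out.println("Align is " + align +
                "\nBackground Color is " + color +
                "\nSpan Text is " + spanText);
    }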
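To connect this with the selectors tried in the question, here is a minimal, self-contained sketch that uses the attribute selector th[bgcolor] instead of tr > th. The class name HeaderCellExtractor, the method describeCells, and the shortened sample markup are made up for illustration; only Jsoup.parse, select, attr, and text are Jsoup calls. The sample row is wrapped in a table element on the assumption that the parser would otherwise discard the stray tr and th tags.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class HeaderCellExtractor {

    // Hypothetical helper: returns one summary line per <th> that has a bgcolor attribute.
    static List<String> describeCells(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        // "th[bgcolor]" matches th elements that carry a bgcolor attribute.
        for (Element th : doc.select("th[bgcolor]")) {
            String align = th.attr("align");     // empty string when the attribute is absent
            String color = th.attr("bgcolor");
            String spanText = th.select("span").text();
            lines.add("align=" + align + " bgcolor=" + color + " span=" + spanText);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Shortened sample markup (an assumption, not the full page from the question).
        String html = "<table><tr>"
                + "<th align=\"LEFT\" bgcolor=\"GREY\">"
                + "<span class=\"smallfont\">Higher-order Theorems</span></th>"
                + "<th bgcolor=\"PINK\"><em><a href=\"#\">Satallax</a><br>"
                + "<span class=\"xxsmallfont\">3.2</span></em></th>"
                + "</tr></table>";
        describeCells(html).forEach(System.out::println);
    }
}
```

Note that the selector has to be passed as a quoted string, and that [bgcolor] on its own (without th) would also match any other element carrying that attribute.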

Feel free to ask if you need more information. Hope this helps!

Updated answer in response to the comment:

For that, you need to use this line inside the for-each loop:

    String fullText = element.text();

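This way you get all the text contained between the tags of the selected Element, but you should look at this blog and adapt it to your query. I guess you will also need to check whether the String is empty and use an if condition to query each possible case separately.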

That means one query for the structure tr > th > span, another for tr > th > em, and another for plain tr > th.
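The case-by-case idea above could be sketched roughly as follows. This is only one possible reading of the suggestion: the class CellTextByStructure and its helpers directChild, describe, and describeAll are hypothetical names, the sample markup is shortened, and the row is again wrapped in a table element so that the parser keeps it. Each th is checked for a direct span child first, then a direct em child, and falls back to the text of the th itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class CellTextByStructure {

    // Returns the first direct child of parent with the given tag name, or null if none.
    static Element directChild(Element parent, String tag) {
        for (Element child : parent.children()) {
            if (child.tagName().equalsIgnoreCase(tag)) {
                return child;
            }
        }
        return null;
    }

    // Describes one <th>, trying the most specific structure first.
    static String describe(Element th) {
        Element span = directChild(th, "span");
        if (span != null && !span.text().isEmpty()) {
            return "tr > th > span: " + span.text();
        }
        Element em = directChild(th, "em");
        if (em != null && !em.text().isEmpty()) {
            return "tr > th > em: " + em.text();
        }
        // Fallback: all text inside the cell.
        return "tr > th: " + th.text();
    }

    static List<String> describeAll(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element th : doc.select("tr > th")) {
            out.add(describe(th));
        }
        return out;
    }

    public static void main(String[] args) {
        // Shortened sample markup (an assumption, not the full page from the question).
        String html = "<table><tr>"
                + "<th align=\"LEFT\" bgcolor=\"GREY\"><span>Higher-order Theorems</span></th>"
                + "<th bgcolor=\"PINK\"><em><a href=\"#\">Satallax</a><br><span>3.2</span></em></th>"
                + "<th bgcolor=\"WHITE\">Plain header</th>"
                + "</tr></table>";
        describeAll(html).forEach(System.out::println);
    }
}
```

In the question's markup some cells hold both a link and a direct span (for example the Satallax 3.3 cell), so the span branch would return only the version number there; whether that or the full element.text() is wanted depends on the query.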

Solution 1
0 Accepted 2018-11-26 11:04:14
