
Unable to extract contents from HTML using jsoup in Java?

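I am trying to use jsoup to extract the contents of the <td> tags in the HTML below that have the classes css-sched-table-title and css-sched-waypoints, but I can't figure out what is going wrong. Can anyone help?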

Document doc = Jsoup.parse("somelink.html");
Elements row = doc.select(".css-sched-table-title td");
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext())
{
    Element element = iterator.next();
    String value = element.text();
    System.out.println("value : " + value);
}

  <tr>
        <td ALIGN="CENTER" COLSPAN="16"  CLASS="css-sched-table-title"><b>Saturday - </b><b>Afternoon</b></td>
    </tr>
    <tr VALIGN="BOTTOM">
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Townline and Southern</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and Blueridge</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Clearbrook and South Fraser</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Ar. Bourquin Exchange</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Lv. Bourquin Exchange</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Downtown Abbotsford</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">McMillan and Old Yale</TD>
        <TD>&nbsp;</TD>
        <TD ALIGN="CENTER" WIDTH="100" CLASS="css-sched-waypoints">Sandy Hill and Old Clayburn</TD>
    </tr>

There is only one td tag with the class css-sched-table-title, but there is a whole list of td tags with the class css-sched-waypoints.

Also, to get the selector syntax right, it should be Elements row = doc.select("td.css-sched-waypoints"); see the jsoup selector syntax reference.
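
To illustrate the difference between the two selectors, here is a minimal sketch. The class name SelectorDemo, the count helper, and the shortened markup are made up for this example and are not part of the original question. The selector ".css-sched-table-title td" asks for td elements nested inside an element with that class, while "td.css-sched-table-title" asks for td elements that carry the class themselves.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SelectorDemo {
    // Hypothetical, shortened version of the schedule markup, already wrapped in <table>.
    static final String HTML =
        "<table>"
        + "<tr><td class=\"css-sched-table-title\"><b>Saturday - </b><b>Afternoon</b></td></tr>"
        + "<tr><td>&nbsp;</td><td class=\"css-sched-waypoints\">Townline and Southern</td></tr>"
        + "</table>";

    // Returns how many elements the given CSS query matches in the sample markup.
    static int count(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // Descendant selector: td elements INSIDE an element with the class. Prints 0.
        System.out.println(count(".css-sched-table-title td"));
        // Element-with-class selector: the td that HAS the class. Prints 1.
        System.out.println(count("td.css-sched-table-title"));
        // Same form for the waypoint cells. Prints 1 for this shortened sample.
        System.out.println(count("td.css-sched-waypoints"));
    }
}
```

In the question's markup the title cell contains only <b> elements, so the descendant form has nothing to match.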

Note: the HTML fragment used is not valid on its own, and jsoup does not interpret it as valid table content. I had to put the content above inside <table></table> tags.
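
The effect of the missing <table> wrapper can be sketched as follows. FragmentDemo, countCells, and the one-row sample are hypothetical names and data chosen for this illustration. As far as I can tell, jsoup follows the HTML5 parsing rules here, which drop <tr> and <td> start tags that appear outside a table, so only the cell text survives.

```java
import org.jsoup.Jsoup;

class FragmentDemo {
    // Hypothetical one-row sample taken from the schedule markup.
    static final String ROWS =
        "<tr><td class=\"css-sched-waypoints\">Townline and Southern</td></tr>";

    // Counts the waypoint cells jsoup finds after parsing the given markup.
    static int countCells(String html) {
        return Jsoup.parse(html).select("td.css-sched-waypoints").size();
    }

    public static void main(String[] args) {
        // Bare rows: the <tr>/<td> tags are discarded during parsing.
        System.out.println("bare rows:    " + countCells(ROWS));
        // Wrapped rows: the cells are kept and can be selected.
        System.out.println("wrapped rows: " + countCells("<table>" + ROWS + "</table>"));
    }
}
```

If the real page already contains a surrounding <table>, this wrapping step should not be needed.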

When I tried the following code with your HTML file:

// Select every waypoint cell, plus the single title cell.
Elements row = doc.select("td.css-sched-waypoints");
Element title = doc.select("td.css-sched-table-title").first();

System.out.println(title.text());
Iterator<Element> iterator = row.listIterator();
while (iterator.hasNext()) {
    Element element = iterator.next();
    String id = element.attr("id");          // empty: these cells have no id attribute
    String classes = element.attr("class");
    String value = element.text();
    System.out.println("Id : " + id + ", classes : " + classes
            + ", value : " + value);
}

I get:

Saturday - Afternoon
Id : , classes : css-sched-waypoints, value : Townline and Southern
Id : , classes : css-sched-waypoints, value : Clearbrook and Blueridge
Id : , classes : css-sched-waypoints, value : Clearbrook and South Fraser
Id : , classes : css-sched-waypoints, value : Ar. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Lv. Bourquin Exchange
Id : , classes : css-sched-waypoints, value : Downtown Abbotsford
Id : , classes : css-sched-waypoints, value : McMillan and Old Yale
Id : , classes : css-sched-waypoints, value : Sandy Hill and Old Clayburn
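
One more thing that may be worth checking in the question's code: Jsoup.parse(String) treats its argument as HTML markup, not as a file name, so Jsoup.parse("somelink.html") produces a document whose only content is the text "somelink.html". The sketch below assumes the page is a local UTF-8 file and uses Jsoup.parse(File, String) instead. LoadDemo, its helper methods, and the temporary sample file are invented for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class LoadDemo {
    // Parses the argument itself as markup; no file is read.
    static String parseAsMarkup(String markup) {
        return Jsoup.parse(markup).body().text();
    }

    // Reads and parses the given file, assuming it is encoded as UTF-8.
    static Document parseFile(File file) {
        try {
            return Jsoup.parse(file, "UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a hypothetical one-cell schedule page to a temporary file.
    static File writeSample() {
        String html = "<table><tr><td class=\"css-sched-table-title\">"
            + "<b>Saturday - </b><b>Afternoon</b></td></tr></table>";
        try {
            Path path = Files.createTempFile("sched", ".html");
            Files.write(path, html.getBytes(StandardCharsets.UTF_8));
            path.toFile().deleteOnExit();
            return path.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The file name is parsed as plain text. Prints: somelink.html
        System.out.println(parseAsMarkup("somelink.html"));
        // The file contents are parsed as HTML. Prints: Saturday - Afternoon
        Document doc = parseFile(writeSample());
        System.out.println(doc.select("td.css-sched-table-title").text());
    }
}
```

For a remote page, Jsoup.connect(url).get() would be the usual alternative to reading a local file.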

