Parsing IDs and names from HTML with Jsoup

Question

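I need to extract the ID and mfg. name of every food listed on this HTML page: https://ndb.nal.usda.gov/ndb/search/list

I am using Jsoup and am still new to it.

This is the HTML source from which I have to extract the food IDs and names: [image: screenshot of the HTML source]

Here is my Java source code: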

    try {
        Document doc = Jsoup.connect("https://ndb.nal.usda.gov/ndb/search/list?maxsteps=6&format=&count=&max=50&sort=fd_s&fgcd=&manu=&lfacet=&qlookup=&ds=&qt=&qp=&qa=&qn=&q=&ing=&offset=0&order=asc").userAgent("mozilla/17.0").get();
        Elements temp = doc.select("div.list-left");

        int i = 0;
        for (Element food : temp) {
            i++;
            System.out.println(i + "" + food.getElementsByTag("table").first().text());
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

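So far this gives me all the information from the first page, but I need to extract the IDs and mfg. names from every page.

Any help would be greatly appreciated.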

Answer 1

Try this.

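    // Requires: java.io.IOException, org.jsoup.Jsoup, org.jsoup.nodes.Document,
    // org.jsoup.nodes.Element, org.jsoup.select.Elements
    try {
        int maxPage = 3681;  // number of result pages, hard-coded
        int i = 0;           // running row counter across all pages
        for (int page = 0; page < maxPage; ++page) {
            // Each page shows 50 rows (max=50), so the offset advances by 50 per page.
            Document doc = Jsoup.connect(
                "https://ndb.nal.usda.gov/ndb/search/list"
                + "?maxsteps=6&format=&count=&max=50"
                + "&sort=fd_s&fgcd=&manu=&lfacet=&qlookup=&ds="
                + "&qt=&qp=&qa=&qn=&q=&ing=&offset=" + (page * 50)
                + "&order=asc")
                .userAgent("mozilla/17.0").get();
            // One <tr> per food item in the results table.
            Elements rows = doc.select("div.list-left table tbody tr");
            for (Element row : rows) {
                ++i;
                System.out.print("No." + i);
                // td:eq(n) selects a cell by its zero-based index within the row.
                System.out.print(" ID=" + row.select("td:eq(1) a").text());
                System.out.println(" Manufacturer=" + row.select("td:eq(3) a").text());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }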
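
The paging in this answer comes down to arithmetic on the offset query parameter, so it may help to see that part in isolation. Below is a minimal sketch using only the standard library. The class name PageUrls and its methods are hypothetical helpers, not part of Jsoup, and the total of 120 items in main is a made-up figure for illustration. The query string is the one from the answer, with max and offset turned into placeholders.

```java
/** Hypothetical helper that builds the paginated list URLs (not part of Jsoup). */
final class PageUrls {
    // Query string copied from the answer; max and offset become format placeholders.
    private static final String TEMPLATE =
        "https://ndb.nal.usda.gov/ndb/search/list"
        + "?maxsteps=6&format=&count=&max=%d"
        + "&sort=fd_s&fgcd=&manu=&lfacet=&qlookup=&ds="
        + "&qt=&qp=&qa=&qn=&q=&ing=&offset=%d&order=asc";

    private PageUrls() {}

    /** Pages needed to list totalItems at pageSize rows per page (ceiling division). */
    static int pageCount(int totalItems, int pageSize) {
        if (totalItems < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalItems must be >= 0 and pageSize > 0");
        }
        // Widen to long so the addition cannot overflow for large totals.
        return (int) (((long) totalItems + pageSize - 1) / pageSize);
    }

    /** Offset of the first row on a zero-based page. */
    static int offsetFor(int page, int pageSize) {
        return page * pageSize;
    }

    /** Full URL for a zero-based page. */
    static String urlFor(int page, int pageSize) {
        return String.format(TEMPLATE, pageSize, offsetFor(page, pageSize));
    }

    public static void main(String[] args) {
        int totalItems = 120; // assumed total, for illustration only
        int pageSize = 50;    // matches max=50 in the answer
        for (int page = 0; page < pageCount(totalItems, pageSize); ++page) {
            System.out.println(urlFor(page, pageSize));
        }
    }
}
```

With a helper like this, the hard-coded maxPage could be replaced by pageCount(total, 50) once the total number of items is known, and each URL from urlFor could be passed to Jsoup.connect as in the loop above.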

Answer 1: score 0, accepted, 2017-03-17 02:20:20
