
Select a particular HTML table with JSOUP

I have my code as:

import java.io.IOException;
import org.jsoup.Jsoup;

public static void main(String[] args) throws IOException {

    org.jsoup.nodes.Document doc = Jsoup.connect("https://ms.wikipedia.org/wiki/Malaysia").get();
    // Selects every <tr> in the document, regardless of which table it belongs to
    org.jsoup.select.Elements rows = doc.select("tr");
    for (org.jsoup.nodes.Element row : rows) {
        org.jsoup.select.Elements columns = row.select("td");
        for (org.jsoup.nodes.Element column : columns) {
            System.out.print(column.text());
        }
        System.out.println();
    }

}

It prints every table row on the webpage. Is it possible to print only the rows of one selected table?

Select the particular table element first, then loop over the rows nested inside it.

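import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] args) throws IOException {
    Document doc = Jsoup.connect("https://ms.wikipedia.org/wiki/Malaysia").get();
    // get(1) picks the second table with class "wikitable" (indexes start at 0)
    Element table = doc.select("table.wikitable").get(1);
    Elements body = table.select("tbody");
    Elements rows = body.select("tr");
    for (Element row : rows) {
        // Header cell text, then data cell text, printed with no separator
        System.out.print(row.select("th").text());
        System.out.print(row.select("td").text());
        System.out.println();
    }
}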

Output:

Ibu negaraKuala Lumpur
Pusat pentadbiranPutrajaya
Tarikh Hari Kebangsaan31 Ogos 1957
Cogan Kata NegaraBersekutu Bertambah Mutu
BenuaAsia, Asia Tenggara
Koordinat Geografi2 30 U, 112 30 T
Jumlah hujan tahunan2000mm ~ 2500mm
IklimTropika dengan suhu 24–35 Darjah Celsius
Bunga kebangsaanBunga Raya
Binatang rasmiHarimau
Puncak tertinggiGunung Kinabalu, Banjaran Crocker (4175m)
Puncak tertinggi SemenanjungGunung Tahan, Banjaran Tahan (2187 m)
Banjaran terpanjangBanjaran Titiwangsa (500 km)
Sungai terpanjangSungai Rajang, Sarawak (563 km)
Sungai terpanjang di SemenanjungSungai Pahang (475 km)
Jambatan terpanjangJambatan Pulau Pinang (13.5 km)
Gua terbesarGua Niah, Sarawak
Bangunan tertinggiMenara Berkembar Petronas (452m)
Negeri terbesarSarawak (124,450 km persegi)
Negeri terkecilPerlis (810 km persegi)
Tempat paling lembapBukit Larut (lebih 5080 mm)
Tempat paling keringJelebu (kurang daripada 1500 mm)
Kawasan paling padatKuala Lumpur (6074/km², 15,543/batu persegi)
Penanaman eksport utamaKelapa sawit dan getah

See the jsoup documentation for more details.
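Picking a table by position depends on how many matching tables the page contains: Elements is a list, so get(1) throws an IndexOutOfBoundsException when fewer than two tables match. The sketch below shows one way to guard against that. It parses a small inline HTML string instead of the live page, and the class name TableByIndex and the helper rows are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableByIndex {
    // Hypothetical helper: returns "header: value" for each row of the
    // index-th element matching css, or an empty list if there is no such element.
    static List<String> rows(Document doc, String css, int index) {
        List<String> out = new ArrayList<>();
        Elements tables = doc.select(css);
        if (index < 0 || index >= tables.size()) {
            return out; // too few matches: nothing to read
        }
        for (Element row : tables.get(index).select("tr")) {
            out.add(row.select("th").text() + ": " + row.select("td").text());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed stand-in markup, not the real Wikipedia page
        String html = "<table class=\"wikitable\"><tr><th>A</th><td>1</td></tr></table>"
                + "<table class=\"wikitable\"><tr><th>Ibu negara</th><td>Kuala Lumpur</td></tr></table>";
        Document doc = Jsoup.parse(html);
        for (String line : rows(doc, "table.wikitable", 1)) {
            System.out.println(line);
        }
    }
}
```

With the stand-in markup, index 1 reads the second table, and an out-of-range index yields an empty list rather than an exception.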

The best way to do this is to grab the table by its title. The title is not inside the table: it sits in a span within the heading that immediately precedes the table. Since a basic CSS selector cannot step from that span up to its parent, you can combine a CSS selector with jsoup API calls to reach the table.

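import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] args) throws IOException {
    Document doc = Jsoup.connect("https://ms.wikipedia.org/wiki/Malaysia").get();
    // span#Trivia -> its closest parent (the heading) -> the next sibling element (the table)
    Element table = doc.select("span#Trivia").parents().first().nextElementSibling();
    Elements rows = table.select("tr");
    for (Element row : rows) {
        String header = row.select("th").text();
        String value = row.select("td").text();
        System.out.println(header + ": " + value);
    }
}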
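The same lookup can also be written as a single selector, because jsoup's selector syntax supports the :has(...) pseudo-class and the + adjacent-sibling combinator. The sketch below assumes the heading is an h2 that contains the element with the title id and that the table follows it directly; it runs against a small inline HTML string, not the live page, whose markup may differ. The class name TableAfterHeading and the helper rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TableAfterHeading {
    // Hypothetical helper: finds the table that directly follows the <h2>
    // containing the element with the given id, and returns its rows as
    // "header: value". Assumes the id needs no CSS escaping.
    static List<String> rows(Document doc, String headingId) {
        List<String> out = new ArrayList<>();
        Element table = doc.selectFirst("h2:has(#" + headingId + ") + table");
        if (table == null) {
            return out; // selectFirst returns null when nothing matches
        }
        for (Element row : table.select("tr")) {
            out.add(row.select("th").text() + ": " + row.select("td").text());
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed stand-in markup mirroring the structure the answer relies on
        String html = "<h2><span id=\"Trivia\">Trivia</span></h2>"
                + "<table><tr><th>Ibu negara</th><td>Kuala Lumpur</td></tr></table>";
        Document doc = Jsoup.parse(html);
        for (String line : rows(doc, "Trivia")) {
            System.out.println(line);
        }
    }
}
```

Checking for null before reading the rows avoids the NullPointerException that the chained version would raise if the heading or table were missing.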
