
Retrieving data from nested hrefs with jsoup

I would like to retrieve data from the pages behind nested hrefs using jsoup. That is, I have this URL: https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999

and I want to get the data for each of these 10 fighters, e.g.:

1. STIPE MIOCIC AGE: 37 or ASSOCIATION: STRONG STYLE FIGHT TEAM

2. DANIEL CORMIER AGE: 40 or ASSOCIATION: AMERICAN KICKBOXING ACADEMY

etc.

How can I do this?

    String url = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
    Document document = Jsoup.connect(url).get();

    Elements headings = document.select("h2");
    for (Element heading : headings) {

        Elements allAge = document.select("div.birth_info");
        for (Element age : allAge) {
            System.out.println(heading.select("a[href]").text());
            // System.out.println(age.select(...)); // something there?
        }
    }

The data you are looking for is on separate pages: each fighter has his own page, so you must crawl the pages one by one to get the data.
First, get the link to each fighter's page with the selector h2 > a[href]:

String url = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
Document document = Jsoup.connect(url).get();
Elements fighters = document.select("h2 > a[href]");
for (Element fighter : fighters) {
    System.out.println(fighter.text() + " " + fighter.attr("href"));
}

After that, inside the loop, you can load each fighter's page and extract the data:

String fighterUrl = "https://www.sherdog.com" + fighter.attr("href"); 
Document doc = Jsoup.connect(fighterUrl).get();
Element fighterData = doc.select("div.data").first();
System.out.println(fighterData.text());
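
The string concatenation above only works when the href is site-relative (starts with "/"). As a minimal sketch of a more forgiving approach, the link can be resolved against the page URL with the standard java.net.URI class. The class name LinkResolver and the fighter path below are hypothetical, chosen only for illustration; jsoup's own fighter.absUrl("href") should give a similar result when the document was loaded from a URL.

```java
import java.net.URI;

class LinkResolver {
    // Resolves href against the URL of the page it was found on.
    // Site-relative, page-relative and already-absolute links are all handled.
    // Note: URI.create throws IllegalArgumentException for malformed input (e.g. spaces).
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
        // "/fighter/Example-Fighter-1" is a made-up path, not a real fighter page.
        System.out.println(resolve(page, "/fighter/Example-Fighter-1"));
        // prints https://www.sherdog.com/fighter/Example-Fighter-1
    }
}
```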

Combined together, you get:

String url = "https://www.sherdog.com/news/rankings/2/Sherdogs-Official-Mixed-Martial-Arts-Rankings-164999";
Document document = Jsoup.connect(url).get();
Elements fighters = document.select("h2 > a[href]");
for (Element fighter : fighters) {
    System.out.println(fighter.text());
    String fighterUrl = "https://www.sherdog.com" + fighter.attr("href"); 
    Document doc = Jsoup.connect(fighterUrl).get();
    Element fighterData = doc.select("div.data").first();
    System.out.println(fighterData.text());
    System.out.println("---------------");
}

And the (partial) output is:

Stipe Miocic Born: 1982-08-19 AGE: 37 Independence, Ohio United States Height 6'4" 193.04 cm Weight 245 lbs 111.13 kg Association: Strong Style Fight Team Class: Heavyweight Wins 19 15 KO/TKO (79%) 0 SUBMISSIONS (0%) 4 DECISIONS (21%) Losses 3 2 KO/TKO (67%) 0 SUBMISSIONS (0%) 1 DECISIONS (33%)

Daniel Cormier Born: 1979-03-20 AGE: 40 San Jose, California United States Height 5'11" 180.34 cm Weight 251 lbs 113.85 kg Association: American Kickboxing Academy Class: Heavyweight Wins 22 10 KO/TKO (45%) 5 SUBMISSIONS (23%) 7 DECISIONS (32%) Losses 2 1 KO/TKO (50%) 0 SUBMISSIONS (0%) 1 DECISIONS (50%) N/C 1

If you want the age, association, and so on as separate fields, you'll have to extract them from this text with a regex.
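
A rough sketch of that regex step, using only java.util.regex, might look like the following. The class name FighterFields and both patterns are assumptions derived from the sample output above (an "AGE:" label followed by digits, and an "Association:" label that ends where "Class:" begins); they may need adjusting if the page text changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FighterFields {
    // Assumed patterns, based only on the sample output shown above.
    private static final Pattern AGE = Pattern.compile("AGE:\\s*(\\d+)");
    private static final Pattern ASSOCIATION =
            Pattern.compile("Association:\\s*(.+?)\\s+Class:");

    // Returns the fields that could be found; missing fields are simply absent.
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher age = AGE.matcher(text);
        if (age.find()) {
            fields.put("age", age.group(1));
        }
        Matcher association = ASSOCIATION.matcher(text);
        if (association.find()) {
            fields.put("association", association.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened copy of the first output line above.
        String sample = "Stipe Miocic Born: 1982-08-19 AGE: 37 Independence, Ohio United States "
                + "Height 6'4\" 193.04 cm Weight 245 lbs 111.13 kg "
                + "Association: Strong Style Fight Team Class: Heavyweight Wins 19";
        System.out.println(parse(sample));
        // prints {age=37, association=Strong Style Fight Team}
    }
}
```

In the combined loop, fighterData.text() would be passed to parse in place of the sample string.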
