Getting the content of a table hidden behind an onclick JavaScript button using Jsoup

I am writing a web scraper for personal use in gaming. This is the page I want to scrape: http://forum.toribash.com/clan_war.php?clanid=139

I want to count how often each name appears in the table revealed by the "show details" button.

I have read Get content from javascript onClick hyperlink, but I am not sure it is what I am looking for. I doubt that it is, and in any case I have not tried the answer from that question, because I have no idea how to adapt https://stackoverflow.com/a/12268561/10467473 to what I want.

        BufferedReader month = new BufferedReader(new InputStreamReader(System.in));
        String mth = month.readLine();
        // Access the website
        Document docs = Jsoup.connect("http://forum.toribash.com/clan_war.php?clanid=139").get();

        // Take every entry of the war history
        Elements collection = docs.getElementsByClass("war_history_entry");
        // Iterate over every entry
        for(Element e : collection){
            // Only process the entry if its war_info matches the requested month
            if(e.getElementsByClass("war_info").text().split(" ")[1].equalsIgnoreCase(mth)){
                // This is supposed to hold every element with class "player" from the table behind the onclick button,
                // but it doesn't work
                Elements cek = e.getElementsByClass("player");
                for(Element c : cek){
                    System.out.println(c.text());
                }
            }
        }

For now I expect to get at least the names in the show details table:

Kaito
Chax
Draku

and so on.

This page doesn't contain the information you want to scrape. The results are loaded by AJAX (JavaScript) only after the button is clicked, and Jsoup does not execute JavaScript. You can open your web browser's developer tools and watch the Network tab to see what happens when you click the button. Clicking a button

<button id="buttonwarid19557"  ... >

loads a table from URL:

http://forum.toribash.com/clan_war_ajax.php?warid=19557&clanid=139

Notice the same id number.

What you have to do is to get the id from every button, then GET another document for each of these buttons and parse it one by one. That's what your web browser does anyway.
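The id-to-URL step can be sketched on its own, without any network access. This is only a minimal sketch: the class name `WarUrlBuilder` and its two helper methods are hypothetical, and it assumes every button id has the form `buttonwarid` followed by digits, as in the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: turns a button id into the AJAX URL.
final class WarUrlBuilder {
    // Assumption: button ids look like "buttonwarid19557"
    private static final Pattern BUTTON_ID = Pattern.compile("buttonwarid(\\d+)");

    // Extracts the numeric war id from a button id, or throws if the id has another shape
    static String warIdFromButtonId(String buttonId) {
        Matcher m = BUTTON_ID.matcher(buttonId);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected button id: " + buttonId);
        }
        return m.group(1);
    }

    // Builds the URL that the browser requests when the button is clicked
    static String ajaxUrl(String warId, String clanId) {
        return "http://forum.toribash.com/clan_war_ajax.php?warid=" + warId + "&clanid=" + clanId;
    }

    public static void main(String[] args) {
        String warId = warIdFromButtonId("buttonwarid19557");
        // prints http://forum.toribash.com/clan_war_ajax.php?warid=19557&clanid=139
        System.out.println(ajaxUrl(warId, "139"));
    }
}
```

Matching the whole id against a pattern, rather than just stripping the prefix, makes the code fail loudly if the site ever changes the id format.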

        BufferedReader month = new BufferedReader(new InputStreamReader(System.in));
        String mth = month.readLine();
        // Access the website
        Document docs = Jsoup.connect("http://forum.toribash.com/clan_war.php?clanid=139").get();

        // Take every entry of the war history
        Elements collection = docs.getElementsByClass("war_history_entry");
        // Iterate over every entry
        for(Element e : collection){
            // Only process the entry if its war_info matches the requested month
            if(e.getElementsByClass("war_info").text().split(" ")[1].equalsIgnoreCase(mth)){
                // select the button
                Element button = e.selectFirst("button");
                // selectFirst returns null when nothing matches, so skip entries without a button
                if(button == null){
                    continue;
                }
                // get the war id from the button id
                String buttonId = button.attr("id");
                // remove the text prefix because we only need the number
                String warId = buttonId.replace("buttonwarid", "");

                System.out.println("downloading results for " + e.getElementsByClass("war_info").text());
                // download and parse the subpage containing the table with info about a single war
                // the referrer makes the request look more like it comes from a real web browser, to avoid possible hotlinking protection
                Document table = Jsoup.connect("http://forum.toribash.com/clan_war_ajax.php?warid=" + warId + "&clanid=139").referrer("http://forum.toribash.com/clan_war.php?clanid=139").get();
                // get every <td class="player"> ... </td>
                Elements players = table.select(".player");
                for(Element player : players){
                    System.out.println(player.text());
                }
            }
        }
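Since the goal in the question is to count how often each name appears, the names printed by the loop above could instead be collected into a list and tallied afterwards. The following is a minimal sketch of that tallying step only; the class name `PlayerFrequency` and its `count` method are hypothetical, and the sample names are the ones from the question rather than real scraped data.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how many times each player name occurs.
final class PlayerFrequency {
    // Returns a map from name to number of occurrences, sorted by name
    static Map<String, Integer> count(List<String> names) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : names) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // ignore empty table cells
            }
            // add 1 to the existing count, or start at 1 for a new name
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // In the scraper, each player.text() would be added to this list instead
        List<String> names = List.of("Kaito", "Chax", "Draku", "Kaito");
        // prints {Chax=1, Draku=1, Kaito=2}
        System.out.println(count(names));
    }
}
```

A `TreeMap` is used here only so that the output is sorted by name; a `HashMap` would work just as well if the order does not matter.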
