I need to extract data from a dynamic table on a website, and I want to use Jsoup (Java)

I'm using Jsoup to extract data from a table on a website. The content of this table is dynamic: there is a refresh button that updates the rows when I click it.

I tried to extract the data with Jsoup (using the parse method), but when I analyze the HTML page I cannot see the rows of the table. When I click the refresh button, a JavaScript function is called.

Do you have any suggestions? I read that Jsoup is not able to extract dynamic values from an HTML page. Is that true? Do I have to use some other library? Here is the HTML of the page as shown in Chrome DevTools.

This is what I see in the "Elements" panel of Chrome DevTools:

[Image: data to be extracted]

I want to extract the "yellow data".

This is what I see in the "Sources" panel of Chrome DevTools:

<div id="datatable_wrapper" class="dataTables_wrapper form-inline dt-bootstrap no-footer" style="display:none;margin-bottom: 50px;min-height:585px">
  <div class="row" style="margin:0px:padding:0px">
    <div class="col-sm-12">
      <table id="datatable" class="table table-striped table-bordered table-hover dataTable no-footer" style="width: 1100px;margin:0px !important;padding:0px !important;" role="grid" aria-describedby="datatable_info">
        <thead>
          <tr role="row">
            <th class="sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Event Date/Time: activate to sort column ascending" style="width: 100px;">TITLE_COLUMN1</th>
            <th class="hCenter sorting_disabled" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" style="width: 40px;"></th>
            <th class="sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Event Name: activate to sort column ascending" style="width: 250px;">TITLE_COLUMN2</th>
            <th class="sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Bet: activate to sort column ascending" style="width: 100px;">TITLE_COLUMN3</th>
            <th class="hCenter sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Rating: activate to sort column" style="width: 80px;">TITLE_COLUMN4(%)</th>
            <th class="hCenter sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="SNR Rating: activate to sort column" style="width: 95px;">TITLE_COLUMN5(%)</th>
            <th class="hCenter sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Bookie: activate to sort column ascending" style="width: 100px;">TITLE_COLUMN6</th>
            <th class="hCenter sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Back Odds: activate to sort column ascending" style="width:40px;">TITLE_COLUMN7</th>
            <th class="hCenter" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Exchange: activate to sort column ascending" style="width: 100px;">TITLE_COLUMN8</th>
            <th class="hCenter sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Lay Odds: activate to sort column ascending" style="width: 40px;">TITLE_COLUMN9</th>
            <th class="sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Availibity: activate to sort column ascending" style="width: 50px;">TITLE_COLUMN10</th>
            <th class="sorting" tabindex="0" aria-controls="datatable" rowspan="1" colspan="1" aria-label="Availibity: activate to sort column ascending" style="width: 50px;">
              <img src="/clock.png" width="15px" style="margin-left:10px;" />
            </th>
          </tr>
        </thead>
        <tbody>
        </tbody>
      </table>
    </div>
  </div>

This is the JavaScript function that populates the table:

function getData(ratingFrom,ratingTo,oddsFrom,oddsTo,availability,sortColumn=5,sortDirection="desc",offset=0,bookies="",filterbookies="c81e728d9d4c2f636f067f89cc14862c",eventname="",dateFrom="",dateTo="",exchange="",exchanges="",sport="all"){

$("#datatable_processing").css("display","block");
$("#datatable_nodata").css("display","none");
var importo_puntata = parseFloat($("#settings-form input[name=importo-puntata]").val());
var importo_bonus_rimborso = parseFloat($("#settings-form input[name=importo-bonus-rimborso]").val());

$.post("/get_data.php", {"refund":importo_bonus_rimborso,"back_stake":importo_puntata,"name":eventname,"filterbookies":filterbookies,"bookies":bookies,"rating-from":ratingFrom,"rating-to":ratingTo,"odds-from":oddsFrom,"odds-to":oddsTo,"min-liquidity":availability,"sort-column":sortColumn,"sort-direction":sortDirection,"offset":offset,"date-from":dateFrom,"date-to":dateTo,"exchange":exchange,"exchanges":exchanges,'sport':sport}, function(data){
    var allData = jQuery.parseJSON(data);
    var paginationHtml = "<li class=\"paginate_button previous disabled\" aria-controls=\"datatable\" tabindex=\"0\" id=\"datatable_previous\"><a href=\"#\">Precedente</a></li>";
    paginationHtml+= "<li class=\"paginate_button next\" aria-controls=\"datatable\" tabindex=\"0\" id=\"datatable_next\"><a href=\"#\">Seguente</a></li>";
    $("ul.pagination").html(paginationHtml);

    if(allData.data.length>0)
    {    
        if(allData.bookmakers.length>0)
        {
            if(allData.bookmakers.length>1)
            {
                var html =  '<option value="all">Tutti i Bookmakers</option>';
                $.each(allData.bookmakers, function(i, item) {
                    if(filterbookies == item.id.toString())
                        html  += '<option value="'+item.id.toString()+'" selected="selected">'+capitalizeFirstLetter(item.name)+'</option>';
                    else    
                        html  += '<option value="'+item.id.toString()+'">'+capitalizeFirstLetter(item.name)+'</option>';
                });

                $("#bookmaker").css("display","inline-block");
                $("#bookmaker").html(html);
            }
            else if(allData.bookmakers.length==1)
            {
                var html = '<option value="'+allData.bookmakers[0].id.toString()+'" selected="selected">'+capitalizeFirstLetter(allData.bookmakers[0].name)+'</option>';
                $("#bookmaker").html(html);
            }
        }

        if(allData.exchanges.length>1 && allData.bookmakers.length >1)
        {
                var html =  '<option value="all">Tutti gli Exchanges</option>';
                $.each(allData.exchanges, function(i, item) {
                        html  += '<option value="'+item.toString()+'">'+capitalizeFirstLetter(item.toString())+'</option>';
                });

                $("#exchange").css("display","inline-block");
                $("#exchange").html(html);
        }
        else if ($("input[name=exchanges]").val() == "all")
        {
                $("#exchange").css("display","inline-block");
        }


        $("#datatable tbody").html("");
        var html = "";



        $.each(allData.data, function(i, item) {
            var json = allData.data[i];
            var redRating = "";
            if(json.rating>=100)
                redRating= " redrating ";

            html  +=  "<tr role=\"row\" style=\"background-color: #fff !important;\" back-odds=\""+json.back_odds+"\" lay-odds=\""+json.lay_odds+"\" competition=\""+json.competition+"\" country=\""+json.country_code+"\" exchange=\""+json.exchange+"\" >"+
                       "<td>"+json.opendate+"</td>"+
                       "<td class=\"hCenter\"><img src=\"/images/"+json.sport+".png\" /></td>"+
                       "<td>"+json.event_name+"</td>"+
                       "<td>"+json.bet+"</td>"+
                       "<td class=\"hCenter sorting_1\"><span class=\"rating"+redRating+"\">"+json.rating+"</span></td>"+
                       "<td class=\"hCenter sorting_1\"><span class=\"snrrating\">"+json.snr_rating.toString()+".00"+"</span>"+
                       "<img src=\"/images/calculator.png\" class=\"calculator\""+
                       "  attrib-sport=\""+json.sport+"\" attrib-exchange=\""+json.exchange+"\" attrib-competition=\""+json.competition+"\" attrib-country=\""+json.country_code+"\" attrib-eventdate=\""+json.opendate+"\" attrib-eventname=\""+json.event_name+"\" attrib-bet=\""+json.bet+"\" attrib-market=\""+json.market_type+"\" attrib-rating=\""+json.rating+"\" attrib-odds-provider_id=\""+json.odds_provider_fk+"\"  attrib-odds-provider=\""+capitalizeFirstLetter(json.odds_provider)+"\" attrib-back-odds=\""+json.back_odds+"\" attrib-lay-odds=\""+json.lay_odds+"\" attrib-availability=\""+json.availability+"\" attrib-bookie-bet-url=\""+json.bookie_bet_url+"\" attrib-betfair-bet-url=\""+json.betfair_bet_url+"\" "+
                       " /></td>"+
                       "<td class=\" hCenter\"><img src=\"/images/"+json.odds_provider_fk+".png\" width=\"80\" /></td>"+
                       "<td class=\" hCenter back\" ><span>"+json.back_odds+"</span></td>"+
                       "<td class=\" hCenter\" ><img src=\"/images/"+json.exchange+".png\" width=\"80\" /></td>"+
                       "<td class=\" hCenter lay\" ><span>"+json.lay_odds+"</span></td>"+
                       "<td>&nbsp; &#8364;"+json.availability+"</td>"+
                       "<td>"+json.update_time.toString()+"</td>"+
                    "</tr>";
        });



        var allEvents = parseInt(allData.allEventsCount);
        if(allEvents>10)
        {    
            allEvents = allEvents - 10;
            var j=0;
            var pageStart = parseInt(allData.offset);
            if(pageStart<9)
                pageStart=1;
                else
                pageStart-=4;

            var pageHtml = "";
            for(var i=pageStart;i<=parseInt(allEvents/10);i++)
            {
                var current = "";
                if((i-1==parseInt(allData.offset) && pageStart!=1) || (pageStart==1 && i-1==allData.offset))
                    current = "paginate_current disabled";

                pageHtml += "<li class=\"paginate_button paginate "+current+"\" aria-controls=\"datatable\" tabindex=\"0\"><a href=\"\" class=\"paginate\">"+i.toString()+"</a></li>";

                j++;
                if(j>9)
                    break;
            }

            $("#datatable_previous").after(pageHtml);
        }

        $("#datatable_previous").click(function(event){
            if($(this).hasClass("disabled"))
                {event.preventDefault();return;}
            var sortColumn = $("#search-form input[name=sort-column]").val();
            var sortDirection = $("#search-form input[name=sort-direction]").val();

            var ratingFrom = $("#search-form input[name=rating-from]").val();
            var ratingTo=$("#search-form input[name=rating-to]").val();
            var oddsFrom=$("#search-form input[name=odds-from]").val();
            var oddsTo=$("#search-form input[name=odds-to]").val();
            var availability=$("#search-form input[name=availability]").val();
            var offset=parseInt($("#search-form input[name=offset]").val())-1;
            var bookies = $("#search-form input[name=bookies]").val();
            var filterbookies = $("#bookmaker").val();
             var dateFrom = $("#date-from").val();
                    var dateTo = $("#date-to").val();
            var teamname = $("#event-name").val();
             var exchange = $("#exchange").val();
             var exchanges = $("#search-form input[name=exchanges]").val();
            var sport = $("#sport").val();
                 getData(ratingFrom,ratingTo,oddsFrom,oddsTo,availability,sortColumn,sortDirection,offset,bookies,filterbookies,teamname,dateFrom,dateTo,exchange,exchanges,sport);

        });

        $("#datatable_next").click(function(event){
            if($(this).hasClass("disabled"))
                {event.preventDefault();return;}

            var sortColumn = $("#search-form input[name=sort-column]").val();
            var sortDirection = $("#search-form input[name=sort-direction]").val();

            var ratingFrom = $("#search-form input[name=rating-from]").val();
            var ratingTo=$("#search-form input[name=rating-to]").val();
            var oddsFrom=$("#search-form input[name=odds-from]").val();
            var oddsTo=$("#search-form input[name=odds-to]").val();
            var availability=$("#search-form input[name=availability]").val();
            var offset=parseInt($("#search-form input[name=offset]").val())+1;
            var bookies = $("#search-form input[name=bookies]").val();
            var filterbookies = $("#bookmaker").val();
            var dateFrom = $("#date-from").val();
                    var dateTo = $("#date-to").val();
            var teamname = $("#event-name").val();
             var exchange = $("#exchange").val();
             var exchanges = $("#search-form input[name=exchanges]").val();
            var sport = $("#sport").val();
                 getData(ratingFrom,ratingTo,oddsFrom,oddsTo,availability,sortColumn,sortDirection,offset,bookies,filterbookies,teamname,dateFrom,dateTo,exchange,exchanges,sport);

        });

        $("ul.pagination>li.paginate>a.paginate").click(function(event){
            event.preventDefault();
            var offset = parseInt($(this).html())-1;
            var sortColumn = $("#search-form input[name=sort-column]").val();
            var sortDirection = $("#search-form input[name=sort-direction]").val();

            var ratingFrom = $("#search-form input[name=rating-from]").val();
            var ratingTo=$("#search-form input[name=rating-to]").val();
            var oddsFrom=$("#search-form input[name=odds-from]").val();
            var oddsTo=$("#search-form input[name=odds-to]").val();
            var availability=$("#search-form input[name=availability]").val();
            var bookies = $("#search-form input[name=bookies]").val();
            var filterbookies = $("#bookmaker").val();
            var teamname = $("#event-name").val();
             var dateFrom = $("#date-from").val();
                    var dateTo = $("#date-to").val();
             var exchange = $("#exchange").val();
             var exchanges = $("#search-form input[name=exchanges]").val();
            var sport = $("#sport").val();
                 getData(ratingFrom,ratingTo,oddsFrom,oddsTo,availability,sortColumn,sortDirection,offset,bookies,filterbookies,teamname,dateFrom,dateTo,exchange,exchanges,sport);
        });

        $("#datatable_wrapper .pageNumber").html((parseInt(allData.offset)*10)+1);
        $("#datatable_wrapper .pageCount").html(allData.data.length+(parseInt(allData.offset)*10));
        $("#datatable_wrapper .allEventsCount").html(allData.allEventsCount);

        if((allData.data.length+(parseInt(allData.offset)*10)) < allData.allEventsCount)
            $("#datatable_next").removeClass("disabled");
        else
        {
            $("#datatable_next").removeClass("disabled");
            $("#datatable_next").addClass("disabled");
        }

        if(parseInt(allData.offset)>=1)
            $("#datatable_previous").removeClass("disabled");
        else
        {
            $("#datatable_previous").removeClass("disabled");
            $("#datatable_previous").addClass("disabled");
        }

        $("#datatable tbody").html(html);
        $("#datatable_wrapper").css("display","block");
        $("#datatable th").removeClass("sorting_desc");
        $("#datatable th").removeClass("sorting_asc");
        $("#datatable th").removeClass("sorting");
        $("#datatable th").addClass("sorting");
        $("#datatable th").removeAttr("aria-sort");

        $("#search-form input[name=offset]").val(allData.offset);
        $("#search-form input[name=sort-column]").val(allData.sortColumn);
        $("#search-form input[name=sort-direction]").val(allData.sortDirection);

        $($("#datatable th")[allData.sortColumn]).attr('aria-sort',allData.sortDirectionFull);
        $($("#datatable th")[allData.sortColumn]).removeClass("sorting");
        $($("#datatable th")[allData.sortColumn]).addClass("sorting_"+allData.sortDirection);    
        $("#datatable img.calculator").click(function(){

                $("#event-details #calc-event-datetime").val($(this).attr('attrib-eventdate'));
                $("#event-details #calc-event-name").val($(this).attr('attrib-eventname'));

                $("#event-details .event-rating").html($(this).attr('attrib-rating'));
                $("#event-details .event-competition").html($(this).attr('attrib-competition'));
                $("#event-details .event-country").html($(this).attr('attrib-country'));

                $("#right-container input[name=back-odds]").val($(this).attr('attrib-back-odds'));    
                $("#right-container a.bookie-bet-url").attr("href",$(this).attr('attrib-bookie-bet-url'));    

                $("#right-container a.betfair-bet-url").attr("href",$(this).attr('attrib-betfair-bet-url'));    
                if ($(this).attr('attrib-sport') == "tennis" && $(this).attr('attrib-exchange') == "betfair")
                    $("#right-container a.betfair-bet-url").attr("href",$(this).attr('attrib-betfair-bet-url').replace("football","tennis"));    
                $("#right-container input[name=back-commission]").val("0.00");
                $("#right-container input[name=lay-odds]").val($(this).attr('attrib-lay-odds'));    
                $("#right-container input[name=lay-commission]").val("0.05");
                $("#odds-container .event-outcome").html($(this).attr('attrib-bet')+" To Win");
                $("#odds-container span.backTitle").html($(this).attr('attrib-back-odds'));
                $("#odds-container span.layTitle").html($(this).attr('attrib-lay-odds'));
                $("#match-container img.exchangeLogo").attr("src","/images/"+$(this).attr('attrib-exchange')+".png");
                $("#match-container img.sport-img").attr("src","/images/"+$(this).attr('attrib-sport')+".png");
                $("span.event-exchange").html(capitalizeFirstLetter($(this).attr('attrib-exchange')));
                $("span.event-back-oddsprovider").html($(this).attr('attrib-odds-provider'));            
                $("#odds-container span.back-odds").html($(this).attr('attrib-back-odds'));
                $("#odds-container span.lay-odds").html($(this).attr('attrib-lay-odds'));

                $("#odds-container span.backTitle").html($(this).attr('attrib-back-odds'));
                $("#odds-container span.backTitle").html($(this).attr('attrib-back-odds'));

                $("#odds-container img.bookmakerLogo").attr("src","/images/"+$(this).attr('attrib-odds-provider_id').toString()+".png");

                $("#odds-container .event-back-outcome").html($(this).attr('attrib-bet'));
                $("#odds-container .event-lay-outcome").html($(this).attr('attrib-bet'));
                $("#odds-container span.lay-availability").html("&#8364;"+$(this).attr('attrib-availability').toString()+" liquidità");

                var importo_puntata = $("#settings-form input[name=importo-puntata]").val();
                var importo_bonus_rimborso = $("#settings-form input[name=importo-bonus-rimborso]").val();
                $("#right-container input[name=back-stake]").val(importo_puntata);
                $("#right-container input[name=back-refund-stake]").val(importo_bonus_rimborso);

                $("#match-container").dialog({
                    width: 850,
                    modal: true,
                    resizable: false,
                    title: "Calcolatore - "+$(this).attr('attrib-eventname'),
                    open: function(event, ui) {

                        $("#rbtnSNR").attr("checked",false);
                        $("#rbtnSR").attr("checked",false);
                        $("#rbtnNormal").attr("checked",true);
                        updateCalculator(0);
                        $("html, body").animate({ scrollTop: 100 }, "fast");
                    }
                }).parent().position({
                    my: 'top+50px',
                    at: 'top',
                    collision: "flip flip",
                    of: $("#datatable_wrapper")
                });
            });
    }
    else
    {
        if(allData.bookmakers.length==1)
        {
            $("#bookmaker").css("display","none");
        }

        $("#datatable_nodata").css("display","block");
        $("#datatable_nodata h3").html("<center>Nessun dato trovato</center>");
        $("#datatable_wrapper").css("display","none");
    }

    $("#datatable_processing").css("display","none");
});}

"I read that Jsoup is not able to extract dynamic values from an HTML page. Is that true?"

True. The data you're looking for is not in the page source. It is loaded dynamically by a POST request to "/get_data.php". Request that endpoint yourself; the response contains a JSON object with the table rows. I'd recommend using a JSON parsing library to read it.
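To reproduce that POST outside the browser, the parameters have to be sent the way jQuery sends them: when $.post is given a plain object, it serializes it as form fields (application/x-www-form-urlencoded), not as a JSON body. Below is a minimal sketch using only the standard Java library (Java 11+). The class name GetDataRequest, the base URL https://example.com, and the rating values are placeholders; the other values are the defaults from the getData signature in the question. Whether the server also requires cookies or a login session is not known from the question.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class GetDataRequest {

    // Encodes key/value pairs as application/x-www-form-urlencoded,
    // the format jQuery uses when $.post is given a plain object.
    static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) the POST request. baseUrl is a placeholder.
    static HttpRequest build(String baseUrl, Map<String, String> params) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/get_data.php"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(params)))
                .build();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("rating-from", "90");      // example value, not taken from the site
        params.put("rating-to", "100");       // example value, not taken from the site
        params.put("sort-column", "5");       // default in getData
        params.put("sort-direction", "desc"); // default in getData
        params.put("offset", "0");            // default in getData
        params.put("sport", "all");           // default in getData

        HttpRequest request = build("https://example.com", params);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formEncode(params));
    }
}
```

The request can then be sent with java.net.http.HttpClient, and the response body handed to whichever JSON library you prefer.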

Jsoup is not necessary here, but it can be used to easily fetch the JSON data:

// jQuery's $.post sends the object as form fields, so pass them with data() rather than as a raw body.
// url and the value variables (ratingFrom, offset, ...) are Strings you define yourself.
String jsonResponse = Jsoup
    .connect(url + "/get_data.php")
    .method(Connection.Method.POST)
    .header("Accept", "application/json")
    .timeout(20000)            // milliseconds
    .ignoreContentType(true)   // the response is JSON, not HTML
    .maxBodySize(0)            // 0 = no size limit
    .data("refund", importo_bonus_rimborso, "back_stake", importo_puntata, "name", eventname,
          "filterbookies", filterbookies, "bookies", bookies,
          "rating-from", ratingFrom, "rating-to", ratingTo,
          "odds-from", oddsFrom, "odds-to", oddsTo, "min-liquidity", availability,
          "sort-column", sortColumn, "sort-direction", sortDirection, "offset", offset,
          "date-from", dateFrom, "date-to", dateTo,
          "exchange", exchange, "exchanges", exchanges, "sport", sport)
    .execute().body();
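A single request returns only one page of rows. The JavaScript in the question displays offset*10+1 as the first row number and enables the "next" button while data.length + offset*10 is less than allEventsCount, which suggests (but does not prove) that offset is a zero-based page index with ten rows per page. Under that assumption, the number of requests needed to read the whole table could be worked out as in the sketch below. The class name Pagination and the sample count of 37 are illustrative only.

```java
class Pagination {

    // Number of requests needed to cover allEventsCount rows, pageSize rows per request.
    static int pageCount(int allEventsCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (allEventsCount + pageSize - 1) / pageSize; // ceiling division
    }

    public static void main(String[] args) {
        int allEventsCount = 37; // example value; would come from "allEventsCount" in the JSON
        int pages = pageCount(allEventsCount, 10); // 10 rows per page is an assumption
        for (int offset = 0; offset < pages; offset++) {
            // Here one would POST with "offset" set to this value and parse the returned rows.
            System.out.println("request offset=" + offset);
        }
    }
}
```

In practice the first response supplies allEventsCount, so the loop would start after that first request rather than with a hard-coded number.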
