Scrapy extract data from dynamic table

Question

I am trying to pull all the TD values from the "table-main" table on this page: http://www.oddsportal.com/basketball/usa/nba/results/

I am using Scrapy and Python 2.7.

From the Scrapy shell I can get the table via:

response.xpath('//*[@id="tournamentTable"]')

But I cannot get any of the TR or TD elements of that table. Both

response.xpath('//*[@id="tournamentTable"]/tbody')

and

response.xpath('//*[@id="tournamentTable"]/tbody/tr')

return an empty list. I suspect that the table is created dynamically. Can anyone please help me scrape all the team names, scores, and odds from that table? I have been stuck on this for a while.

This question is different from the one suggested as a duplicate, "Scrapy not finding table": that question is about locating the table, whereas this one is about getting the data inside it.

Answer 1

Yes, the results are loaded with an additional call to the website's API. In this case the request is made to this URL: http://fb.oddsportal.com/ajax-sport-country-tournament-archive/3/MmbLsWh8/X0/1/-1/1/?_=1446338252826

I'm not sure you can hardcode the URL in your spider, since at least the 3 and MmbLsWh8 parts of the URL come from a script tag on the main page (the "sid" and "id" values passed to PageTournament):

<script type="text/javascript">
    //<![CDATA[
    var op = new OpHandler();if(!page)var page = new PageTournament({"id":"MmbLsWh8","sid":3,"cid":200,"archive":true});var menu_open = null;vJs();op.init();if(page && page.display)page.display();    var sigEndPage = true;
    try
    {
        if (sigEndJs)
        {
            globals.onPageReady();
        }
    } catch (e)
    {
    }

    //]]>
</script>

In addition, there is a _ parameter that looks like a timestamp.
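If you want to avoid hardcoding, the URL could be rebuilt from the main page along these lines. This is only a sketch: the helper name build_archive_url and its now_ms argument are made up for illustration, the fixed "X0/1/-1/1" tail is simply copied from the URL above (I don't know what those segments mean), and treating _ as a millisecond timestamp is a guess based on how it looks.

```python
import json
import re
import time

# Captures the JSON object passed to PageTournament(...) in the inline script.
PAGE_RE = re.compile(r'new PageTournament\((\{.*?\})\)')


def build_archive_url(page_html, now_ms=None):
    """Build the AJAX archive URL from the HTML of the main results page.

    build_archive_url and now_ms are illustrative names, not part of Scrapy.
    """
    match = PAGE_RE.search(page_html)
    if match is None:
        raise ValueError("PageTournament(...) call not found in the page")
    params = json.loads(match.group(1))  # e.g. {"id": "...", "sid": 3, ...}
    # Assumption: "_" is a millisecond timestamp used as a cache buster.
    stamp = int(time.time() * 1000) if now_ms is None else now_ms
    # The "X0/1/-1/1" tail is copied verbatim from the observed request.
    return (
        "http://fb.oddsportal.com/ajax-sport-country-tournament-archive/"
        "{sid}/{tid}/X0/1/-1/1/?_={stamp}"
    ).format(sid=params["sid"], tid=params["id"], stamp=stamp)
```

In a spider you would call build_archive_url(response.body) in Python 2.7 (or response.text on newer Scrapy versions) from the callback of the main page and yield a scrapy.Request for the returned URL.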

A call to this AJAX URL returns a JSONP response with the HTML of the NBA results inside. You need to extract the HTML from the response (with a regular expression, for instance), feed it to a Selector, and extract the results. Some sample code from the shell to get you started:

$ scrapy shell http://www.oddsportal.com/basketball/usa/nba/results/
In [1]: fetch("http://fb.oddsportal.com/ajax-sport-country-tournament-archive/3/MmbLsWh8/X0/1/-1/1/?_=1446338252826")
In [2]: import re
In [3]: pattern = re.compile(r'"html":"(.*?)"}', re.MULTILINE | re.DOTALL)
In [4]: import scrapy
In [5]: selector = scrapy.Selector(text=pattern.search(response.body).group(1))
In [6]: # TODO: now use the selector to extract the desired data
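One caveat with the regular expression approach: the captured text is still a JSON string literal, so it may contain escapes such as \" and \/ that can confuse the HTML parser. A possible alternative is to parse the JSON payload itself. The sketch below assumes the body looks like callback({... "html": "..." ...}); with the HTML stored under an "html" key somewhere in the object; I have not verified the exact payload layout, and extract_html and find_key are illustrative helper names.

```python
import json


def find_key(node, key):
    """Depth-first search for `key` in nested dicts/lists; None if absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def extract_html(jsonp_body):
    """Return the HTML fragment embedded in a JSONP response body."""
    if isinstance(jsonp_body, bytes):
        # response.body is a byte string; decode before slicing and parsing.
        jsonp_body = jsonp_body.decode("utf-8")
    # Assumption: the payload is the single JSON object between the first
    # "{" and the last "}" of the body (the callback wrapper is discarded).
    start = jsonp_body.find("{")
    end = jsonp_body.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the response")
    payload = json.loads(jsonp_body[start:end + 1])
    html = find_key(payload, "html")
    if html is None:
        raise ValueError('no "html" key found in the payload')
    return html
```

In the shell, scrapy.Selector(text=extract_html(response.body)) then gives a selector over the unescaped HTML, which you can query with the usual xpath() or css() calls.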

solution1
0 ACCEPTED 2015-11-01 00:59:10
