Scrape multiple table pages with BeautifulSoup and Python

Question

http://www.indymini.com/p/mini-marathon/miniresults

I want to scrape the table available at this URL with Python and BS4, but when I change the table size or move to another page, the URL does not change.

Answer 1

When navigating through the table, the URL does not change because the table appears to be implemented in JavaScript (the DataTables library in particular) and uses AJAX to fetch the data it displays.

So, basically, since BS4 only parses the HTML it is given and does not execute JavaScript, I don't see a way to scrape the page with BS4 and get any data beyond what is displayed by default when the page loads.

On the other hand, since the data is retrieved via AJAX, you could work out the format of the AJAX request (which parameter controls what in the results you want, for example by inspecting the requests in Firebug or your browser's developer tools) and retrieve the data directly as JSON by calling the AJAX URL that supplies the table.
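To make "which parameter controls what" concrete: in the legacy DataTables server-side protocol, changing the page or the number of rows shown only changes two request parameters, `iDisplayStart` (a zero-based row offset) and `iDisplayLength` (rows per page), which is why the page URL itself never changes. The sketch below assumes the parameter names seen in the URL from Answer 2; `page_params` and `build_search_url` are hypothetical helpers, and whether the server accepts this reduced parameter set is untested.

```python
from urllib.parse import urlencode

# Endpoint and eventId come from the URL in Answer 2; the rest is an assumption.
BASE_URL = "http://results.xacte.com/json/search"


def page_params(page_number: int, page_size: int) -> dict:
    """Map a 1-based page number and a page size to DataTables paging parameters."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    # iDisplayStart is a zero-based row offset: page 3 at 25 rows/page starts at row 50
    return {"iDisplayStart": (page_number - 1) * page_size, "iDisplayLength": page_size}


def build_search_url(page_number: int, page_size: int, event_id: int = 1387) -> str:
    """Build the AJAX URL for one page of results (hypothetical helper).

    Assumption: the server accepts this reduced parameter set; the browser
    also sends many mDataProp_*, bSearchable_* and bSortable_* parameters.
    """
    params = {"eventId": event_id, "sEcho": 1}  # sEcho: counter DataTables expects echoed back
    params.update(page_params(page_number, page_size))
    return BASE_URL + "?" + urlencode(params)


print(build_search_url(3, 25))
```

With `requests`, passing the same dict as `requests.get(BASE_URL, params=params)` lets the library do the URL encoding instead of concatenating strings by hand.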

But depending on your intended use of the data, you might want to ask the website's owner for permission to download and use it. And who knows, they might be willing to help.

Answer 2

Well, it's an AJAX call sent to the server via GET. Here is some quick-and-dirty scraping code in Python.

The AJAX URL is the one built inside the loop below:

import time
import requests

PAGE_SIZE = 10  # rows per request; keep in sync with iDisplayLength=10 in the URL
data = []  # collects the raw response of every page
for page in range(2277):  # number of requests taken from the original loop
    start = page * PAGE_SIZE  # iDisplayStart is a row offset, so step by the page size
    url='http://results.xacte.com/json/search?eventId=1387&callback=jQuery18309972286304579958_1494520029659&sEcho=8&iColumns=13&sColumns=&iDisplayStart='+str(start)+'&iDisplayLength=10&mDataProp_0=&mDataProp_1=bib&mDataProp_2=firstname&mDataProp_3=lastname&mDataProp_4=sex&mDataProp_5=age&mDataProp_6=city&mDataProp_7=state&mDataProp_8=country&mDataProp_9=&mDataProp_10=&mDataProp_11=&mDataProp_12=&sSearch=&bRegex=false&sSearch_0=&bRegex_0=false&bSearchable_0=false&sSearch_1=&bRegex_1=false&bSearchable_1=true&sSearch_2=&bRegex_2=false&bSearchable_2=true&sSearch_3=&bRegex_3=false&bSearchable_3=true&sSearch_4=&bRegex_4=false&bSearchable_4=true&sSearch_5=&bRegex_5=false&bSearchable_5=true&sSearch_6=&bRegex_6=false&bSearchable_6=true&sSearch_7=&bRegex_7=false&bSearchable_7=true&sSearch_8=&bRegex_8=false&bSearchable_8=true&sSearch_9=&bRegex_9=false&bSearchable_9=true&sSearch_10=&bRegex_10=false&bSearchable_10=true&sSearch_11=&bRegex_11=false&bSearchable_11=false&sSearch_12=&bRegex_12=false&bSearchable_12=false&iSortCol_0=0&sSortDir_0=asc&iSortingCols=1&bSortable_0=false&bSortable_1=true&bSortable_2=true&bSortable_3=true&bSortable_4=true&bSortable_5=true&bSortable_6=true&bSortable_7=true&bSortable_8=true&bSortable_9=false&bSortable_10=false&bSortable_11=false&bSortable_12=false&_='+str(time.time())
    r = requests.get(url)
    r.raise_for_status()  # stop on HTTP errors instead of saving error pages
    data.append(r.text)
    print(r.text, '-------------')
    # do whatever you want with it; r.text gives the raw data
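
Because the URL above contains a `callback=jQuery...` parameter, the response is most likely JSONP, i.e. JSON wrapped in a JavaScript function call, so `r.text` cannot be passed straight to `json.loads`. A minimal sketch for unwrapping it, assuming the payload follows the legacy DataTables layout with the rows under `aaData`; `parse_jsonp`, `extract_rows` and the sample string are made up for illustration and are not real output from the site. Dropping the `callback` parameter may make the server return plain JSON, but that is also untested.

```python
import json
import re

# Matches "someCallback( ... )" with an optional trailing semicolon.
_JSONP = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(text: str):
    """Return the JSON payload of a JSONP response, or parse plain JSON as-is."""
    match = _JSONP.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)


def extract_rows(text: str) -> list:
    """Assumption: rows live under the legacy DataTables key "aaData"."""
    return parse_jsonp(text).get("aaData", [])


# Made-up sample response, not real data from the site.
sample = 'jQuery183_149({"sEcho": 8, "iTotalRecords": 2, "aaData": [{"bib": "1"}, {"bib": "2"}]});'
print(len(extract_rows(sample)))  # 2
```

In the loop, `data.append(extract_rows(r.text))` would then collect parsed rows instead of raw text.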

2 answers

Answer 1: score 0, accepted, 2017-05-11 16:21:42
Answer 2: score 0, 2017-05-11 16:52:47