I am using the Python.org build of Python 2.7 (64-bit) on Windows 8 (64-bit). I have some code that iterates through a series of dates to make XHR requests to a website. These requests attempt to pull down football data for the matches played on each date. If no matches were played on a given day, a message to that effect is printed.
The code works fine, except that it does not return any data for any season other than the most recent one. The page I am trying to scrape is here:
http://www.whoscored.com/Regions/252/Tournaments/26
The calendar lets you switch between dates, and XHR requests populate the fixture data on the page. The code I am using to do this is:
from datetime import date, timedelta as td
from ast import literal_eval
from datetime import datetime
import requests
import time
import re

list1 = [2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013]
list2 = [2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014]

for x, y in zip(list1, list2):
    print "list1 - " + str(x)
    print "list2 - " + str(y)
    d1 = date(x,11,01)
    d2 = date(y,5,31)
    delta = d2 - d1
    for i in range(delta.days + 1):
        time1 = str(d1 + td(days=i))
        time2 = time1.split("-", 1)[0]
        time3 = time1.split("-", -1)[1]
        time4 = time1.rsplit("-", 1)[-1]
        time2 = int(time2)
        time3 = int(time3)
        time4 = int(time4)
        date1 = datetime(year=time2, month=time3, day=time4)
        url = 'http://www.whoscored.com/tournamentsfeed/8273/Fixtures/'
        params = {'d': date1.strftime('%Y%m%d'), 'isAggregate': 'false'}
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
        response = requests.get(url, params=params, headers=headers)
        try:
            fixtures = literal_eval(response.content)
            if fixtures is not None and len(fixtures) > 0:  # If there are fixtures
                print ",\n".join([", ".join(str(x) for x in fixture) for fixture in fixtures])  # `fixtures` is a nested list
                time.sleep(0.5)
            else:
                print "No Fixtures Today: " + date1.isoformat()
                time.sleep(0.5)
        except SyntaxError:
            print "Error!!!"
            time.sleep(0.5)
As far as I understand it, the data for every available season should be accessed via the same method and from the same place. Can anyone see why this is not working?
Thanks
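As an aside on the loop in the question: splitting `str(d1 + td(days=i))` on `-` and rebuilding a `datetime` is not necessary, because a `date` object can be formatted with `strftime` directly. The sketch below is a minimal illustration, assuming the same November 1 to May 31 window and the same `d`/`isAggregate` query parameters the question sends; `season_days` and `feed_params` are hypothetical helper names, and no request is made.

```python
from datetime import date, timedelta


def season_days(start_year):
    """Yield each date from November 1 of start_year to May 31 of the next year.

    Hypothetical helper: the November-to-May window is copied from the
    question's loop; it is an assumption about when matches are played.
    """
    first = date(start_year, 11, 1)
    last = date(start_year + 1, 5, 31)
    for offset in range((last - first).days + 1):
        yield first + timedelta(days=offset)


def feed_params(day):
    """Hypothetical helper: the query parameters the question sends for one day."""
    # strftime('%Y%m%d') produces e.g. '20131101' without any string splitting.
    return {'d': day.strftime('%Y%m%d'), 'isAggregate': 'false'}


days = list(season_days(2013))
print(len(days))
print(feed_params(days[0]))
```

Each `feed_params(day)` dictionary can be passed as `params=` to `requests.get`, just as the question's loop does.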
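The problem is that each season has a different tournament ID, which means the feed URL is different for each season (the code in the question hard-codes 8273, which is the ID for the 2013/14 season). I changed the code to work with all years and their tournament IDs: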
import json
import requests
import time
from datetime import date, timedelta

year_tournament_map = {
    2013: 8273,
    2012: 6978,
    2011: 5861,
    2010: 4940,
    2009: 3419,
    2008: 2689,
    2007: 2175,
    2006: 1645,
    2005: 1291,
    2004: 903,
    2003: 579,
    2002: 421,
    2001: 243,
    2000: 114,
    1999: 26,
}

years = sorted(year_tournament_map.keys())
url = 'http://www.whoscored.com/tournamentsfeed/%s/Fixtures/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}

for year in years:
    start_date = date(year, 11, 1)
    end_date = date(year + 1, 5, 31)
    delta = end_date - start_date
    for days in range(delta.days + 1):
        time.sleep(0.5)
        test_date = start_date + timedelta(days=days)
        params = {'d': str(test_date).replace('-', ''), 'isAggregate': 'false'}
        response = requests.get(url % year_tournament_map[year], params=params, headers=headers)
        try:
            json_data = response.content.replace("'", '"').replace(',,', ',null,')
            fixtures = json.loads(json_data)
        except ValueError:
            print "Error!!!"
        else:
            if fixtures:  # If there are fixtures
                print ",\n".join([", ".join(str(x) for x in fixture) for fixture in fixtures])  # `fixtures` is a nested list
            else:
                print "No Fixtures Today: %s" % test_date
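A caveat on the conversion step in this answer: `str.replace` does not re-scan the text it has just inserted, so a run of three commas such as `,,,` becomes `,null,,` and `json.loads` still raises `ValueError`; an empty slot directly after `[` is not handled either. The following is a minimal sketch of a regex-based alternative, assuming (as the answer's code also does) that single quotes only delimit strings, and additionally that string values never contain `,,`, `[,` or `,]`. `parse_feed` is a hypothetical name, and the sample string is made up for illustration, not real feed output.

```python
import json
import re

# An opening bracket or comma (plus optional whitespace) directly followed by
# another comma marks an empty array slot, e.g. the middle of "1,,2" or "[,3".
_EMPTY_SLOT = re.compile(r'([\[,])(\s*)(?=,)')
# A comma directly before a closing bracket, e.g. "[1,2,]".
_TRAILING_COMMA = re.compile(r',(\s*)\]')


def parse_feed(text):
    """Convert a JavaScript-style array literal into nested Python lists.

    Hypothetical helper. Assumes single quotes only delimit strings and that
    string values never contain ",,", "[," or ",]".
    """
    text = text.replace("'", '"')
    # Fill every empty slot with null; the lookahead does not consume the
    # next comma, so runs such as ",,," are filled completely.
    text = _EMPTY_SLOT.sub(r'\g<1>\g<2>null', text)
    # JavaScript ignores a single trailing comma, so drop it to keep JSON valid.
    text = _TRAILING_COMMA.sub(r'\g<1>]', text)
    return json.loads(text)


sample = "[[123,'Arsenal',,'Chelsea',,,1],[,'x',]]"  # made-up input
print(parse_feed(sample))
```

In the loop above, `fixtures = parse_feed(response.content)` could stand in for the two lines inside `try:`; `json.loads` still raises `ValueError` on input it cannot parse, so the existing `except ValueError:` branch keeps working.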