XHR Requests not returning all data from website
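I am using Python.org version 2.7 64 bit on Windows 8 64 bit. I have some code that iterates over a series of dates to build the contents of an XHR submission to a website. Each request tries to pull football data for the matches played on the date being iterated over. If no matches were played on that date, a message is printed to that effect.
The code I have works fine, but it does not return any data other than for the most recent season. The page I am scraping is here: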
http://www.whoscored.com/Regions/252/Tournaments/26
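The calendar lets you switch between dates, and an XHR request populates the page with the data for the selected date. The code I am using to do this is: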
from datetime import date, timedelta as td
from ast import literal_eval
from datetime import datetime
import requests
import time
import re

list1 = [2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013]
list2 = [2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014]

for x, y in zip(list1, list2):
    print "list1 - " + str(x)
    print "list2 - " + str(y)
    d1 = date(x,11,1)
    d2 = date(y,5,31)
    delta = d2 - d1
    for i in range(delta.days + 1):
        time1 = str(d1 + td(days=i))
        time2 = time1.split("-", 1)[0]
        time3 = time1.split("-", -1)[1]
        time4 = time1.rsplit("-", 1)[-1]
        time2 = int(time2)
        time3 = int(time3)
        time4 = int(time4)
        date1 = datetime(year=time2, month=time3, day=time4)
        url = 'http://www.whoscored.com/tournamentsfeed/8273/Fixtures/'
        params = {'d': date1.strftime('%Y%m%d'), 'isAggregate': 'false'}
        headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
        response = requests.get(url, params=params, headers=headers)
        try:
            fixtures = literal_eval(response.content)
            if fixtures is not None and len(fixtures) > 0: # If there are fixtures
                print ",\n".join([", ".join(str(x) for x in fixture) for fixture in fixtures]) # `fixtures` is a nested list
                time.sleep(0.5)
            else:
                print "No Fixtures Today: " + date1.isoformat()
                time.sleep(0.5)
        except SyntaxError:
            print "Error!!!"
            time.sleep(0.5)
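As I understand it, the data for all available seasons should be accessible by the same method, from the same location. Can anyone see why this isn't working?
Thanks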
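The problem is that the tournament ID is different for each season, which means the URL is different too. I changed the code so that it works for all years and their tournament IDs: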
import json
import requests
import time
from datetime import date, timedelta

year_tournament_map = {
    2013: 8273,
    2012: 6978,
    2011: 5861,
    2010: 4940,
    2009: 3419,
    2008: 2689,
    2007: 2175,
    2006: 1645,
    2005: 1291,
    2004: 903,
    2003: 579,
    2002: 421,
    2001: 243,
    2000: 114,
    1999: 26,
}

years = sorted(year_tournament_map.keys())
url = 'http://www.whoscored.com/tournamentsfeed/%s/Fixtures/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}

for year in years:
    start_date = date(year, 11, 1)
    end_date = date(year + 1, 5, 31)
    delta = end_date - start_date
    for days in range(delta.days + 1):
        time.sleep(0.5)
        test_date = start_date + timedelta(days=days)
        params = {'d': str(test_date).replace('-', ''), 'isAggregate': 'false'}
        response = requests.get(url % year_tournament_map[year], params=params, headers=headers)
        try:
            json_data = response.content.replace("'", '"').replace(',,', ',null,')
            fixtures = json.loads(json_data)
        except ValueError:
            print "Error!!!"
        else:
            if fixtures: # If there are fixtures
                print ",\n".join([", ".join(str(x) for x in fixture) for fixture in fixtures]) # `fixtures` is a nested list
            else:
                print "No Fixtures Today: %s" % test_date
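One caveat about the parsing step above: the chained replace turns a run of three commas (",,,") into ",null,,", which json.loads still rejects, so such a day would be reported as "Error!!!". Below is a minimal sketch of a slightly more tolerant helper. The name parse_feed is hypothetical, and it assumes the feed is a JavaScript-style nested array whose string values contain no apostrophes, double quotes, or runs of commas. I have not checked this against every response the site can return.

```python
import json
import re

# A comma or opening bracket that is directly followed (ignoring whitespace)
# by another comma marks an empty slot in the JavaScript-style array.
_EMPTY_SLOT = re.compile(r'([\[,])(?=\s*,)')


def parse_feed(text):
    """Convert a JavaScript-style array literal into nested Python lists.

    Hypothetical helper. Assumes string values contain no apostrophes,
    double quotes, or runs of commas; those cases are not handled.
    """
    # JSON only allows double-quoted strings.
    as_json = text.replace("'", '"')
    # Fill every empty slot with null. The lookahead does not consume the
    # following comma, so consecutive empty slots are all filled.
    as_json = _EMPTY_SLOT.sub(r'\1null', as_json)
    return json.loads(as_json)
```

In the loop above this would replace the two lines inside the try block with fixtures = parse_feed(response.content). On Python 3, response.content is bytes, so response.text would be the value to pass instead.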