Python Scrapy: Yahoo Fantasy spider returning nothing, no errors
I'm working on a project to scrape statistics from Fantasy Football leagues across various services, and Yahoo is the one I'm currently stuck on. I want my spider to crawl the Draft Results page of a public Yahoo league. When I run the spider, it gives me no results and no error message either. It simply says:
2012-09-14 17:29:08-0700 [draft] DEBUG: Crawled (200) <GET http://football.fantasysports.yahoo.com/f1/753697/draftresults?drafttab=round> (referer: None)
2012-09-14 17:29:08-0700 [draft] INFO: Closing spider (finished)
2012-09-14 17:29:08-0700 [draft] INFO: Dumping spider stats:
{'downloader/request_bytes': 250,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 48785,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2012, 9, 15, 0, 29, 8, 734000),
'scheduler/memory_enqueued': 1,
'start_time': datetime.datetime(2012, 9, 15, 0, 29, 7, 718000)}
2012-09-14 17:29:08-0700 [draft] INFO: Spider closed (finished)
2012-09-14 17:29:08-0700 [scrapy] INFO: Dumping global stats:
{}
It's not a login issue, because the page in question is accessible without being signed in. I see from other questions posted here that people have gotten scrapes to work for other parts of Yahoo. Is it possible that Yahoo Fantasy is blocking spiders? I've successfully written one for ESPN already, so I don't think the issue is with my code. Here it is anyway:
class DraftSpider(CrawlSpider):
    name = "draft"
    #psycopg stuff here
    rows = ["753697"]
    allowed_domains = ["football.fantasysports.yahoo.com"]
    start_urls = []
    for row in rows:
        start_urls.append("http://football.fantasysports.yahoo.com/f1/" + "%s" % (row) + "/draftresults?drafttab=round")

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select("/html/body/div/div/div/div/div/div/div/table/tr")
        items = []
        for site in sites:
            item = DraftItem()
            item['pick_number'] = site.select("td[@class='first']/text()").extract()
            item['pick_player'] = site.select("td[@class='player']/a/text()").extract()
            item['pick_nflteam'] = site.select("td[@class='player']/span/text()").extract()
            item['pick_ffteam'] = site.select("td[@class='last']/@title").extract()
            items.append(item)
        return items
Would really appreciate any insight on this.
C:\Users\Akhter Wahab>scrapy shell http://football.fantasysports.yahoo.com/f1/75
In [1]: hxs.select("/html/body/div/div/div/div/div/div/div/table/tr")
Out[1]: []
Your absolute XPath, "/html/body/div/div/div/div/div/div/div/table/tr", is not right: as the shell session shows, it matches nothing on this page.
I would also never recommend using an absolute XPath. Use a relative XPath instead; all the results are inside this div:
//div[@id='drafttables']
Start from that div and you can begin getting results.
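To illustrate why anchoring on the div's id is more robust than counting nested divs, here is a minimal standalone sketch. It rests on several assumptions: the markup in SAMPLE is invented and far simpler than the real Yahoo page, the player and team names are placeholders, and it uses the standard library's xml.etree.ElementTree instead of Scrapy's selectors so that it runs without a Scrapy project. Only the id 'drafttables' and the td class names come from the posts above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. The real page is nested differently;
# only the id "drafttables" and the td class names come from the thread.
SAMPLE = """
<html>
  <body>
    <div>
      <div id="drafttables">
        <table>
          <tr>
            <td class="first">1</td>
            <td class="player"><a>Player A</a><span>(Team X - RB)</span></td>
            <td class="last" title="Fantasy Team 1">Fantasy Team 1</td>
          </tr>
          <tr>
            <td class="first">2</td>
            <td class="player"><a>Player B</a><span>(Team Y - QB)</span></td>
            <td class="last" title="Fantasy Team 2">Fantasy Team 2</td>
          </tr>
        </table>
      </div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE.strip())  # root is the <html> element

# Absolute-style path: it assumes exactly seven nested divs under <body>,
# so it matches nothing as soon as the nesting depth differs.
absolute_rows = root.findall("./body/div/div/div/div/div/div/div/table/tr")

# Relative path anchored on the id: independent of how deep the div sits.
relative_rows = root.findall(".//div[@id='drafttables']//tr")

picks = []
for row in relative_rows:
    picks.append({
        "pick_number": row.findtext("td[@class='first']"),
        "pick_player": row.findtext("td[@class='player']/a"),
        "pick_nflteam": row.findtext("td[@class='player']/span"),
        "pick_ffteam": row.find("td[@class='last']").get("title"),
    })

print(len(absolute_rows), "rows via absolute path")
print(len(relative_rows), "rows via relative path")
```

In the spider itself, the equivalent change would presumably be to replace the absolute expression in hxs.select(...) with one starting at //div[@id='drafttables'], keeping the per-row td expressions as they are. Checking the expression in scrapy shell first, as in the session above, is a quick way to confirm it returns rows.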