Python Scrapy: Yahoo Fantasy spider returns nothing, no errors

Question

I'm working on a project to scrape statistics from Fantasy Football leagues across various services, and Yahoo is the one I'm currently stuck on. I want my spider to crawl the Draft Results page of a public Yahoo league. When I run the spider, it returns no results and no error message either. It simply says:

2012-09-14 17:29:08-0700 [draft] DEBUG: Crawled (200) <GET http://football.fantasysports.yahoo.com/f1/753697/draftresults?drafttab=round> (referer: None)
2012-09-14 17:29:08-0700 [draft] INFO: Closing spider (finished)
2012-09-14 17:29:08-0700 [draft] INFO: Dumping spider stats:
    {'downloader/request_bytes': 250,
     'downloader/request_count': 1,
     'downloader/request_method_count/GET': 1,
     'downloader/response_bytes': 48785,
     'downloader/response_count': 1,
     'downloader/response_status_count/200': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2012, 9, 15, 0, 29, 8, 734000),
     'scheduler/memory_enqueued': 1,
     'start_time': datetime.datetime(2012, 9, 15, 0, 29, 7, 718000)}
2012-09-14 17:29:08-0700 [draft] INFO: Spider closed (finished)
2012-09-14 17:29:08-0700 [scrapy] INFO: Dumping global stats:
    {}

It's not a login issue, because the page in question is accessible without being signed in. I can see from other questions posted here that people have gotten scrapers working for other parts of Yahoo. Is it possible that Yahoo Fantasy is blocking spiders? I've already written a working one for ESPN, so I don't think the issue is with my code. Here it is anyway:

from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import HtmlXPathSelector

# DraftItem is defined in the project's items.py; "myproject" is a placeholder name.
from myproject.items import DraftItem


class DraftSpider(CrawlSpider):
    name = "draft"
    # psycopg stuff here

    rows = ["753697"]

    allowed_domains = ["football.fantasysports.yahoo.com"]

    # One draft-results URL per league ID in `rows`
    start_urls = [
        "http://football.fantasysports.yahoo.com/f1/%s/draftresults?drafttab=round" % row
        for row in rows
    ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # Absolute XPath intended to reach the rows of the draft-results table
        sites = hxs.select("/html/body/div/div/div/div/div/div/div/table/tr")
        items = []
        for site in sites:
            item = DraftItem()
            item['pick_number'] = site.select("td[@class='first']/text()").extract()
            item['pick_player'] = site.select("td[@class='player']/a/text()").extract()
            item['pick_nflteam'] = site.select("td[@class='player']/span/text()").extract()
            item['pick_ffteam'] = site.select("td[@class='last']/@title").extract()
            items.append(item)
        return items

I would really appreciate any insight on this.

Answer 1

C:\Users\Akhter Wahab>scrapy shell http://football.fantasysports.yahoo.com/f1/75
In [1]: hxs.select("/html/body/div/div/div/div/div/div/div/table/tr")
Out[1]: []

Your absolute XPath "/html/body/div/div/div/div/div/div/div/table/tr" is not right: as the shell session shows, it returns an empty list.
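To illustrate why a long absolute path is fragile, here is a small self-contained sketch. It uses Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath, as a stand-in for Scrapy's selectors. Both markup snippets are made up for illustration and are not Yahoo's real page: adding a single wrapper `div` is enough to make the fixed-depth path match nothing, while a path anchored on the `id` still finds the rows.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup (not Yahoo's real page).
PAGE_V1 = """
<html><body>
  <div><div id="drafttables"><table>
    <tr><td class="first">1.</td></tr>
    <tr><td class="first">2.</td></tr>
  </table></div></div>
</body></html>
"""

# Same content after a layout tweak that adds one wrapper <div>.
PAGE_V2 = PAGE_V1.replace("<body>", "<body><div>").replace("</body>", "</div></body>")

# ElementTree paths are evaluated from the <html> root element, so this is
# the equivalent of the absolute XPath /html/body/div/div/table/tr.
ABSOLUTE = "body/div/div/table/tr"
# Equivalent of //div[@id='drafttables']//tr: anchored on a stable id.
ANCHORED = ".//div[@id='drafttables']//tr"


def count_matches(markup, path):
    """Return how many elements `path` matches below the root element."""
    root = ET.fromstring(markup.strip())
    return len(root.findall(path))


for label, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
    print(label, count_matches(page, ABSOLUTE), count_matches(page, ANCHORED))
# prints:
# v1 2 2
# v2 0 2
```

In the Scrapy shell the equivalent check is to pass each expression to `hxs.select(...)` and see which one returns a non-empty list.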

Also, I would never recommend using an absolute XPath; use a relative XPath instead. All the results are inside this div:

//div[@id='drafttables']

so anchor your selectors on it and you can start getting results.
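A rough sketch of the answer's suggestion applied to the question's per-row fields, again using `xml.etree.ElementTree` as a stand-in for Scrapy selectors so that it runs on its own. The sample markup is hypothetical: it only mirrors the cell classes (`first`, `player`, `last`) that the question's XPaths rely on, not Yahoo's actual HTML. With the 2012-era Scrapy API, the same relative expressions would be passed to `hxs.select(...)`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup that only mirrors the cell classes the question's
# XPaths rely on; it is not copied from Yahoo's page.
SAMPLE = """
<html><body><div><div><div id="drafttables"><table>
  <tr><th>Pick</th><th>Player</th><th>Team</th></tr>
  <tr>
    <td class="first">1.</td>
    <td class="player"><a>Player One</a> <span>(NFL - RB)</span></td>
    <td class="last" title="Team A">Team A</td>
  </tr>
</table></div></div></div></body></html>
"""


def _text(el):
    """Stripped text of an element, or '' if the element is missing."""
    return (el.text or "").strip() if el is not None else ""


def parse_draft(markup):
    """Return one dict per pick row found under div#drafttables."""
    root = ET.fromstring(markup.strip())
    picks = []
    # Anchor on the container id rather than a fixed chain of <div>s.
    for row in root.findall(".//div[@id='drafttables']//tr"):
        first = row.find("td[@class='first']")
        if first is None:
            continue  # rows without a 'first' cell (the header row here) are skipped
        last = row.find("td[@class='last']")
        picks.append({
            "pick_number": _text(first),
            "pick_player": _text(row.find("td[@class='player']/a")),
            "pick_nflteam": _text(row.find("td[@class='player']/span")),
            "pick_ffteam": last.get("title", "") if last is not None else "",
        })
    return picks


print(parse_draft(SAMPLE))
```

Returning plain strings instead of the lists produced by `.extract()` is only a choice made for this sketch; in the spider itself each dict would become a `DraftItem`.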

Solution 1
1 · Accepted · 2012-09-17 08:07:25