
Python Scrapy: Yahoo Fantasy spider returning nothing, no errors

I'm working on a project to scrape statistics from Fantasy Football leagues across various services, and Yahoo is the one I'm currently stuck on. I want my spider to crawl the Draft Results page of a public Yahoo league. When I run the spider, it gives me no results, and no error message either. It simply says:

2012-09-14 17:29:08-0700 [draft] DEBUG: Crawled (200) <GET http://football.fantasysports.yahoo.com/f1/753697/draftresults?drafttab=round> (referer: None)
2012-09-14 17:29:08-0700 [draft] INFO: Closing spider (finished)
2012-09-14 17:29:08-0700 [draft] INFO: Dumping spider stats:
    {'downloader/request_bytes': 250,
     'downloader/request_count': 1,
     'downloader/request_method_count/GET': 1,
     'downloader/response_bytes': 48785,
     'downloader/response_count': 1,
     'downloader/response_status_count/200': 1,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2012, 9, 15, 0, 29, 8, 734000),
     'scheduler/memory_enqueued': 1,
     'start_time': datetime.datetime(2012, 9, 15, 0, 29, 7, 718000)}
2012-09-14 17:29:08-0700 [draft] INFO: Spider closed (finished)
2012-09-14 17:29:08-0700 [scrapy] INFO: Dumping global stats:
    {}

It's not a login issue, because the page in question is accessible without being signed in. I see from other questions posted here that people have gotten scrapers working for other parts of Yahoo. Is it possible that Yahoo Fantasy is blocking spiders? I've already written one successfully for ESPN, so I don't think the issue is with my code. Here it is anyway:

from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import HtmlXPathSelector
# DraftItem is defined in the project's items.py

class DraftSpider(CrawlSpider):
    name = "draft"
    # psycopg stuff here

    rows = ["753697"]

    allowed_domains = ["football.fantasysports.yahoo.com"]

    start_urls = []
    for row in rows:
        start_urls.append("http://football.fantasysports.yahoo.com/f1/%s/draftresults?drafttab=round" % row)

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select("/html/body/div/div/div/div/div/div/div/table/tr")
        items = []
        for site in sites:
            item = DraftItem()
            item['pick_number'] = site.select("td[@class='first']/text()").extract()
            item['pick_player'] = site.select("td[@class='player']/a/text()").extract()
            item['pick_nflteam'] = site.select("td[@class='player']/span/text()").extract()
            item['pick_ffteam'] = site.select("td[@class='last']/@title").extract()
            items.append(item)
        return items

Would really appreciate any insight on this.

C:\Users\Akhter Wahab>scrapy shell http://football.fantasysports.yahoo.com/f1/75
In [1]: hxs.select("/html/body/div/div/div/div/div/div/div/table/tr")
Out[1]: []

Your absolute XPath "/html/body/div/div/div/div/div/div/div/table/tr" is not right.

I would also never recommend using an absolute XPath; use a relative XPath instead. All of the results are inside this div:

//div[@id='drafttables']

so you can start getting results from there.
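
For example, here is a minimal sketch of what the parse method could look like with that relative XPath. It assumes the draft results sit in a table inside the drafttables div and that the td class names (first, player, last) from your original spider are correct:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # select rows relative to the drafttables div instead of an absolute path
    rows = hxs.select("//div[@id='drafttables']//table/tr")
    items = []
    for row in rows:
        item = DraftItem()
        item['pick_number'] = row.select("td[@class='first']/text()").extract()
        item['pick_player'] = row.select("td[@class='player']/a/text()").extract()
        item['pick_nflteam'] = row.select("td[@class='player']/span/text()").extract()
        item['pick_ffteam'] = row.select("td[@class='last']/@title").extract()
        items.append(item)
    return items

You can check the expression in the scrapy shell first: if hxs.select("//div[@id='drafttables']//table/tr") returns a non-empty list there, the same selector should work inside the spider.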
