URLs in a Scrapy crawler are not passed to the next parser

Question

I ran into a problem with yielding requests while trying to crawl http://www.brand-in-trend.ru. As you can see below, I'm using Scrapy and have defined a BaseSpider. The first parser works perfectly fine and returns all brands found on the start URL.

Now, when I yield a Request with a callback to the categories parser, I get neither a response nor an error. The spider just quits.

Spider:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request

class brandintrend(BaseSpider):
    name = "brandintrend"

    allowed_domains = [ 'trend-in-brand.ru' ]

    start_urls      = [ 'http://brand-in-trend.ru/brands/' ]

    def parse(self, response):
        # Collect the relative links to every brand page
        hxs         = HtmlXPathSelector(response)
        brands      = hxs.select('//div[@class="brandcol"]/ul/li/a/@href').extract()

        for brand in brands:
            # Turn the relative link into an absolute URL
            brand = "http://www.brand-in-trend.ru" + brand
            print brand
            # request = Request(brand, callback=self.categories)
            yield Request(brand, callback=self.categories)

    def categories(self, response):
        # Never reached: nothing below is printed
        print "Hello World"
        hxs = HtmlXPathSelector(response)
        print response.url

I have already tried the following to solve this issue:

I tested the generated brand URLs (e.g. http://www.brand-in-trend.ru/brands/parker/ ) in Chrome (JavaScript turned off) and they worked fine.
I put all generated brand URLs in the start_urls list and tried to yield those directly to the categories parser.
I looked at this post, which unfortunately didn't solve my problem: scrapy unable to make Request() callback

If anybody has come across a similar problem, I would be grateful for a solution or advice.

Thanks in advance

J

Answer 1

This is because you set:

allowed_domains = [ 'trend-in-brand.ru' ]

but you are crawling URLs from a different domain:

start_urls = [ 'http://brand-in-trend.ru/brands/' ]

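Compare trend-in-brand with brand-in-trend: the two words are swapped. The start URLs are not subject to the offsite filter, which is why the first parser runs. Every Request yielded from parse, however, points to a domain outside allowed_domains and is filtered out, so the spider finishes with nothing left to crawl.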
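
To see why the swapped words matter, here is a minimal sketch of the kind of host check that allowed_domains implies. It is a simplified stand-in written with the standard library only, not Scrapy's own code, and the helper name is_allowed is made up for illustration.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host equals, or is a subdomain of, an allowed domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


brand_url = "http://www.brand-in-trend.ru/brands/parker/"

# Domain as written in the question (words swapped): the request would be dropped.
print(is_allowed(brand_url, ["trend-in-brand.ru"]))  # False

# Corrected domain: "www.brand-in-trend.ru" is a subdomain of "brand-in-trend.ru".
print(is_allowed(brand_url, ["brand-in-trend.ru"]))  # True
```

So the fix in the spider should simply be allowed_domains = [ 'brand-in-trend.ru' ], which also covers the www. prefix used when building the brand URLs.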

Answer 1: accepted, 2013-09-18 10:34:29
