URLs in Scrapy crawler are not yielded to the next parser
I came across a yielding problem when trying to crawl http://www.brand-in-trend.ru. As you can see below, I'm using Scrapy and have defined a BaseSpider. The first parser works perfectly fine and returns all brands found on the start_url.

Now, when I yield the callback Request to the categories parser, I get neither a response nor an error. The spider just quits.

Spider:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request

class brandintrend(BaseSpider):
    name = "brandintrend"
    allowed_domains = [ 'trend-in-brand.ru' ]
    start_urls = [ 'http://brand-in-trend.ru/brands/' ]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        brands = hxs.select('//div[@class="brandcol"]/ul/li/a/@href').extract()
        for brand in brands:
            brand = "http://www.brand-in-trend.ru" + brand
            print brand
            # request = Request(brand, callback=self.categories)
            yield Request(brand, callback=self.categories)

    def categories(self, response):
        print "Hello World"
        hxs = HtmlXPathSelector(response)
        print response.url
I have already tried the following to solve this issue:
If anybody has come across a similar problem, I would be grateful for a solution or advice.

Thanks in advance

J
This is because you set:

allowed_domains = [ 'trend-in-brand.ru' ]

but you are crawling a URL from a different domain:

start_urls = [ 'http://brand-in-trend.ru/brands/' ]

Note trend-in-brand vs. brand-in-trend.
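A minimal sketch of the fix, assuming the rest of the spider stays exactly as in the question: align allowed_domains with the domain actually being crawled, so Scrapy's offsite filter no longer silently drops the yielded follow-up requests.

class brandintrend(BaseSpider):
    name = "brandintrend"
    # match the domain used in start_urls and in the brand URLs built in parse()
    allowed_domains = [ 'brand-in-trend.ru' ]
    start_urls = [ 'http://brand-in-trend.ru/brands/' ]

Alternatively, an individual request can bypass the offsite filter by passing dont_filter=True, e.g. Request(brand, callback=self.categories, dont_filter=True), but correcting allowed_domains is the cleaner option here.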