item pipeline not working in scrapy
I have written the following code, and I found that the item pipeline does not work when written this way: process_item (in the item pipeline) is never executed.
import re
from urlparse import urlparse  # Python 2; use urllib.parse on Python 3

import scrapy
from scrapy import Request, log


class Spider(scrapy.Spider):
    name = "***"

    def __init__(self, url='http://example.com/', **kw):
        super(Spider, self).__init__(**kw)
        self.url = url
        self.allowed_domains = [re.sub(r'^www\.', '', urlparse(url).hostname)]

    def start_requests(self):
        # return [Request(self.url, callback=self.parse, dont_filter=False)]
        return [Request(self.url, callback=self.find_all_url, dont_filter=False)]

    def find_all_url(self, response):
        log.msg('current url: ' + response.url, level=log.DEBUG)
        if True:
            self.parse(response)

    def parse(self, response):
        dept = deptItem()
        dept['deptName'] = response.xpath('//title/text()').extract()[0].strip()
        dept['url'] = response.url
        log.msg('find an item: ' + str(response.url) + '\n going to return item', level=log.INFO)
        return dept
However, if I change the callback in start_requests from self.find_all_url to self.parse (see the commented-out line above), the item pipeline works. I have tried to find out why, but I couldn't. Can anyone help?
I have found out that if I want to write it this way, I need to add a return in front of self.parse(response) in the function find_all_url.
But I am not very clear on why this is the case. I guess the returned item has to propagate back to whatever issued the initial request?
Can you post your settings?
You must define the pipelines in settings.py:

ITEM_PIPELINES = {
    'MySpider.pipelines.SomePipeline': 300,
}
Basic example: https://github.com/scrapy/dirbot
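For completeness, a pipeline referenced that way is just a class with a process_item method; the class and field names below are assumptions matching the 'MySpider.pipelines.SomePipeline' entry, not code from the question:

```python
# Hypothetical pipelines.py sketch. process_item is called once for
# every item a spider callback returns/yields, in priority order (300).
class SomePipeline:
    def process_item(self, item, spider):
        # Example processing: normalize a field. A real pipeline could
        # also raise scrapy.exceptions.DropItem to discard the item.
        item["deptName"] = item.get("deptName", "").strip()
        return item  # must return the item for later pipelines to see it

# process_item is plain Python, so it can be exercised directly:
print(SomePipeline().process_item({"deptName": "  IT  "}, spider=None))
# {'deptName': 'IT'}
```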