Item pipeline not working in Scrapy

I wrote the following spider and found that the item pipeline does not run when the code is written this way: process_item (in the item pipeline) is never executed.

class Spider(scrapy.Spider):
    name = "***"
    def __init__(self, url='http://example.com/', **kw):
        super(Spider,self).__init__(**kw)
        self.url = url 
        self.allowed_domains = [re.sub(r'^www\.', '', urlparse(url).hostname)]

    def start_requests(self):
        #return [Request(self.url, callback=self.parse, dont_filter=False)]
        return [Request(self.url, callback=self.find_all_url, dont_filter=False)]

    def find_all_url(self,response):
        log.msg('current url: '+response.url, level=log.DEBUG)
        if True:
            self.parse(response)

    def parse(self, response):
        dept = deptItem()
        dept['deptName'] = response.xpath('//title/text()').extract()[0].strip()
        dept['url'] = response.url
        log.msg('find an item: '+ str(response.url) +'\n going to return item' , level = log.INFO)
        return dept        

However, if I change the callback in start_requests from self.find_all_url to self.parse (see the commented-out line above), the item pipeline works. I tried to find out why but couldn't. Can anyone help?

I have since found that, to write it this way, I need to add return in front of self.parse(response) in the function find_all_url.

But I am not clear on why this is the case. My guess is that the item has to be returned all the way back from the callback registered on the initial request?
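One way to picture why the return matters, as a rough sketch in plain Python rather than Scrapy's actual engine: the framework only sees whatever the registered callback hands back. The names run_callback and collected below are hypothetical stand-ins for the engine and the item pipeline, and the "response" is just a URL string.

```python
# Simplified stand-in for how a crawler consumes callback output.
# This is NOT Scrapy's real engine; run_callback and collected are
# hypothetical names used only for illustration.

collected = []  # plays the role of the item pipeline


def run_callback(callback, response):
    """Call the callback and forward whatever it returns to the 'pipeline'."""
    result = callback(response)
    if result is None:
        return  # nothing was returned, so nothing reaches the pipeline
    # Accept either a single item or a list/tuple of items.
    items = result if isinstance(result, (list, tuple)) else [result]
    collected.extend(items)


def parse(response):
    # Build an item from the (fake) response.
    return {"url": response}


def find_all_url_without_return(response):
    parse(response)  # the item is built, then silently discarded


def find_all_url_with_return(response):
    return parse(response)  # the item is handed back to the caller


run_callback(find_all_url_without_return, "http://example.com/")
print(collected)  # []

run_callback(find_all_url_with_return, "http://example.com/")
print(collected)  # [{'url': 'http://example.com/'}]
```

In the first call parse still runs and builds its item, but find_all_url_without_return returns None, so the caller has nothing to pass on. That matches the behaviour described in the question: process_item is only reached once the callback itself returns the item.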

Can you post your settings?

You must define the pipelines in settings.py:

ITEM_PIPELINES = {
   'MySpider.pipelines.SomePipeline': 300,
}
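For reference, the class that this setting points to might look like the minimal sketch below. The name SomePipeline comes from the setting above; the pass-through body is only an assumption, since the asker's real pipeline was not posted.

```python
class SomePipeline:
    """Minimal pass-through pipeline (illustrative body, not the asker's code)."""

    def process_item(self, item, spider):
        # Called for each item that a spider callback returns or yields.
        # Returning the item passes it on to the next pipeline, if any.
        return item
```

If process_item is never called even with this setting in place, the item is probably not being returned from the spider callback in the first place.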

Basic example: https://github.com/scrapy/dirbot


 