
Scrapy. How to send item to close_spider method in pipeline

I yield and process many items, and in some cases I update a tracking sheet. This tracking sheet contains several attributes, including country, and all of these attributes come from the item. All of these operations happen in the pipeline. After the spider is closed, I have to send this tracking sheet to the people responsible for each country. But I can't pass the item to the method where I catch the spider closing.

To catch this moment I use this:

from scrapy import signals


class TrackingPipeline:  # the item pipeline class

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        # run customize_close_spider when the spider_closed signal fires
        crawler.signals.connect(
            pipeline.customize_close_spider, signal=signals.spider_closed
        )
        return pipeline

    def customize_close_spider(self, **kwargs):
        # Scrapy passes the close reason and the spider as keyword arguments
        reason = kwargs.get("reason")
        spider = kwargs.get("spider")
        if reason == "finished":
            pass  # some action

I can send the item neither to from_crawler nor to customize_close_spider. I need it in order to get the country attribute from the item.

Maybe there is another way to send a signal, for example to some other method that I could call from the tracking code.

The spider_closed method is only executed once, at the end of the scraping. If you need to execute something for every item, you can use the process_item method (which is executed for every item).
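For the tracking use case in the question, that means accumulating the per-country data in process_item and sending it all once the spider closes. A minimal sketch, assuming the item carries a country field and using a hypothetical send_report helper for the actual notification:

from collections import defaultdict

from scrapy import signals


class TrackingPipeline:

    def __init__(self):
        # tracking rows grouped by country, filled in as items arrive
        self.tracking = defaultdict(list)

    @classmethod
    def from_crawler(cls, crawler):
        pipeline = cls()
        crawler.signals.connect(
            pipeline.customize_close_spider, signal=signals.spider_closed
        )
        return pipeline

    def process_item(self, item, spider):
        # runs for every item, so the item (and its country) is available here
        self.tracking[item["country"]].append(dict(item))
        return item

    def customize_close_spider(self, spider, reason):
        if reason == "finished":
            for country, rows in self.tracking.items():
                send_report(country, rows)  # hypothetical notification helper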

In case you need to wait until all items have been scraped, you can write all items to a file, and read from this file in spider_closed.
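A rough sketch of that file-based variant, writing one JSON line per item and reading the file back when the spider closes (the tracking.jsonl file name is an assumption):

import json
from collections import defaultdict


class FileTrackingPipeline:

    def open_spider(self, spider):
        self.file = open("tracking.jsonl", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # one JSON object per line, written as items are scraped
        self.file.write(json.dumps(dict(item)) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()
        with open("tracking.jsonl", encoding="utf-8") as f:
            rows = [json.loads(line) for line in f]
        by_country = defaultdict(list)
        for row in rows:
            by_country[row["country"]].append(row)
        # send each country's rows to the responsible people here

Note that item pipelines get open_spider and close_spider called automatically, so no signal wiring is needed in this variant; connecting to the spider_closed signal is only required if you also want the close reason.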
