Scrapy callback function in another file

I am using Scrapy with Python to scrape several websites.

I have many spiders with a structure like this:

from scrapy import FormRequest, Spider

import library as lib

class MySpider(Spider):
    ...

    def parse(self, response):
        yield FormRequest(..., callback=lib.parse_after_filtering_results1)
        yield FormRequest(..., callback=lib.parse_after_filtering_results2)

    def parse_after_filtering_results1(self, response):
        return results

    def parse_after_filtering_results2(self, response):
        ...  # doesn't return anything

I would like to know whether I can move the last two methods, which are used as callbacks, into another module shared by all my spiders, so that a change there applies to all of them. I know they are methods of the class, but is there any way to put them in another file?

I have tried declaring the functions in my library.py file, but I don't know how to pass them the two parameters they need (self, response).

Create a base class to contain those common functions. Then your real spiders can inherit from that. For example, if all your spiders extend Spider then you can do the following:

spiders/basespider.py:

from scrapy import Spider

class BaseSpider(Spider):
    # Do not give it a name so that it does not show up in the spiders list.
    # This contains only common functions.

    def parse_after_filtering_results1(self, response):
        # ...
        pass  # placeholder so the method body is valid Python

    def parse_after_filtering_results2(self, response):
        # ...
        pass  # placeholder so the method body is valid Python

spiders/realspider.py:

from scrapy import FormRequest

from .basespider import BaseSpider

class RealSpider(BaseSpider):
    # ...

    def parse(self, response):
        # self.<method> is a bound method, so Scrapy only has to pass the response.
        yield FormRequest(..., callback=self.parse_after_filtering_results1)
        yield FormRequest(..., callback=self.parse_after_filtering_results2)

If you have different types of spiders, you can create different base classes. Alternatively, the class holding the common functions can be a plain class (not a Spider subclass) that you use as a mixin alongside whichever spider class you need.
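As a rough sketch of the mixin variant, the shared callbacks can live in a plain class that is combined with the spider class through multiple inheritance. The names ParseCallbacksMixin and StandInSpider and the seen attribute are made up for illustration. StandInSpider only stands in for scrapy.Spider so that the snippet runs without Scrapy installed. It also shows why the (self, response) question goes away: a callback taken from the instance is a bound method, so the caller only supplies the response.

```python
class ParseCallbacksMixin:
    """Shared callbacks. A plain class, not a Spider subclass (hypothetical name)."""

    def parse_after_filtering_results1(self, response):
        # `self` is the spider instance the mixin is combined with.
        return {"spider": self.name, "response": response}

    def parse_after_filtering_results2(self, response):
        # Side effects only; returns nothing.
        self.seen.append(response)


class StandInSpider:
    """Hypothetical stand-in for scrapy.Spider so this sketch runs without Scrapy."""

    name = None

    def __init__(self):
        self.seen = []


class RealSpider(ParseCallbacksMixin, StandInSpider):
    name = "real"


spider = RealSpider()

# Accessing the method on the instance binds `self` automatically.
callback = spider.parse_after_filtering_results1

# A framework such as Scrapy would call the callback with the response only.
result = callback("fake response")
```

In a real project the stand-in would be replaced by the actual base, for example class RealSpider(ParseCallbacksMixin, Spider), with the mixin listed first so that its methods take precedence in the method resolution order.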
