I am using Scrapy with Python to scrape several websites.
I have many spiders with a structure like this:
from scrapy import Spider, FormRequest

class MySpider(Spider):
    ...
    def parse(self, response):
        yield FormRequest(..., callback=self.parse_after_filtering_results1)
        yield FormRequest(..., callback=self.parse_after_filtering_results2)

    def parse_after_filtering_results1(self, response):
        return results

    def parse_after_filtering_results2(self, response):
        ...  # (doesn't return anything)
I would like to know if there is any way to put the last two functions, which are used as callbacks, in another module that is common to all my spiders (so that if I modify it, all of them change). I know they are class methods, but is there any way I could put them in another file?
I have tried declaring the functions in my library.py file, but my problem is how to pass them the two parameters they need (self, response).
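For example, something like this sketch of my attempt, where the function never receives self:

# library.py -- my attempt at moving the callbacks out
def parse_after_filtering_results1(self, response):
    # same body as the class method had
    return results

# in the spider:
import library as lib
yield FormRequest(..., callback=lib.parse_after_filtering_results1)
# Scrapy invokes the callback with only the response, so 'self'
# ends up bound to the response and 'response' is missing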
Create a base class to contain those common functions; your real spiders can then inherit from it. For example, if all your spiders extend Spider, you can do the following:
spiders/basespider.py:
from scrapy import Spider

class BaseSpider(Spider):
    # No name attribute, so it does not show up in the spiders list.
    # This class contains only the common callback functions.

    def parse_after_filtering_results1(self, response):
        # ...
        pass

    def parse_after_filtering_results2(self, response):
        # ...
        pass
spiders/realspider.py:
from scrapy import FormRequest

from .basespider import BaseSpider

class RealSpider(BaseSpider):
    # ...

    def parse(self, response):
        yield FormRequest(..., callback=self.parse_after_filtering_results1)
        yield FormRequest(..., callback=self.parse_after_filtering_results2)
If you have different types of spiders, you can create different base classes. Alternatively, your base class can be a plain object (not a Spider), and you can then use it as a mixin.
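A minimal sketch of the mixin variant (module and class names here are illustrative, not prescribed):

spiders/callbacks.py:

class FilteringCallbacksMixin:
    # Plain object, not a Spider: it carries only the shared callbacks,
    # so it can never be picked up as a runnable spider.
    def parse_after_filtering_results1(self, response):
        ...

    def parse_after_filtering_results2(self, response):
        ...

spiders/realspider.py:

from scrapy import Spider, FormRequest
from .callbacks import FilteringCallbacksMixin

class RealSpider(FilteringCallbacksMixin, Spider):
    name = "realspider"

    def parse(self, response):
        yield FormRequest(..., callback=self.parse_after_filtering_results1)
        yield FormRequest(..., callback=self.parse_after_filtering_results2)

Because the mixin comes first in the bases list, its methods take precedence in Python's method resolution order, while Spider still provides all the crawling machinery.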