
Python Scrapy CrawlSpider X-Forwarded-For header

My simple CrawlSpider is below. How can I add an X-Forwarded-For header to this crawler? The header should be sent with every page that is crawled.

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.http.request import Request

class MySpider(CrawlSpider):
    name = 'spidy'
    allowed_domains = ['website.com', 'www.website.com']
    start_urls = ['http://www.website.com/']
    rules = (
        Rule(LinkExtractor(allow=('/uk/', )), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        print(response.url)

PS: I found a way to do it via settings.py, but is there a way to do it from within the spider? Thank you!

You can achieve this by using the process_request argument of the Rule object, as below:

rules = (
    Rule(LinkExtractor(allow=('/uk/', )), callback='parse_item', follow=True,
         process_request='add_header'),
)

def add_header(self, request, response):
    # Called for every request extracted by this rule (Scrapy >= 2.0 signature)
    request.headers['X-Forwarded-For'] = 'the_header_value'
    return request

See the docs for further information.
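For comparison, the settings.py route the question mentions can be sketched as below. DEFAULT_REQUEST_HEADERS is a standard Scrapy setting; the IP value here is a placeholder, not anything from the original post:

```python
# settings.py -- applies the header to every request the project makes,
# not just those extracted by a particular Rule
DEFAULT_REQUEST_HEADERS = {
    'X-Forwarded-For': '1.2.3.4',  # placeholder value
}
```

The Rule-based approach above is preferable when the header should only apply to one spider (or one set of extracted links), while the settings approach applies project-wide.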
