
Python Scrapy CrawlSpider X-Forwarded-For header

My simple CrawlSpider is below. How can I add an X-Forwarded-For header to this crawler? The header should be sent with every page that is crawled.

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.http.request import Request

class MySpider(CrawlSpider):
    name = 'spidy'
    allowed_domains = ['website.com', 'www.website.com']
    start_urls = ['http://www.website.com/']
    rules = (
        Rule(LinkExtractor(allow=('/uk/', )), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        print(response.url)

PS: I found a way to do it via settings.py, but is there a way to do it from within the spider? Thank you!

You can achieve this by using the process_request argument of the Rule object, as below:

rules = (
    Rule(LinkExtractor(allow=('/uk/', )), callback='parse_item', follow=True,
         process_request='add_header'),
)

def add_header(self, request, response):
    # Called for every request extracted by this rule (Scrapy >= 2.0 signature)
    request.headers['X-Forwarded-For'] = 'the_header_value'
    return request

See the docs for further information.
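For comparison, the settings.py route the question mentions can be sketched as below. DEFAULT_REQUEST_HEADERS is a standard Scrapy setting; the IP value here is a placeholder, not anything from the original post:

```python
# settings.py -- applies the header to every request the project makes,
# not just those extracted by a particular Rule
DEFAULT_REQUEST_HEADERS = {
    'X-Forwarded-For': '1.2.3.4',  # placeholder value
}
```

The Rule-based approach above is preferable when the header should only apply to one spider (or one set of extracted links), while the settings approach applies project-wide.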
