
Scrapy: sending requests through a specified network card (Python 3)

I have created a Scrapy project that works well, and I host it on a server so that it runs daily. The server has two network cards, one of which was added specifically for Scrapy. The project still works, but I want Scrapy (or Python) to use only one network card, and I want to be able to specify which one.

Server: Windows 10
Python: 3.6
Scrapy: 1.5

While looking for a solution I found a post titled "Python sends an HTTP request using the specified network card", but I did not understand how to apply it.

Please help me solve this problem, for example by assigning the network card to Python, to the socket, or to the core library that Scrapy uses to request the website.
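For context, "assigning a network card to the socket" usually means binding the socket to that card's local IP address before connecting. The following is a minimal sketch using only the standard library; the helper name connect_from is hypothetical, and it assumes the card's IP address is already known.

```python
import socket


def connect_from(local_ip, host, port, timeout=5.0):
    """Open a TCP connection to (host, port) whose source address is local_ip.

    connect_from is a hypothetical helper name used for illustration.
    """
    # Port 0 asks the OS to pick any free source port on that address.
    return socket.create_connection(
        (host, port),
        timeout=timeout,
        source_address=(local_ip, 0),
    )
```

The OS then sends the traffic from the interface that owns local_ip, subject to its routing table. The same (ip, port) pair is the kind of value that a bind address setting in a higher-level library typically expects.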

After digging deeper, I found that Scrapy itself provides a bindaddress key in the request's meta. It specifies the local address that the outgoing connection is bound to.
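As a rough illustration of the meta key, the sketch below builds the dictionary that could be passed as meta to a single request. The helper name bind_meta is hypothetical, and the (ip, port) tuple form is an assumption taken from the middleware in this answer rather than from the Scrapy documentation.

```python
import ipaddress


def bind_meta(local_ip, port=0):
    """Build a meta dict carrying the 'bindaddress' key (hypothetical helper)."""
    # Fail early with ValueError if local_ip is not a valid IPv4/IPv6 literal.
    ipaddress.ip_address(local_ip)
    # Port 0 leaves the choice of source port to the operating system.
    return {'bindaddress': (local_ip, port)}
```

Inside a spider this could be used as scrapy.Request(url, meta=bind_meta('127.0.0.1')), replacing 127.0.0.1 with the IP of the chosen network card. Setting the key per request is fine for one or two requests; a middleware is more convenient when every request should be bound.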

The Scrapy documentation does not seem to show how to use it, so I wrote a downloader middleware that modifies each request and solves my problem. I called it BindAddressMiddleware.

What does the middleware do? It reads two settings:

IS_MORE_NETWORK_CARDS = True
If True, the specified network card is used; if False, requests are left unchanged.

BIND_ADDRESS = '127.0.0.1'
The IP address of the network card to use.
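Before putting an address into BIND_ADDRESS, it can help to confirm that the address really belongs to one of the machine's network cards. The sketch below is one way to check this with the standard library; the function name is_local_address is hypothetical, and it handles IPv4 only.

```python
import socket


def is_local_address(ip):
    """Return True if the OS allows binding a TCP socket to ip (IPv4 only).

    is_local_address is a hypothetical helper name used for illustration.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Binding normally fails with OSError when no local interface owns ip.
        sock.bind((ip, 0))
        return True
    except OSError:
        return False
    finally:
        sock.close()
```

If the check returns False for the intended address, the value is probably not the IP assigned to the network card, and requests bound to it would likely fail.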

Enable the downloader middleware for the Scrapy project in settings.py:

DOWNLOADER_MIDDLEWARES = {
    # Bindaddress
    'scrapers22.middlewares.BindAddressMiddleware': 400,
}

The BindAddressMiddleware downloader middleware:

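from scrapy import signals


class BindAddressMiddleware(object):
    """Bind outgoing requests to the local IP of a specific network card."""

    def __init__(self, settings):
        # getbool also handles string values such as 'False' passed with -s.
        self.is_bindaddress = settings.getbool('IS_MORE_NETWORK_CARDS')
        # Always define the attribute so spider_opened cannot raise AttributeError.
        self.bindaddress = settings.get('BIND_ADDRESS') if self.is_bindaddress else None

    @classmethod
    def from_crawler(cls, crawler):
        middleware = cls(crawler.settings)
        # Without this connection, spider_opened would never be called.
        crawler.signals.connect(middleware.spider_opened, signal=signals.spider_opened)
        return middleware

    def process_request(self, request, spider):
        if self.is_bindaddress and self.bindaddress:
            # (host, port) tuple; port 0 lets the OS choose the source port.
            request.meta['bindaddress'] = (self.bindaddress, 0)
        return None

    def spider_opened(self, spider):
        if self.bindaddress:
            spider.logger.info('Using: %s as bindaddress' % self.bindaddress)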
