I have created a Scrapy project and it works well. I host it on a server so that it runs daily, and that works too. My server has two network cards, one of which was added specifically for Scrapy. The project still runs, but I want Scrapy (or Python) to use only that one network card, and I want to be able to specify which card it uses.
Server: Windows 10
Python: 3.6
Scrapy: 1.5
While looking for a solution I found the post "Python sends an HTTP request using the specified network card" on the internet, but I did not understand how to use it.
Please help me solve this, for example by assigning a network card to Python, to the socket, or to the core library that Scrapy uses to request websites.
I dug deeper and found that Scrapy itself provides the Request meta key bindaddress, which specifies the local address that the outgoing connection is bound to.
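As background, here is a minimal sketch of what binding to a local address means at the socket level, using only the standard library. The helper name connect_from is hypothetical and not part of Scrapy; the assumption is that each network card has its own local IPv4 address, so choosing the source address effectively chooses the card.

```python
import socket


def connect_from(local_ip, remote_host, remote_port, timeout=5.0):
    """Open a TCP connection whose source address is local_ip.

    Hypothetical helper for illustration only.
    """
    # Port 0 asks the operating system to pick any free local port.
    return socket.create_connection(
        (remote_host, remote_port),
        timeout=timeout,
        source_address=(local_ip, 0),
    )


if __name__ == "__main__":
    # Loopback demo so the sketch runs without external network access.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    client = connect_from("127.0.0.1", "127.0.0.1", port)
    # The local end of the connection uses the address we asked for.
    print(client.getsockname()[0])

    client.close()
    server.close()
```

The (address, 0) tuple here has the same shape as the value stored under the bindaddress meta key in the middleware shown later in this answer.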
The Scrapy documentation does not seem to show how to use it, so I wrote a downloader middleware that sets it on every request, which solved my problem. I called it BindAddressMiddleware.
What does the middleware do? It uses two settings:
    IS_MORE_NETWORK_CARDS = True
If True, requests are bound to the specified network card; if False, they are not.
    BIND_ADDRESS = '127.0.0.1'
The IP address of the network card to use (a quoted string in settings.py).
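Before putting an address into BIND_ADDRESS, it can help to check that the address really belongs to this machine. The following is a minimal sketch using only the standard library. The function name is_local_address is hypothetical, and the sketch assumes an IPv4 address.

```python
import socket


def is_local_address(ip):
    """Return True if a TCP socket can be bound to ip on this machine.

    Hypothetical helper; assumes ip is an IPv4 address string.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Binding normally succeeds only for addresses assigned
        # to one of this machine's network interfaces.
        sock.bind((ip, 0))
        return True
    except OSError:
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    print(is_local_address("127.0.0.1"))
```

If the check returns False for the address of the card you intend to use, the address is probably mistyped or the card is not up.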
Enable the downloader middleware for the Scrapy project in settings.py:
    DOWNLOADER_MIDDLEWARES = {
        # Bindaddress
        'scrapers22.middlewares.BindAddressMiddleware': 400,
    }
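The BindAddressMiddleware downloader middleware:
    from scrapy import signals


    class BindAddressMiddleware(object):
        """Bind outgoing requests to the IP of a specific network card."""

        def __init__(self, settings):
            self.is_bindaddress = settings.getbool('IS_MORE_NETWORK_CARDS')
            # Always define the attribute so spider_opened cannot fail.
            self.bindaddress = (
                settings.get('BIND_ADDRESS') if self.is_bindaddress else None
            )

        @classmethod
        def from_crawler(cls, crawler):
            middleware = cls(crawler.settings)
            # Connect the signal; otherwise spider_opened is never called.
            crawler.signals.connect(middleware.spider_opened,
                                    signal=signals.spider_opened)
            return middleware

        def process_request(self, request, spider):
            if self.is_bindaddress and self.bindaddress:
                # (host, port); port 0 lets the OS pick a free local port.
                request.meta['bindaddress'] = (self.bindaddress, 0)
            return None

        def spider_opened(self, spider):
            if self.bindaddress:
                spider.logger.info('Using: %s as bindaddress' % self.bindaddress)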