How to set IP proxy from a given proxy pool?

I have been given a proxy pool link http://10.10.5.17:5009/proxy_pool that outputs the following:

{
    "msg": "success",
    "list": [
        "111.72.193.250:34621",
        "114.99.28.7:25995",
        "121.234.245.76:35513",
        "220.186.155.66:49366",
        "117.90.252.72:45037"
    ],
    "data": "114.99.28.7:25995"
}

These IPs change every few minutes. I'd like to know how to set this up in Scrapy.

I have seen tutorials that list every IP in settings.py and then use them from middlewares.py, but that does not work here because I need to read the IPs from the link, and they change rapidly.

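import json
import random

import scrapy


# The methods below belong inside your scrapy.Spider subclass.

def start_requests(self):
    # Fetch the current proxy list before requesting the target page.
    proxy_request = scrapy.Request(url='http://10.10.5.17:5009/proxy_pool', callback=self.prepare_request)
    yield proxy_request


def prepare_request(self, response):
    target_url = 'XXX'
    # response.text replaces the deprecated body_as_unicode().
    proxy_response = json.loads(response.text)
    proxy_list = proxy_response['list']
    # The pool returns bare host:port strings, so add the scheme explicitly.
    proxy = 'http://' + random.choice(proxy_list)
    # The request must be yielded, otherwise Scrapy never schedules it.
    yield scrapy.Request(url=target_url, meta={'proxy': proxy}, callback=self.scrape)


def scrape(self, response):
    ...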

You'll have to write your own downloader middleware that downloads the proxy list initially, fetches a new list every now and then, and assigns a random proxy from the current list to each request.

You should start by reading the documentation about downloader middlewares. Then, I recommend you find existing middlewares that deal with proxies (e.g. scrapy-rotating-proxies) and learn from them.
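
As a rough illustration of the middleware described above, here is a minimal sketch. It assumes the pool URL and JSON shape from the question; the names ProxyPoolMiddleware, fetch_pool, refresh_seconds and clock are made up for this example and are not part of Scrapy. It fetches the pool with a blocking urllib call, which stalls Scrapy's event loop for the duration of the download, so treat it as a starting point rather than a finished implementation.

```python
import json
import random
import time
from urllib.request import urlopen

POOL_URL = "http://10.10.5.17:5009/proxy_pool"  # pool endpoint from the question


def fetch_pool(url=POOL_URL, timeout=5):
    """Hypothetical helper: download the pool and return its 'host:port' strings."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["list"]


class ProxyPoolMiddleware:
    """Hypothetical downloader middleware: picks a random proxy per request
    from a list that is re-fetched once it is older than refresh_seconds."""

    def __init__(self, fetch=fetch_pool, refresh_seconds=60, clock=time.monotonic):
        self.fetch = fetch                      # callable returning a list of proxies
        self.refresh_seconds = refresh_seconds  # assumed refresh interval
        self.clock = clock                      # injectable so the logic can be tested
        self.proxies = []
        self.fetched_at = None

    def _current_proxies(self):
        now = self.clock()
        stale = self.fetched_at is None or now - self.fetched_at >= self.refresh_seconds
        if stale:
            try:
                fresh = self.fetch()
            except (OSError, ValueError, KeyError):
                fresh = []  # pool unreachable or malformed: keep the old list
            if fresh:
                self.proxies = list(fresh)
                self.fetched_at = now
        return self.proxies

    def process_request(self, request, spider):
        proxies = self._current_proxies()
        if proxies:
            # The pool returns bare host:port strings, so add the scheme.
            request.meta["proxy"] = "http://" + random.choice(proxies)
        return None  # let Scrapy continue processing the request
```

To try it, you would register the class in DOWNLOADER_MIDDLEWARES in settings.py under whatever module path you place it in, for example 'myproject.middlewares.ProxyPoolMiddleware' (a hypothetical path), with a priority below the 750 used by Scrapy's built-in HttpProxyMiddleware so that the proxy is set before that middleware runs.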
