
Python requests, change IP address

I am coding a web scraper for a website with the following Python code:

import requests

def scrape(url):
    req = requests.get(url)
    with open('out.html', 'w') as f:
        f.write(req.text)

It works a few times but then an error HTML page is returned by the website (when I open my browser, I have a captcha to complete).

Is there a way to avoid this “ban”, for example by changing the IP address?

As already mentioned in the comments, and as you suggested yourself, changing the IP could help. An easy way to do this is vpngate.py:

https://gist.github.com/Lazza/bbc15561b65c16db8ca8

A how-to is provided at the link.

Have fun

You can use a proxy with the requests library. You can find some free proxies at a couple of different websites, such as https://www.sslproxies.org/ and http://free-proxy.cz/en/proxylist/country/US/https/uptime/level3, but not all of them work, and they should not be trusted with sensitive information.

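Example (this assumes a plain HTTP proxy, which is what most free proxies are, so the proxy URL itself starts with http:// for both keys):

import requests

proxy = {
    "https": "http://158.177.252.170:3128",
    "http": "http://158.177.252.170:3128",
}
response = requests.get("https://httpbin.org/ip", proxies=proxy, timeout=10)
print(response.text)  # should report the proxy's IP rather than your own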
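Since not all free proxies work, it can help to try several in turn and keep the first one that answers. Below is a minimal sketch of that idea, not a definitive implementation. The function name `fetch_via_first_working_proxy` and the proxy addresses in the usage comment are hypothetical, and `session` is assumed to behave like `requests.Session`.

```python
def fetch_via_first_working_proxy(session, url, proxy_urls, timeout=10):
    """Try each proxy in order; return (response, proxy_url) for the first success.

    `session` is expected to behave like requests.Session. The exceptions
    raised by requests (ProxyError, ConnectTimeout, HTTPError, ...) derive
    from OSError, so catching OSError covers a dead or misbehaving proxy.
    """
    last_error = None
    for proxy_url in proxy_urls:
        # Use the same proxy for both plain and TLS traffic.
        proxies = {"http": proxy_url, "https": proxy_url}
        try:
            response = session.get(url, proxies=proxies, timeout=timeout)
            response.raise_for_status()
        except OSError as exc:
            last_error = exc  # remember why this proxy failed, then try the next
            continue
        return response, proxy_url
    raise RuntimeError(
        f"no working proxy among {len(proxy_urls)} candidates"
    ) from last_error


# Usage (hypothetical proxy addresses from a documentation-only range):
#   import requests
#   response, used = fetch_via_first_working_proxy(
#       requests.Session(),
#       "https://httpbin.org/ip",
#       ["http://203.0.113.10:3128", "http://203.0.113.11:3128"],
#   )
#   print(used, response.text)
```

Passing the session in keeps the helper easy to try out with a stub before pointing it at real proxies.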

I recently answered this on another question, but using the requests-ip-rotator library to rotate IPs through AWS API Gateway is usually the most effective way.
It's free for the first million requests per region, and it means you won't have to give your data to unreliable proxy sites.
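For reference, a rough sketch of how requests-ip-rotator is typically wired into a session, based on the library's documented `ApiGateway` adapter. The wrapper name `fetch_through_api_gateway` is hypothetical, and it assumes the package is installed and that AWS credentials are already configured on your machine.

```python
def fetch_through_api_gateway(site, url):
    """Fetch `url` (which must live under `site`) through AWS API Gateway endpoints."""
    # Imported here so that defining the function does not require the package.
    # Calling it needs `pip install requests-ip-rotator` and AWS credentials.
    import requests
    from requests_ip_rotator import ApiGateway

    gateway = ApiGateway(site)  # e.g. "https://example.com"
    gateway.start()             # creates the gateway endpoints in your AWS account
    try:
        session = requests.Session()
        # Route every request whose URL starts with `site` through the gateway.
        session.mount(site, gateway)
        return session.get(url)
    finally:
        # Remove the gateways again so they do not linger in the account.
        gateway.shutdown()


# Usage (hypothetical site):
#   response = fetch_through_api_gateway(
#       "https://example.com", "https://example.com/some/page"
#   )
#   print(response.status_code)
```

Starting and shutting down the gateway on every call is slow; for a real scraper you would probably start it once, reuse the session for all requests, and shut it down at the end.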

Late answer: I found this while looking for IP spoofing. To the OP's question: as some comments point out, you may or may not actually be getting banned. Here are two things to consider:

  1. A soft ban: they don't like bots. A simple solution that has worked for me in the past is to add headers so they think you're a browser, e.g.:

    req = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})

  2. On-page active elements, scripts or popups that act as content gates rather than a ban per se, e.g. a country/language selector, cookie settings, or surveys that require user input. A not-as-simple solution: use a webdriver such as Selenium with chromedriver to render the page, including JS, and then add "user" clicks to deal with these gates.
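To tell these cases apart, it may help to send a fuller set of browser-like headers and then check whether the reply looks like a block page before saving it. The following is only a sketch. The header values, the status codes and the marker strings are assumptions chosen for illustration, and a real site may need different ones.

```python
# Browser-like headers; the exact values are illustrative assumptions.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}

# Status codes and page fragments that often indicate a block (assumed, not exhaustive).
BLOCK_STATUS_CODES = {403, 429}
BLOCK_MARKERS = ("captcha", "unusual traffic", "access denied")


def looks_blocked(status_code, html):
    """Heuristic: return True if the response resembles a block or captcha page."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


# Usage with requests:
#   import requests
#   response = requests.get(url, headers=BROWSER_HEADERS, timeout=10)
#   if looks_blocked(response.status_code, response.text):
#       print("Looks like a soft ban or captcha; slow down or switch IP.")
#   else:
#       with open("out.html", "w", encoding="utf-8") as f:
#           f.write(response.text)
```

If the page passes this check but the content is still missing, the second case (a JS-driven content gate) is the more likely explanation.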

