Python mechanize returns HTTP 429 error

Question

I am trying to automate a task in Python with the mechanize module:

1. Enter a keyword in a web form and submit the form.
2. Look for a specific element in the response.

This works once. However, when I repeat the task for a list of keywords, I get HTTP Error 429 (Too Many Requests).

I tried the following to work around this:

Adding custom headers (which I recorded for that specific website using a proxy) so that the request looks like a legitimate browser request:

```python
import mechanize

br = mechanize.Browser()
# Assign all headers in one list: each separate `br.addheaders = [...]`
# assignment replaces the previous list, so only the last header would be sent.
br.addheaders = [
    ('User-agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'),
    ('Connection', 'keep-alive'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'),
    ('Upgrade-Insecure-Requests', '1'),
    ('Accept-Encoding', 'gzip, deflate, sdch'),
    ('Accept-Language', 'en-US,en;q=0.8'),
]
```

Since every 5th request was blocked, I tried sleeping for 20 seconds after every 5 requests.

Neither of the two methods worked.

Answer 1

You need to limit the rate of your requests to what the server's configuration permits. (Web Scraper: Limit to Requests Per Minute/Hour on Single Domain? may help you work out the permitted rate.)
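One concrete hint about the permitted rate: RFC 6585 allows a 429 response to include a Retry-After header saying how long to wait. Below is a minimal sketch for turning that header into a delay in seconds. `retry_after_seconds` is a hypothetical helper, and it takes the raw header string as input, assuming you have already read it off the caught HTTP error (how the headers are exposed depends on your mechanize version, so that part is left out).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Turn a Retry-After header value into seconds to wait, or None.

    The value is either delay-seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT").
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():  # delay-seconds form
        return float(value)
    try:  # HTTP-date form
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    if when is None:
        return None
    if when.tzinfo is None:  # treat a date without zone info as UTC
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date already in the past means "retry now"
    return max(0.0, (when - now).total_seconds())
```

If the server sends the header, sleeping for that many seconds before the next request is a reasonable starting point; if it does not, you still have to find the rate empirically.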

mechanize uses a heavily patched version of urllib2 (Lib/site-packages/mechanize/_urllib2.py) for network operations, and its Browser class is a descendant of its _urllib2_fork.OpenerDirector.

So the simplest way to patch its logic seems to be adding a handler to your Browser object that:

- defines default_open and has an appropriate handler_order to place it before all other handlers (a lower value means a higher priority);
- stalls until the request is eligible, e.g. with a token bucket or leaky bucket algorithm, as implemented in Throttling with urllib2 (note that a bucket should probably be per-domain or per-IP);
- finally returns None to pass the request on to the following handlers.
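A handler along those lines could look roughly like this. It is a minimal, single-threaded sketch, not a drop-in library: `TokenBucket` and `ThrottleHandler` are hypothetical names, it uses a token bucket (not the leaky-bucket variant), and it assumes `mechanize.BaseHandler` is the urllib2-style base class the Browser accepts. The fallback to `urllib.request.BaseHandler` exists only so the throttling logic can be tried without mechanize installed. Buckets are kept per host (the URL's netloc), one of the per-domain options mentioned above.

```python
import time
from urllib.parse import urlsplit

try:
    # mechanize exposes the urllib2-style BaseHandler that its Browser expects
    from mechanize import BaseHandler
except ImportError:
    # fallback only so the throttling logic can be tried without mechanize installed
    from urllib.request import BaseHandler


class TokenBucket:
    """Average `rate` tokens per second, allowing bursts of up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full so the first burst is not delayed
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add the tokens earned since the last check, never exceeding capacity
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Block until a token is available, then consume it."""
        self._refill()
        while self.tokens < 1:
            # Sleep just long enough for the missing fraction of a token to accrue
            self.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1


class ThrottleHandler(BaseHandler):
    """Stall each request until its host's bucket has a token, then step aside."""

    handler_order = 0  # lower runs earlier; BaseHandler's default is 500

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        super().__init__()
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.sleep = sleep
        self.buckets = {}  # one bucket per host (URL netloc)

    def default_open(self, req):
        host = urlsplit(req.get_full_url()).netloc
        if host not in self.buckets:
            self.buckets[host] = TokenBucket(self.rate, self.capacity, self.clock, self.sleep)
        self.buckets[host].acquire()
        return None  # let the following handlers actually open the request
```

With mechanize installed, you would attach it with something like `br.add_handler(ThrottleHandler(rate=4 / 60.0, capacity=4))`, assuming Browser exposes the inherited OpenerDirector.add_handler; the rate (about 4 requests per minute, bursts of 4) is only a placeholder to tune against what the server tolerates. Because the opener tries default_open handlers before the protocol-specific ones such as http_open, every request is delayed here first and then handed on unchanged.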

Since this is a common need, you should probably publish your implementation as an installable package.

1 answers

solution1
0 2015-08-17 09:16:30
