
Asynchronous HTTP calls in Python

I need callback-like functionality in Python: I am sending a request to a web service multiple times, with a change in the parameters each time. I want these requests to happen concurrently instead of sequentially, so I want the function to be called asynchronously.

It looks like asyncore is what I might want to use, but the examples I've seen of how it works all look like overkill, so I'm wondering whether there's another path I should be going down. Any suggestions on modules or approaches? Ideally I'd like to use these in a procedural fashion instead of creating classes, but I may not be able to get around that.

Starting in Python 3.2, you can use concurrent.futures for launching parallel tasks.

Check out this ThreadPoolExecutor example:

http://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example

It spawns threads to retrieve HTML and acts on responses as they are received.

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

The above example uses threading. There is also a similar ProcessPoolExecutor that uses a pool of processes, rather than threads:

http://docs.python.org/dev/library/concurrent.futures.html#processpoolexecutor-example

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# ProcessPoolExecutor needs the __main__ guard so worker processes can be spawned safely
if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        # Start the load operations and mark each future with its URL
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))

Do you know about eventlet? It lets you write what appears to be synchronous code but have it operate asynchronously over the network.

Here's an example of a super minimal crawler:

urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
     "https://wiki.secondlife.com/w/images/secondlife.jpg",
     "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):

  return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()

for body in pool.imap(fetch, urls):
  print "got body", len(body)

The Twisted framework is just the ticket for that. But if you don't want to take that on, you might also use pycurl, a wrapper for libcurl, which has its own async event loop and supports callbacks.
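If you go the pycurl route, here is a minimal sketch (not from the original answer; the URLs are placeholders) of its CurlMulti interface, which multiplexes several transfers on libcurl's own event loop:

import io
import pycurl

# Placeholder URLs for illustration
urls = ['http://www.example.com/', 'http://www.python.org/']

# One easy handle per URL, each writing its response body into its own buffer
multi = pycurl.CurlMulti()
handles = []
for url in urls:
    buf = io.BytesIO()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.WRITEFUNCTION, buf.write)
    multi.add_handle(curl)
    handles.append((url, curl, buf))

# Drive all transfers until none are still running
num_active = len(handles)
while num_active:
    while True:
        ret, num_active = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    if num_active:
        multi.select(1.0)  # wait for socket activity before calling perform() again

for url, curl, buf in handles:
    print('%s: %d bytes' % (url, len(buf.getvalue())))
    multi.remove_handle(curl)
    curl.close()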

(Although this thread is about server-side Python, since this question was asked a while back, others might stumble on it while looking for a similar answer on the client side.)

For a client-side solution, you might want to take a look at the Async.js library, especially the "Control Flow" section.

https://github.com/caolan/async#control-flow

By combining the "Parallel" with a "Waterfall" you can achieve your desired result.

WaterFall( Parallel(TaskA, TaskB, TaskC) -> PostParallelTask)

If you examine the example under Control Flow - "Auto", they give you an example of the above: https://github.com/caolan/async#autotasks-callback , where "write-file" depends on "get_data" and "make_folder", and "email_link" depends on "write-file".

Please note that all of this happens on the client side (unless you're using Node.js on the server side).

For server-side Python, look at PyCURL @ https://github.com/pycurl/pycurl/blob/master/examples/basicfirst.py

By combining pycurl with a thread pool like the concurrent.futures example above, you can achieve non-blocking, multi-threaded functionality.
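As a rough sketch of that combination (placeholder URLs, not the linked pycurl example), each worker thread can drive its own pycurl easy handle:

import concurrent.futures
import io
import pycurl

# Placeholder URLs for illustration
urls = ['http://www.example.com/', 'http://www.python.org/']

def fetch(url):
    # Each thread drives its own easy handle; pycurl releases the GIL during the transfer
    buf = io.BytesIO()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.WRITEFUNCTION, buf.write)
    curl.perform()
    curl.close()
    return buf.getvalue()

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map() preserves input order, so results line up with urls
    for url, body in zip(urls, executor.map(fetch, urls)):
        print('%s: %d bytes' % (url, len(body)))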
