Python Multi Threading with Blocking I/O

Question

My application makes several blocking I/O (network) requests that take a while to complete. I tried using multithreading, but it doesn't appear to bring any speedup. I'm guessing it has something to do with Python's GIL.

The thing is, all of the requests could run concurrently and have no dependencies on each other. How do I solve this performance issue?

My code

import threading
import time
import urllib2  # Python 2 only; Python 3 moved this to urllib.request

def send_request(url, count_str):
    start_time = time.time()
    urllib2.urlopen(url)  # blocks until the server sends its response
    print "Request " + count_str + " took " + str(time.time() - start_time) + " started at " + str(start_time)

# Start one thread per URL listed in urllist.txt
count = 0
for url in open('urllist.txt'):
    t = threading.Thread(target=send_request, args=(url.strip(), str(count)))
    t.start()
    count += 1

The output is

Request 1 took 5.0150949955 started at 1458789266.78
Request 2 took 10.0112490654 started at 1458789266.79
Request 0 took 15.024559021 started at 1458789266.78
Request 3 took 20.016972065 started at 1458789266.79

The URLs in urllist.txt point to a server I'm running locally that takes 5 seconds to respond to each request. As you can see, they all "start" at the same time, but each one blocks until the previous one has finished.
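For readers on Python 3, where urllib2 became urllib.request, here is a minimal sketch of an equivalent client that uses a thread pool from concurrent.futures instead of starting threads by hand. The names send_all and max_workers are illustrative and do not come from the question, and the pool size of 8 is an arbitrary assumption. Like the original, it can only overlap requests if the server accepts several connections at once.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def send_request(url, index):
    """Fetch one URL; return (index, start_time, elapsed_seconds)."""
    start_time = time.time()
    with urllib.request.urlopen(url) as response:
        response.read()  # read the body so the whole transfer is timed
    return index, start_time, time.time() - start_time


def send_all(urls, max_workers=8):
    """Fetch every URL on a pool of worker threads; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(send_request, url, i) for i, url in enumerate(urls)]
        # result() waits for each task and re-raises any exception it hit
        return [f.result() for f in futures]


# Usage, assuming urllist.txt holds one URL per line as in the question:
#   with open("urllist.txt") as f:
#       urls = [line.strip() for line in f if line.strip()]
#   for index, start, elapsed in send_all(urls):
#       print("Request %d took %.3f started at %.2f" % (index, elapsed, start))
```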

Answer 1

I cannot reproduce your problem. When I test against a handful of internet servers, each one repeated a few times, all requests are serviced in about the same time, with no steadily increasing delays. However, your output points to a completely different issue: I suspect the "local server" you're using is not multithreaded, or is otherwise unable to service multiple requests at once.

Your own output indicates that the threads are launching in parallel, but the requests are being serviced serially. If GIL handoff were causing the problem, I'd expect all of them to be delayed a bit (one thread would get some work done, then another would do some more, and so on), not each one running to completion before the next starts. This points to a problem on the server side, where the server handles each request to completion before it services additional connections.
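The claim that blocking waits overlap under the GIL can be checked without any server. The sketch below is minimal and assumes time.sleep is a fair stand-in for a blocking socket read, since in CPython both release the GIL while the thread waits. The names run_threads, run_serial and blocking_call are hypothetical:

```python
import threading
import time


def blocking_call(delay):
    # Stand-in for a blocking network read (assumption): CPython releases
    # the GIL while a thread sleeps or waits on a socket.
    time.sleep(delay)


def run_threads(n, delay):
    """Run n blocking calls on n threads; return total wall-clock seconds."""
    threads = [threading.Thread(target=blocking_call, args=(delay,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return time.perf_counter() - start


def run_serial(n, delay):
    """Run the same n blocking calls one after another on one thread."""
    start = time.perf_counter()
    for _ in range(n):
        blocking_call(delay)
    return time.perf_counter() - start
```

With threads, the total should be close to a single delay. Run serially, it is close to n delays, which matches the staircase pattern in the question's output.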

Taking a stab at psychic debugging: did you by any chance implement the five-second request time by adding a sleep in the server code, possibly after accept returns but before launching a thread to service the connection? Or did you not use threading on the server at all?
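One way to confirm the server-side explanation is to point the same concurrent client at a deliberately slow server, first single-threaded and then thread-per-request. This is a minimal sketch and assumes Python 3.7+ for http.server.ThreadingHTTPServer. SlowHandler, time_concurrent_requests and the 0.3-second DELAY are hypothetical stand-ins for the question's five-second local server:

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer

DELAY = 0.3  # hypothetical: seconds the server "works" on each request


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(DELAY)  # simulate a slow response, like the question's server
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def time_concurrent_requests(server_class, n=4):
    """Serve SlowHandler with server_class, fire n requests at once, return wall time."""
    server = server_class(("127.0.0.1", 0), SlowHandler)  # port 0: pick a free port
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Empty ProxyHandler so environment proxy settings cannot reroute localhost
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def fetch():
        with opener.open(url) as response:
            response.read()

    clients = [threading.Thread(target=fetch) for _ in range(n)]
    start = time.perf_counter()
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    elapsed = time.perf_counter() - start
    server.shutdown()      # stop serve_forever()
    server.server_close()  # release the listening socket
    return elapsed
```

time_concurrent_requests(HTTPServer) should take roughly n * DELAY, because HTTPServer finishes each request before accepting the next connection. time_concurrent_requests(ThreadingHTTPServer) should finish in roughly one DELAY.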

Answer 2

Python threads are slow! Python has a GIL (Global Interpreter Lock), which uses a mutex to serialize access to the interpreter's internals. You might want to have a look at Jython, which doesn't have a GIL and can fully exploit multiprocessor systems.

2 answers

Answer 1: score 2, accepted, posted 2016-03-24 03:56:08
Answer 2: score -1, posted 2016-03-24 02:11:44