
AsyncHTTPClient blocking my Tornado IOLoop


I've been struggling with this for the last few days, and I can't seem to fully understand the tornado gen library.

I have this piece of code, as an example:

@gen.coroutine
def get(self, build_id=None):
    status_query = self.get_query_arguments("status")
    limit_query = self.get_query_arguments("limit")

    results = [self._dummy() for i in range(15)]
    yield results

def _dummy(self):
    http_client = tornado.httpclient.AsyncHTTPClient()
    return http_client.fetch("https://www.google.com", headers=self.headers, validate_cert=False)

As I understood it, my 15 requests to fetch google should fire almost at the same time. The "results" list should be a list of futures, and yielding the list should wait for all of them to complete.

That's actually happening, but it's taking around 6 seconds to make those requests, and the time grows incrementally as I increase the range of the for loop.

Shouldn't they take around the same time to be ready?

Am I missing something?

Thank you very much!

AsyncHTTPClient's default max_clients is 10. When you initiate 15 requests, 10 of them begin immediately, but the remaining 5 must wait for other requests to finish before they can begin. To begin more concurrent requests, raise max_clients to a larger number. See Tornado's documentation for details on configuring AsyncHTTPClient.
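The queueing effect is easy to model with stdlib asyncio alone (no Tornado needed). In this illustrative sketch, a semaphore stands in for the max_clients pool and a short sleep stands in for one network round-trip; it is not Tornado's actual implementation:

```python
import asyncio
import time

async def main(n_requests=15, max_clients=10):
    # The semaphore caps concurrency the same way max_clients does:
    # the first 10 "fetches" start at once, the last 5 must queue.
    sem = asyncio.Semaphore(max_clients)

    async def fetch(i):
        async with sem:
            await asyncio.sleep(0.1)  # stands in for one network round-trip

    start = time.monotonic()
    await asyncio.gather(*(fetch(i) for i in range(n_requests)))
    return time.monotonic() - start

elapsed = asyncio.run(main())
print(f"{elapsed:.2f}s")  # roughly two rounds of 0.1s, not one
```

With 15 tasks and a cap of 10, the work finishes in about two batches; raise `max_clients` above 15 and it collapses to one.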

If your requests aren't IO bound then you won't see much change. - Me

In programming these are the primary limits that we have:

  • CPU (number of calculations that can happen per second)
  • Cache access in the processor
  • RAM access
  • Disk Access
  • Network access

In Python, we're even further limited when it comes to CPU use because of the GIL. With modern computers, which tend towards multiple cores - 2, 4, 8, or 16 - we're crippled 1 even further, because typically each of those cores runs a bit slower. For more information about the GIL, check out David Beazley's GIL talk and Larry Hastings's GIL-ectomy.

To mitigate the cost of waiting on IO (the GIL itself is not bypassed), several callback-style frameworks have been developed, like Twisted, Tornado, and asyncio. The way these work is by performing some operations and then yielding control when they reach a point where IO would block.

For example, if I'm writing data to a spinning disk, perhaps I can write 100 kB to the disk, but while I'm waiting for all of that information to be written, perhaps I can go off and do 1,000 calculations before the data finishes writing.

Alternatively, perhaps I can make 100 requests per second to a webservice, but it only takes me 0.0001s to perform my calculations for each of those requests. If you look at a graph of where I spend my time it's going to look something like this:

    #            
    #            
    #            
    #            
    #            
    #            
    #            
    #           #
--------------------------
 reading    processing

What these processes allow you to do is interleave the processing and the reading/writing, by sending the request packets off, then doing something else, and then at some point coming back to read the returned packets.

Being IO bound like this, you can see a pretty massive speedup, because rather than looking something like this:

 start    end start     end
--|--------|----|--------|------
 t=0      t=5  t=6      t=11

You can get something like this:

     start      end
 start|     end  |
--|---|------|---|-
 t=0 t=1    t=5  t=6

But if your process is CPU bound you're not going to see any of that speedup (or at least not much), because you're spending 30s doing processing and only 1s doing any kind of waiting for the network.
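The two timelines above can be reproduced with a stdlib asyncio sketch, using mock sleeps in place of real network calls (the numbers here are illustrative, not measurements of Tornado):

```python
import asyncio
import time

async def fetch():
    await asyncio.sleep(0.1)  # mock IO wait, stands in for a network call

async def sequential(n):
    # One request after another: total time is n * round-trip.
    t0 = time.monotonic()
    for _ in range(n):
        await fetch()
    return time.monotonic() - t0

async def concurrent(n):
    # All requests in flight at once: total time is ~one round-trip.
    t0 = time.monotonic()
    await asyncio.gather(*(fetch() for _ in range(n)))
    return time.monotonic() - t0

seq = asyncio.run(sequential(5))
con = asyncio.run(concurrent(5))
print(f"sequential: {seq:.2f}s, concurrent: {con:.2f}s")
```

If the sleeps were CPU-bound loops instead, `gather` would buy you nothing, which is exactly the IO-bound vs CPU-bound distinction above.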

Before trying the async approach, give the standard single-threaded approach a try and see 1) whether it's fast enough and 2) whether it's slow at the network/IO boundary.

You can easily use something like line_profiler for Python, and (if you haven't already) separate out your read, process, and write functions to see where you're spending the time. If you're spending most of the time in the read functions, then yes, you should see a pretty reasonable speedup from the async approach. If not, async is just going to slow you down.
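line_profiler is a third-party package; as a stdlib-only sketch of the same idea, cProfile will also show you whether the read step or the process step dominates (the `read`/`process` functions here are hypothetical stand-ins):

```python
import cProfile
import io
import pstats
import time

def read():
    time.sleep(0.05)  # stands in for waiting on the network or disk

def process():
    sum(i * i for i in range(10_000))  # stands in for CPU work

def job():
    for _ in range(3):
        read()
        process()

# Profile the job and dump the top entries sorted by cumulative time.
pr = cProfile.Profile()
pr.enable()
job()
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(10)
print(s.getvalue())
```

If `read` dominates the cumulative column, you're IO bound and async should help; if `process` dominates, it won't.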

1 Honestly it's not really that bad, unless you have something super speed-critical. And then you should be using cffi or something similar to take the speed-critical sections and dump them into C. You did figure out which sections are the hold-up, right?
