
How can I implement async requests without 3rd party packages in Python 2.7

What I want is in the title. The background is that I have thousands of requests to send to a very slow RESTful interface, in a program where no 3rd-party packages are allowed to be imported, except requests .

The speed of multithreading and multiprocessing is limited by the GIL and by the 4-core computer the program will run on.

I know you can implement incomplete coroutines in Python 2.7 with generators and the yield keyword, but how can I do thousands of requests with that incomplete coroutine ability?

Example

url_list = ["https://www.example.com/rest?id={}".format(num) for num in range(10000)]
results = request_all(url_list) # do asynchronously

First, you're starting from an incorrect premise.

  • The speed of multiprocessing is not limited by the GIL at all.
  • The speed of multiprocessing is only limited by the number of cores for CPU-bound work, which yours is not. And async doesn't work at all for CPU-bound work, so multiprocessing would be 4x better than async, not worse.
  • The speed of multithreading is only limited by the GIL for CPU-bound code, which, again, yours is not.
  • The speed of multithreading is barely affected by the number of cores. If your code is CPU-bound, the threads mostly end up serialized on a single core. But again, async is even worse here, not better.

The reason people use async is not that it solves any of these problems; in fact, it only makes them worse. The main advantage is that if you have a ton of workers that are doing almost no work, you can schedule a ton of waiting-around coroutines more cheaply than a ton of waiting-around threads or processes. The secondary advantage is that you can tie the selector loop to the scheduler loop and eliminate a bit of overhead coordinating them.


Second, you can't use requests with asyncio in the first place. It expects to be able to block the whole thread on socket reads. There was a project to rewrite it around an asyncio-based transport adapter, but it was abandoned unfinished.

The usual way around that is to use it in threads, e.g., with run_in_executor. But if the only thing you're doing is requests, building an event loop just to dispatch things to a thread pool executor is silly; just use the executor directly.


Third, I doubt you actually need to have thousands of requests running in parallel. Although of course the details depend on your service or your network or whatever the bottleneck is, it's almost always more efficient to have a thread pool that runs, say, 12 or 64 requests in parallel, with the other thousands queued up behind them.

Handling thousands of concurrent connections (and therefore workers) is usually something you only have to do on a server. Occasionally you have to do it on a client that's aggregating data from a huge number of different services. But if you're just hitting a single service, there's almost never any benefit to that much concurrency.
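
To make that concrete, here's a minimal sketch of the thread-pool approach in Python 2.7. concurrent.futures is only a third-party backport on 2.7, but the stdlib multiprocessing.dummy module provides a Pool backed by threads, so it stays within your no-third-party constraint; the fetch helper and the pool size of 16 are just placeholders:

from multiprocessing.dummy import Pool  # thread-backed Pool from the 2.7 stdlib
import requests

def fetch(url):
    # requests blocks its thread; with N threads in the pool, N requests
    # are in flight at once and the rest wait in the pool's queue.
    return requests.get(url).text

def request_all(url_list, workers=16):
    pool = Pool(workers)
    try:
        return pool.map(fetch, url_list)  # results come back in input order
    finally:
        pool.close()
        pool.join()

Tune the worker count empirically; past the point where the service or your network saturates, more threads just add overhead.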


Fourth, if you really do want a coroutine-based event loop in Python 2, by far the easiest way is to use gevent or greenlets or another such library.

Yes, they give you an event loop hidden under the covers where you can't see it, and "magic" coroutines where the yielding happens inside methods like socket.send and Thread.join instead of being explicitly visible with await or yield from, but the plus side is that they already work—and, in fact, the magic means they work with requests, which anything you build will not.
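
For comparison, here's a minimal sketch of what the gevent version could look like (gevent is a third-party package, so it violates your constraint, but it shows how little code the magic requires; fetch is again a placeholder):

from gevent import monkey
monkey.patch_all()  # patch socket, ssl, etc. before importing requests

import gevent
import requests

def fetch(url):
    # looks blocking, but the patched socket yields to gevent's hub
    return requests.get(url).text

url_list = ["https://www.example.com/rest?id={}".format(num) for num in range(10000)]
jobs = [gevent.spawn(fetch, url) for url in url_list]
gevent.joinall(jobs)
results = [job.value for job in jobs]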

Of course you don't want to use any third-party libraries. Building something just like greenlets yourself on top of Stackless or PyPy is pretty easy; building it for CPython is a lot more work. And then you still have to do all the monkeypatching that gevent does to make libraries like sockets work like magic, or rewrite requests around explicit greenlets.


Anyway, if you really want to build an event loop on top of just plain yield , you can.

In Greg Ewing's original papers on why Python needed to add yield from, he included examples of a coroutine event loop with just yield, and a better one that uses an explicit trampoline to yield to, with a simple networking-driven example. He even wrote an automatic translator that turned code written for the (at the time unimplemented) yield from into code that ran on Python 3.1.

Notice that having to bounce every yield off a trampoline makes things a lot less efficient. There's really no way around that. That's a good part of the reason we have yield from in the language.
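
To show what the scheduler core looks like with just plain yield, here's a toy round-robin loop in that spirit; it has no I/O and no trampoline for calling sub-coroutines, and every name in it is made up for illustration:

import collections

def run_all(tasks):
    queue = collections.deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            task.next()  # advance the coroutine to its next yield (Python 2 spelling)
        except StopIteration:
            continue     # this task is finished; drop it
        queue.append(task)  # otherwise put it at the back of the line

def worker(name, steps):
    for i in range(steps):
        print "%s step %d" % (name, i)
        yield  # hand control back to the scheduler

run_all([worker("a", 3), worker("b", 2)])

Every yield here is one bounce through run_all; once coroutines start calling sub-coroutines, each of those inner yields has to be re-dispatched through a trampoline as well, which is exactly the overhead yield from removes.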

But that's just the scheduler part (Ewing's version adds a bit of toy networking). You still need to integrate a selectors loop and then write coroutines to replace all of the socket functions you need. Consider how long asyncio took Guido to build when he knew Python inside and out and had yield from to work with… but then you can steal most of his design, so it won't be quite that bad. Still, it's going to be a lot of work.

(Oh, and you don't have selectors in Python 2. If you don't care about Windows, it's pretty easy to build the part you need out of the select module, but if you do care about Windows, it's a lot more work.)
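
As a rough sketch of that select-based piece (POSIX-oriented, given the Windows caveat above), assuming a made-up protocol where each task yields the socket it wants to read from:

import collections
import select

def io_loop(tasks):
    ready = collections.deque(tasks)
    waiting = {}  # fileno -> (socket, task), parked until readable
    while ready or waiting:
        while ready:
            task = ready.popleft()
            try:
                sock = task.next()  # the task yields a socket to wait on
            except StopIteration:
                continue            # task finished
            waiting[sock.fileno()] = (sock, task)
        if waiting:
            rlist, _, _ = select.select([s for s, _ in waiting.values()], [], [])
            for sock in rlist:
                _, task = waiting.pop(sock.fileno())
                ready.append(task)  # readable now; resume the task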

And remember, because requests won't work with your code, you're also going to need to reimplement most of it. Or, maybe better, port aiohttp from asyncio to your framework.

And, in the end, I'd be willing to give you odds that the result is not going to be anywhere near as efficient as aiohttp in Python 3, or requests on top of gevent in Python 2, or just requests on top of a thread pool in either.

And, of course, you'll be the only person in the world using it. asyncio had hundreds of bugs to fix between tulip and going into the stdlib, which were only detected because dozens of early adopters (including people who are serious experts on this kind of thing) were hammering on it. And requests, aiohttp, gevent, etc. are all used by thousands of servers handling zillions of dollars worth of business, so you benefit from all of those people finding bugs and needing fixes. Whatever you build almost certainly won't be nearly as reliable as any of those solutions.

All this for something you're probably going to need to port to Python 3 anyway, since Python 2 hits end-of-life in less than a year and a half, and distros and third-party libraries are already disengaging from it. For a relevant example, requests 3.0 is going to require at least Python 3.5; if you want to stick with Python 2.7, you'll be stuck with requests 2.1 forever.
