What I want is in the title. The background: I have thousands of requests to send to a very slow RESTful interface, in a program where no third-party packages are allowed to be imported except requests.
The speed of multithreading and multiprocessing is limited by the GIL and by the 4-core machine the program will run on.
I know you can implement an incomplete coroutine in Python 2.7 with generators and the yield keyword, but how can I make it possible to do thousands of requests with that incomplete coroutine ability?
url_list = ["https://www.example.com/rest?id={}".format(num) for num in range(10000)]
results = request_all(url_list) # do asynchronously
First, you're starting from an incorrect premise.
The reason people use async is not that it solves any of these problems; in fact, it only makes them worse. The main advantage is that if you have a ton of workers that are doing almost no work, you can schedule a ton of waiting-around coroutines more cheaply than a ton of waiting-around threads or processes. The secondary advantage is that you can tie the selector loop to the scheduler loop and eliminate a bit of overhead coordinating them.
Second, you can't use requests with asyncio in the first place. It expects to be able to block the whole thread on socket reads. There was a project to rewrite it around an asyncio-based transport adapter, but it was abandoned unfinished.
The usual way around that is to use it in threads, e.g., with run_in_executor. But if the only thing you're doing is requests, building an event loop just to dispatch things to a thread pool executor is silly; just use the executor directly.
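Here is a minimal sketch of the thread-pool approach using only the 2.7 stdlib: multiprocessing.dummy gives you a Pool backed by threads rather than processes, so the GIL doesn't matter while each worker blocks on network I/O. The fetch stub below stands in for the real requests.get call so the example is self-contained; the function and variable names are mine, not anything standard.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool, stdlib since 2.6

def fetch(url):
    # In real code this would be: return requests.get(url).json()
    time.sleep(0.01)  # simulate a slow REST call
    return "response for " + url

def request_all(url_list, workers=16):
    # `workers` requests run concurrently; the remaining thousands of
    # URLs simply queue up behind them inside the pool.
    pool = ThreadPool(workers)
    try:
        # map blocks until all results are in, and preserves input order
        return pool.map(fetch, url_list)
    finally:
        pool.close()
        pool.join()

urls = ["https://www.example.com/rest?id={}".format(n) for n in range(100)]
results = request_all(urls)
```

Because the workers spend essentially all their time blocked on the socket, even a modest pool keeps the slow service saturated.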
Third, I doubt you actually need to have thousands of requests running in parallel. Although of course the details depend on your service or your network or whatever the bottleneck is, it's almost always more efficient to have a thread pool that runs, say, 12 or 64 requests in parallel, with the other thousands queued up behind them.
Handling thousands of concurrent connections (and therefore workers) is usually something you only have to do on a server. Occasionally you have to do it on a client that's aggregating data from a huge number of different services. But if you're just hitting a single service, there's almost never any benefit to that much concurrency.
Fourth, if you really do want a coroutine-based event loop in Python 2, by far the easiest way is to use gevent or greenlets or another such library.
Yes, they give you an event loop hidden under the covers where you can't see it, and "magic" coroutines where the yielding happens inside methods like socket.send and Thread.join instead of being explicitly visible with await or yield from, but the plus side is that they already work, and in fact the magic means they work with requests, which anything you build will not.
Of course you don't want to use any third-party libraries. Building something just like greenlets yourself on top of Stackless or PyPy is pretty easy; building it for CPython is a lot more work. And then you still have to do all the monkeypatching that gevent does to make libraries like socket work like magic, or rewrite requests around explicit greenlets.
Anyway, if you really want to build an event loop on top of just plain yield, you can.
In Greg Ewing's original papers on why Python needed to add yield from, he included examples: a coroutine event loop with just yield, and a better one that uses an explicit trampoline to yield to, with a simple networking-driven example. He even wrote an automatic translator from code written for the (at the time unimplemented) yield from to code for Python 3.1.
Notice that having to bounce every yield off a trampoline makes things a lot less efficient. There's really no way around that; that's a good part of the reason we have yield from in the language.
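To make the trampoline idea concrete, here is a toy Ewing-style scheduler built on plain yield (so it is Python 2.7 compatible). The convention is my own simplification of his design: a coroutine that yields a generator is "calling" into a subcoroutine, and a coroutine that yields anything else is producing output. The trampoline keeps an explicit stack of nested generators per task, which is exactly the bookkeeping yield from later moved into the interpreter.

```python
import collections
import types

def run(tasks):
    # Each task is a stack of nested generators; the top of the stack runs.
    stacks = collections.deque([gen] for gen in tasks)
    results = []
    while stacks:
        stack = stacks.popleft()
        try:
            result = next(stack[-1])
        except StopIteration:
            stack.pop()          # innermost coroutine finished; return to caller
            if stack:
                stacks.append(stack)
            continue
        if isinstance(result, types.GeneratorType):
            stack.append(result)  # "call" into a subcoroutine
        else:
            results.append(result)  # an ordinary yield: collect it as output
        stacks.append(stack)        # round-robin: go to the back of the queue
    return results

def child(n):
    yield n * 2

def parent(n):
    yield child(n)          # trampoline bounces into the subcoroutine
    yield "done-%d" % n

results = run([parent(1), parent(2)])
```

Note how every single yield, even ones that are conceptually just "call this subroutine," has to go back through the scheduler; that round trip is the inefficiency the text describes.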
But that's just the scheduler part with a bit of toy networking. You still need to integrate a selectors loop and then write coroutines to replace all of the socket functions you need. Consider how long asyncio took Guido to build when he knew Python inside and out and had yield from to work with… but then you can steal most of his design, so it won't be quite that bad. Still, it's going to be a lot of work.
(Oh, and you don't have selectors in Python 2. If you don't care about Windows, it's pretty easy to build the part you need out of the select module, but if you do care about Windows, it's a lot more work.)
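As a rough sketch of what "building the part you need out of the select module" looks like, here is a minimal loop that parks plain-yield coroutines on sockets. The yield protocol, a ('read', sock) or ('write', sock) tuple, is my assumption for illustration, not anything from Ewing's papers; a real loop would also need timeouts, error sets, and handling for multiple waiters per socket.

```python
import select
import socket

def event_loop(tasks):
    runnable = list(tasks)
    waiting = {}  # socket -> (want, coroutine); one waiter per socket in this toy
    while runnable or waiting:
        for coro in runnable:
            try:
                want, sock = next(coro)   # coroutine runs until it parks on a socket
            except StopIteration:
                continue                  # coroutine finished
            waiting[sock] = (want, coro)
        runnable = []
        if not waiting:
            break
        rlist = [s for s, (w, c) in waiting.items() if w == 'read']
        wlist = [s for s, (w, c) in waiting.items() if w == 'write']
        ready_r, ready_w, _ = select.select(rlist, wlist, [])
        for sock in ready_r + ready_w:
            want, coro = waiting.pop(sock)
            runnable.append(coro)         # socket is ready; resume its coroutine

# Demo over a local socket pair, so no network is needed:
a, b = socket.socketpair()

def writer():
    yield ('write', a)     # park until a is writable
    a.sendall(b'hi')

def reader(out):
    yield ('read', b)      # park until b has data
    out.append(b.recv(2))

received = []
event_loop([writer(), reader(received)])
a.close()
b.close()
```

Even this toy shows the shape of the job: every blocking socket operation in your code has to be split into "park on the selector" plus "do the non-blocking call once ready."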
And remember, because requests won't work with your code, you're also going to need to reimplement most of it as well. Or, maybe better, port aiohttp from asyncio to your framework.
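To get a feel for what "reimplementing most of requests" means at minimum, here is a sketch of just the HTTP/1.1 wire format, with no networking, written in 2.7-compatible style. The function names are mine; a real client would additionally need chunked transfer encoding, redirects, connection pooling, TLS, cookies, and all the other things requests handles for you.

```python
def build_get(host, path):
    # Assemble the raw bytes of a minimal HTTP/1.1 GET request.
    return ("GET {} HTTP/1.1\r\n"
            "Host: {}\r\n"
            "Connection: close\r\n"
            "\r\n").format(path, host).encode("ascii")

def parse_response(raw):
    # Split a raw response into (status, headers, body).
    # Assumes the whole response has already been read off the socket.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 1)[1][:3])   # "HTTP/1.1 200 OK" -> 200
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return status, headers, body

req = build_get("example.com", "/rest?id=1")
status, headers, body = parse_response(
    b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
```

And every send/recv behind these helpers would have to go through your coroutine socket layer, not the blocking socket module.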
And, in the end, I'd be willing to give you odds that the result is not going to be anywhere near as efficient as aiohttp in Python 3, or requests on top of gevent in Python 2, or just requests on top of a thread pool in either.
And, of course, you'll be the only person in the world using it. asyncio had hundreds of bugs to fix between tulip and going into the stdlib, which were only detected because dozens of early adopters (including people who are serious experts on this kind of thing) were hammering on it. And requests, aiohttp, gevent, etc. are all used by thousands of servers handling zillions of dollars' worth of business, so you benefit from all of those people finding bugs and needing fixes. Whatever you build almost certainly won't be nearly as reliable as any of those solutions.
All this for something you're probably going to need to port to Python 3 anyway, since Python 2 hits end-of-life in less than a year and a half, and distros and third-party libraries are already disengaging from it. For a relevant example, requests 3.0 is going to require at least Python 3.5; if you want to stick with Python 2.7, you'll be stuck with requests 2.1 forever.