What I'm really wondering is whether Python's urllib2 is more like Java's HttpURLConnection or more like Apache's HttpClient. Ultimately, I want to know whether urllib2 scales when used inside an HTTP server, or whether there is an alternative library that is used when performance is an issue (as is the case in the Java world).
To expand on my question a bit:
Java's HttpURLConnection internally holds one connection open per host and does pipelining, so doing the following concurrently across threads won't perform well:
HttpURLConnection cxn = (HttpURLConnection) new URL("http://www.google.com").openConnection();
InputStream is = cxn.getInputStream();
By comparison, Apache's HttpClient can be initialized with a connection pool, like this:
// this instance can be a singleton and shared across threads safely:
HttpClient client = new HttpClient();
MultiThreadedHttpConnectionManager cm = new MultiThreadedHttpConnectionManager();
HttpConnectionManagerParams p = new HttpConnectionManagerParams();
p.setMaxConnectionsPerHost(HostConfiguration.ANY_HOST_CONFIGURATION,20);
p.setMaxTotalConnections(100);
p.setConnectionTimeout(100);
p.setSoTimeout(250);
cm.setParams(p);
client.setHttpConnectionManager(cm);
The important part of the example above is that both the total number of connections and the number of connections per host are configurable.
urllib3 was mentioned in a comment, but I can't tell from reading its docs whether it allows a per-host maximum to be set.
As of Python 2.7.14rc1, no.
For urllib, urlopen() eventually calls httplib.HTTP, which creates a new instance of HTTPConnection. HTTPConnection is tied to a socket and has methods for opening and closing it.
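Because each HTTPConnection wraps a single socket, a per-host cap like the one in the HttpClient example can be approximated by hand with a bounded queue of connections. The following is only a minimal sketch under some assumptions: it uses the Python 3 module name http.client (httplib in Python 2), and the class HostPool with its parameters maxsize, timeout and wait is a hypothetical helper written for illustration, not something the standard library provides.

```python
import queue
from contextlib import contextmanager
from http.client import HTTPConnection  # httplib.HTTPConnection in Python 2


class HostPool:
    """Hypothetical helper: hands out at most `maxsize` connections to one host."""

    def __init__(self, host, port=80, maxsize=20, timeout=0.25):
        self._slots = queue.LifoQueue(maxsize)
        for _ in range(maxsize):
            # HTTPConnection connects lazily, on the first request sent.
            self._slots.put(HTTPConnection(host, port, timeout=timeout))

    @contextmanager
    def connection(self, wait=None):
        # Blocks while every connection is checked out; raises queue.Empty
        # if `wait` seconds pass without one being returned.
        conn = self._slots.get(timeout=wait)
        try:
            yield conn
        except Exception:
            conn.close()  # drop a possibly broken socket; it reopens on next use
            raise
        finally:
            self._slots.put(conn)
```

A caller would wrap each request in `with pool.connection() as conn:` and then call `conn.request(...)` and `conn.getresponse().read()` before leaving the block, so that the response is fully consumed before the connection goes back into the pool. A total (cross-host) limit is not covered by this sketch.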
For urllib2, HTTPHandler does something similar and creates a new instance of HTTPConnection.
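One way to observe this behaviour is to count how many TCP connections a local test server accepts. The sketch below is an illustration under assumptions, not part of the original answer: it is written for Python 3, where urllib2 became urllib.request and httplib became http.client, and the names Handler, accepted, urlopen_connections and reused_connections are made up for the demo.

```python
import threading
from http.client import HTTPConnection  # httplib.HTTPConnection in Python 2
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener  # urllib2 in Python 2

accepted = []  # one entry per TCP connection the server accepts


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections alive

    def setup(self):
        # A handler instance is created once per accepted connection.
        super().setup()
        accepted.append(self.client_address)

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# Three urlopen-style calls: each one builds its own HTTPConnection.
opener = build_opener(ProxyHandler({}))  # ignore proxy environment variables
for _ in range(3):
    opener.open("http://%s:%d/" % (host, port)).read()
urlopen_connections = len(accepted)

# Three requests over one explicit HTTPConnection: the socket is reused.
del accepted[:]
conn = HTTPConnection(host, port)
for _ in range(3):
    conn.request("GET", "/")
    conn.getresponse().read()
conn.close()
reused_connections = len(accepted)

server.shutdown()
server.server_close()
print(urlopen_connections, reused_connections)
```

On a typical CPython 3 install the server should see three connections for the three opener calls and a single connection for the three requests sent over the explicit HTTPConnection.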