I am trying to fetch a million URLs over a gigabit connection, but throughput varies between 5 MB/s and 12 MB/s (megabytes per second), which is well below the bandwidth maximum. The code I use:
DnsResolver dnsResolver = new SystemDefaultDnsResolver();
X509HostnameVerifier hostnameVerifier = new AllowAllHostnameVerifier();
SSLContext sslcontext = SSLContexts.createSystemDefault();
RedirectStrategy redirectStrategy = new LaxRedirectStrategy();
HttpConnectionFactory<HttpRoute, ManagedHttpClientConnection> connFactory = new ManagedHttpClientConnectionFactory(
        new DefaultHttpRequestWriterFactory(),
        new DefaultHttpResponseParserFactory());
Registry<ConnectionSocketFactory> socketFactoryRegistry = RegistryBuilder
        .<ConnectionSocketFactory> create()
        .register("https", new SSLConnectionSocketFactory(sslcontext, hostnameVerifier))
        .register("http", new PlainConnectionSocketFactory())
        .build();
SocketConfig socketConfig = SocketConfig.custom()
        .setSoKeepAlive(false)
        .setSoReuseAddress(false)
        .setSoTimeout(15000).build();
PoolingHttpClientConnectionManager manager = new PoolingHttpClientConnectionManager(socketFactoryRegistry, connFactory, dnsResolver);
manager.setDefaultSocketConfig(socketConfig);
manager.setMaxTotal(1000);
CloseableHttpClient httpClient = HttpClientBuilder.create().setUserAgent("Mozilla")
        .setConnectionManager(manager)
        .setRedirectStrategy(redirectStrategy)
        .setMaxConnPerRoute(-1).build();
RequestConfig defaultConfig = RequestConfig.custom()
        .setCookieSpec(CookieSpecs.IGNORE_COOKIES)
        .setExpectContinueEnabled(false)
        .setStaleConnectionCheckEnabled(false)
        .setRedirectsEnabled(true)
        .setMaxRedirects(5).build();
RequestConfig rConfig = RequestConfig.copy(defaultConfig)
        .setSocketTimeout(15000)
        .setConnectionRequestTimeout(-1)
        .setConnectTimeout(15000).build();
ExecutorService executorService = Executors.newFixedThreadPool(640);
FutureRequestExecutionService service = new FutureRequestExecutionService(httpClient, executorService);
Per-request configuration is:
HttpGet httpget = new HttpGet("some url");
httpget.setConfig(rConfig);
httpget.setHeader("Connection", "close");
In the ResponseHandler I use the following code to consume the content:
stream = response.getEntity().getContent();
final byte[] content = IOUtils.toByteArray(stream);
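For context, the handler wrapping those two lines looks roughly like this (a sketch: the class name is a stand-in and error handling is simplified compared to my actual code):

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.io.IOUtils;
import org.apache.http.HttpResponse;
import org.apache.http.client.ResponseHandler;

// Sketch of the handler shape; the real class has more error handling.
public class BodyHandler implements ResponseHandler<byte[]> {
    @Override
    public byte[] handleResponse(HttpResponse response) throws IOException {
        InputStream stream = response.getEntity().getContent();
        try {
            // Reads the body fully; consuming the entity to the end is also
            // what allows a pooled connection to be returned for re-use.
            return IOUtils.toByteArray(stream);
        } finally {
            stream.close();
        }
    }
}
```

It is passed to FutureRequestExecutionService.execute() together with each HttpGet.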
Each URL is from a different domain. The machine has 8 cores and 8 GB of RAM, running 64-bit Linux (Debian). How can I speed this up?
If you do not need automatic authentication, retries, or cookie management, and do not mind handling redirects manually, consider using the minimal HttpClient implementation. Minimal clients are built with a minimal execution pipeline consisting of mandatory protocol interceptors only, and should have the best performance characteristics given the same concurrency parameters (connection pool setup).
PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
CloseableHttpClient hc = HttpClients.createMinimal(cm);
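Note that a default pool keeps HttpClient's defaults of 2 connections per route and 20 total, which would throttle a crawl long before the network does. A sketch of sizing it, re-using your total of 1000 (the per-route value of 10 is an illustrative guess, not a recommendation):

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class MinimalClientSetup {
    public static CloseableHttpClient build() {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        // Every URL is on a different route, so the per-route cap matters as
        // much as the total; the library defaults (2 per route, 20 total)
        // would make the pool, not the network, the bottleneck.
        cm.setMaxTotal(1000);
        cm.setDefaultMaxPerRoute(10); // assumption: a few per host is plenty here
        return HttpClients.createMinimal(cm);
    }
}
```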
And naturally you should want to re-use connections for optimal performance. This runs counter to what I would consider best practice:
httpget.setHeader("Connection", "close"); // Huh?
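Dropping that header (and fully consuming each entity) is what lets the manager hand connections back for re-use. A per-request sketch, with a placeholder URL and your existing timeouts:

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.HttpGet;

public class RequestSetup {
    // Same per-request configuration as before, minus the "Connection: close"
    // header, so the pooled connection stays eligible for re-use.
    public static HttpGet newGet(String url) {
        HttpGet httpget = new HttpGet(url);
        httpget.setConfig(RequestConfig.custom()
                .setSocketTimeout(15000)
                .setConnectTimeout(15000)
                .build());
        return httpget;
    }
}
```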