Apache httpclient 4.3.3 - speed tuning

I am trying to fetch a million URLs over a gigabit connection, but the throughput varies between 5 MB/s and 12 MB/s (megabytes per second), which is far below the available bandwidth. The code I use:

    DnsResolver dnsResolver = new SystemDefaultDnsResolver();
    X509HostnameVerifier hostnameVerifier = new AllowAllHostnameVerifier();
    SSLContext sslcontext = SSLContexts.createSystemDefault();
    RedirectStrategy redirectStrategy = new LaxRedirectStrategy();

    HttpConnectionFactory<HttpRoute, ManagedHttpClientConnection> connFactory = new ManagedHttpClientConnectionFactory(
                    new DefaultHttpRequestWriterFactory(),
                    new DefaultHttpResponseParserFactory());

    Registry<ConnectionSocketFactory> socketFactoryRegistry = RegistryBuilder
                        .<ConnectionSocketFactory> create()
                        .register(
                                "https",
                                new SSLConnectionSocketFactory(sslcontext,
                                        hostnameVerifier))
                        .register("http", new PlainConnectionSocketFactory())
                        .build();
    SocketConfig socketConfig = SocketConfig.custom().setSoKeepAlive(false)
                    .setSoReuseAddress(false)
                    .setSoTimeout(15000).build();
    PoolingHttpClientConnectionManager manager = new PoolingHttpClientConnectionManager(socketFactoryRegistry, connFactory, dnsResolver);
    manager.setDefaultSocketConfig(socketConfig);
    manager.setMaxTotal(1000);
    CloseableHttpClient httpClient = HttpClientBuilder.create().setUserAgent("Mozilla")
                    .setConnectionManager(manager)
                    .setRedirectStrategy(redirectStrategy)
                    .setMaxConnPerRoute(-1).build();

    RequestConfig defaultConfig = RequestConfig.custom()
                    .setCookieSpec(CookieSpecs.IGNORE_COOKIES)
                    .setExpectContinueEnabled(false)
                    .setStaleConnectionCheckEnabled(false)
                    .setRedirectsEnabled(true)
                    .setMaxRedirects(5).build();

    RequestConfig rConfig = RequestConfig.copy(defaultConfig)
                    .setSocketTimeout(15000)
                    .setConnectionRequestTimeout(-1)
                    .setConnectTimeout(15000).build();

    ExecutorService executorService = Executors.newFixedThreadPool(640);

    FutureRequestExecutionService service = new FutureRequestExecutionService(httpClient, executorService);

The per-request configuration is:

    HttpGet httpget = new HttpGet("some url");
    httpget.setConfig(rConfig);
    httpget.setHeader("Connection", "close");

In the ResponseHandler I use the following code to consume the content:

    stream = response.getEntity().getContent();
    final byte[] content = IOUtils.toByteArray(stream);

Each URL is from a different domain. The machine has 8 cores and 8 GB of RAM and runs 64-bit Debian Linux. How can I speed this up?
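
For a sense of scale, the arithmetic behind "far below the available bandwidth" can be sketched as follows. This is only a rough illustration: it assumes decimal units (1 MB = 10^6 bytes) and ignores TCP/IP and TLS overhead, and the class and method names are made up for the example.

```java
public class LinkUtilisation {

    // Converts a nominal link speed in megabits per second to megabytes per second.
    // Assumption: decimal units and 8 bits per byte, with no protocol overhead.
    public static double megabytesPerSecond(double linkMegabits) {
        return linkMegabits / 8.0;
    }

    // Percentage of the link's nominal capacity used by an observed throughput.
    public static double utilisationPercent(double observedMegabytes, double linkMegabits) {
        return 100.0 * observedMegabytes / megabytesPerSecond(linkMegabits);
    }

    public static void main(String[] args) {
        double gigabit = 1000.0; // nominal speed of the link, in Mbit/s
        System.out.println("capacity (MB/s): " + megabytesPerSecond(gigabit));
        System.out.println("utilisation at 5 MB/s (%): " + utilisationPercent(5, gigabit));
        System.out.println("utilisation at 12 MB/s (%): " + utilisationPercent(12, gigabit));
    }
}
```

Under those assumptions a gigabit link tops out at 125 MB/s, so 5 to 12 MB/s is roughly 4% to 9.6% of the nominal capacity.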

If you do not need automatic authentication, retries, or cookie management and do not mind handling redirects manually, consider using the minimal HttpClient implementation. Minimal clients are built with a minimal execution pipeline consisting of mandatory protocol interceptors only, and should have the best performance characteristics for the same concurrency parameters (connection pool setup).

    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    CloseableHttpClient hc = HttpClients.createMinimal(cm);
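
A slightly fuller configuration sketch of that setup, under some assumptions: the helper name newMinimalClient and the idea of sizing the pool to the number of worker threads are illustrative and not part of the original answer. The limit is set on the connection manager itself, because PoolingHttpClientConnectionManager defaults to 20 connections in total and 2 per route, and the HttpClientBuilder documentation notes that limits set on the builder can be overridden when a connection manager is supplied explicitly.

```java
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class MinimalClientConfig {

    // Hypothetical helper (not an HttpClient API): builds a minimal client whose
    // connection pool can hand out one connection per worker thread.
    public static CloseableHttpClient newMinimalClient(int workerThreads) {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();

        // Raise the total limit from its default of 20 so that workers are not
        // left waiting for a connection lease. Assumption: one connection per thread.
        cm.setMaxTotal(workerThreads);

        // The per-route limit is left at its default here, since in the question
        // every URL targets a different host.

        // The minimal client does not follow redirects, retry, authenticate,
        // or manage cookies; the caller has to handle those cases.
        return HttpClients.createMinimal(cm);
    }
}
```

With the question's executor this might be called as newMinimalClient(640), so that the pool size matches the 640 worker threads.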

Naturally, you should also re-use connections for optimal performance. Sending a Connection: close header with every request runs counter to what I would consider best practice:

    httpget.setHeader("Connection", "close"); // Huh?
