
Making multiple HTTP requests efficiently

I want to make a few million HTTP requests to a web service of the form: http://(some ip)/{id}

I have the list of ids with me. A simple calculation shows that my Java code will take around 4-5 hours to get the data from the API. The code is:

// Open a connection for each id and read the entire response body into a string.
URL getUrl = new URL("http url");
URLConnection conn = getUrl.openConnection();
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
StringBuffer sbGet = new StringBuffer();
String getline;
while ((getline = rd.readLine()) != null)
{
    sbGet.append(getline);
}
rd.close();
String getResponse = sbGet.toString();

Is there a way to make such requests more efficiently so that they take less time?

One way is to use an ExecutorService with a fixed thread pool (the size depends on how much load the target HTTP service can handle) and issue requests to the service in parallel. Each Runnable would basically perform the steps you outlined in your sample code.
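A minimal sketch of that approach is below. The base URL http://example.com/, the pool size of 20, and the result map are assumptions you would adjust for your service; the per-request work is the same as in your sample code.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelFetcher {

    // Assumptions: adjust the base URL and pool size to your service.
    private static final String BASE_URL = "http://example.com/";
    private static final int POOL_SIZE = 20;

    public static Map<String, String> fetchAll(List<String> ids) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
        Map<String, String> results = new ConcurrentHashMap<>();

        for (String id : ids) {
            pool.submit(() -> {
                try {
                    results.put(id, fetch(BASE_URL + id));
                } catch (Exception e) {
                    // In real code, record the failure so the id can be retried later.
                    System.err.println("Failed to fetch " + id + ": " + e);
                }
            });
        }

        pool.shutdown();                          // no more tasks will be submitted
        pool.awaitTermination(1, TimeUnit.DAYS);  // wait for the queued requests to finish
        return results;
    }

    // The same steps as the code in the question, factored into a helper method.
    private static String fetch(String url) throws Exception {
        URLConnection conn = new URL(url).openConnection();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = rd.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }
}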

You need to profile your code before you start optimizing it; otherwise you may end up optimizing the wrong part. Depending on the results you obtain from profiling, consider the following options:

  • Change the protocol to allow you to batch the requests
  • Issue multiple requests in parallel (use multiple threads or execute multiple processes in parallel; see this article)
  • Cache previous results to reduce the number of requests
  • Compress the request or the response
  • Persist the HTTP connection (a sketch that also covers parallel requests follows this list)
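The last two points can be combined: a single java.net.http.HttpClient (Java 11+) keeps TCP connections alive and reuses them across requests, and sendAsync issues the requests in parallel without dedicating a thread per request. This is only a sketch under assumed names (BASE_URL, fetchAll); for millions of ids you would submit them in chunks rather than all at once.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class KeepAliveFetcher {

    // Assumption: replace with the real service address.
    private static final String BASE_URL = "http://example.com/";

    // One shared client: it keeps connections alive and reuses them,
    // which covers the "persist the HTTP connection" point.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static List<String> fetchAll(List<String> ids) {
        // Issue the requests asynchronously (in practice, chunk a list of
        // millions of ids rather than submitting everything at once).
        List<CompletableFuture<String>> futures = ids.stream()
                .map(id -> HttpRequest.newBuilder(URI.create(BASE_URL + id)).GET().build())
                .map(req -> CLIENT.sendAsync(req, HttpResponse.BodyHandlers.ofString())
                        .thenApply(HttpResponse::body))
                .collect(Collectors.toList());

        // Wait for all responses; a failed request surfaces as an exception from join().
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}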

Is there a way to more efficiently make such requests which will take less time?

Well, you probably could run a small number of requests in parallel, but you are likely to saturate the server. Beyond a certain number of requests per second, the throughput is likely to degrade ...

To get past that limit, you will need to redesign the server and/or the server's web API. For instance:

  • Changing your web API to allow a client to fetch a number of objects in each request will reduce the request overheads.

  • Compression could help, but you are trading off network bandwidth for CPU time and/or latency. If you have a fast end-to-end network, then compression might actually slow things down. (A sketch of requesting gzip-compressed responses follows this list.)

  • Caching helps in general, but probably not in your use-case. (You are requesting each object just once ...)

  • Using persistent HTTP connections avoids the overhead of creating a new TCP/IP connection for each request, but I don't think you can do this for HTTPS. (And that's a shame, because HTTPS connection establishment is considerably more expensive.)
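To illustrate the compression trade-off, here is a sketch that asks the server for a gzip-compressed response and transparently decompresses it. Whether this actually helps depends on your network and on the server supporting gzip; the URL is a placeholder.

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.zip.GZIPInputStream;

public class CompressedFetch {

    // Requests a gzip-compressed response and falls back to plain text
    // if the server does not compress.
    static String fetchCompressed(String url) throws Exception {
        URLConnection conn = new URL(url).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");

        InputStream in = conn.getInputStream();
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            in = new GZIPInputStream(in);   // decompress on the fly
        }

        StringBuilder sb = new StringBuilder();
        try (BufferedReader rd = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = rd.readLine()) != null) {
                sb.append(line);
            }
        }
        return sb.toString();
    }
}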
