
Asynchronous Web Requests in Java?

I am writing a simple web crawler in Java. I want it to be able to download as many pages per second as possible. Is there a package out there that makes doing asynchronous HTTP requests easy in Java? I have used HttpURLConnection, but that is blocking. I also know there is something in Apache's HttpCore NIO, but I am looking for something more lightweight. I tried that package, and I was getting better throughput with HttpURLConnection on multiple threads.

Generally, data-intensive protocols tend to perform better in terms of raw throughput with classic blocking I/O than with NIO, as long as the number of threads is below 1000. At least that is certainly the case for client-side HTTP, based on the (likely imperfect and possibly biased) HTTP benchmark used by Apache HttpClient [1].

You may be much better off using a blocking HTTP client with threads, as long as the number of threads is moderate (<250) — see the sketch below.
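As a minimal sketch of that approach (the URL list, pool size, and timeouts here are placeholder values, not recommendations), a fixed thread pool of blocking HttpURLConnection workers might look like this:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingCrawler {
    public static void main(String[] args) {
        // Placeholder URLs; a real crawler would feed these from a frontier queue.
        List<String> urls = Arrays.asList("http://example.com/", "http://example.org/");

        // A moderate, fixed number of threads, per the advice above.
        ExecutorService pool = Executors.newFixedThreadPool(100);

        for (String url : urls) {
            pool.submit(() -> {
                try {
                    HttpURLConnection conn =
                            (HttpURLConnection) new URL(url).openConnection();
                    conn.setConnectTimeout(5000);
                    conn.setReadTimeout(5000);
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(conn.getInputStream()))) {
                        StringBuilder page = new StringBuilder();
                        String line;
                        while ((line = in.readLine()) != null) {
                            page.append(line).append('\n');
                        }
                        System.out.println(url + ": " + page.length() + " chars");
                    }
                } catch (Exception e) {
                    System.err.println(url + ": " + e);
                }
            });
        }
        pool.shutdown();
    }
}
```

Each worker blocks on its own connection, so throughput scales with the pool size until thread overhead starts to dominate.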

If you are absolutely sure you want an NIO-based HTTP client, I can recommend the Jetty HTTP client, which I personally consider the best asynchronous HTTP client at the moment.
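For reference, a minimal sketch of an asynchronous request with the Jetty client (this assumes the Jetty 9.x API; the URL is a placeholder):

```java
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.Result;
import org.eclipse.jetty.client.util.BufferingResponseListener;

public class JettyAsyncExample {
    public static void main(String[] args) throws Exception {
        HttpClient httpClient = new HttpClient();
        httpClient.start();

        // send() with a listener returns immediately; the callback fires
        // on a Jetty worker thread when the response completes.
        httpClient.newRequest("http://example.com/")
                .send(new BufferingResponseListener() {
                    @Override
                    public void onComplete(Result result) {
                        if (result.isSucceeded()) {
                            System.out.println(getContentAsString().length() + " chars");
                        } else {
                            result.getFailure().printStackTrace();
                        }
                    }
                });

        // In a real crawler the client would stay up and issue many
        // requests; here we just wait briefly before stopping it.
        Thread.sleep(2000);
        httpClient.stop();
    }
}
```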

[1] http://wiki.apache.org/HttpComponents/HttpClient3vsHttpClient4vsHttpCore

While this user wasn't asking the same question, you may find answers to his question useful: Asynchronous HTTP Client for Java

As a side-note, if you're going to download "as many pages per second as possible", you should bear in mind that crawlers can inadvertently grind a weak server to a halt. You should probably read up on "robots.txt" and the appropriate way of interpreting this file before you unleash your creation on anything outside of your own personal test setup.
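As a rough illustration of that advice, here is a hedged sketch of checking a site's robots.txt before fetching a page. It is deliberately simplified (real rules also cover Allow, wildcards, crawl-delay, and per-agent groups), and the host and path in main are placeholders:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class RobotsCheck {
    // Returns true if path appears disallowed under the "User-agent: *" group.
    // Simplified: ignores Allow rules, wildcards, and agent-specific groups.
    static boolean isDisallowed(String host, String path) throws Exception {
        List<String> disallowed = new ArrayList<>();
        boolean inStarGroup = false;
        URL robots = new URL("http://" + host + "/robots.txt");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(robots.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.toLowerCase().startsWith("user-agent:")) {
                    inStarGroup = line.substring(11).trim().equals("*");
                } else if (inStarGroup && line.toLowerCase().startsWith("disallow:")) {
                    String rule = line.substring(9).trim();
                    if (!rule.isEmpty()) {
                        disallowed.add(rule);
                    }
                }
            }
        }
        for (String rule : disallowed) {
            if (path.startsWith(rule)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isDisallowed("example.com", "/private/page.html"));
    }
}
```

Combined with per-host rate limiting, a check like this keeps the crawler from hammering paths the site has asked crawlers to avoid.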
