
Java socket read blocking infinitely

I have a really strange issue with Java sockets. The problem only happens for a VERY small subset of the URLs that I am processing. Let's call an example URL abc.com.

Edit: the URL that gives me problems is lists.wikimedia.org/robots.txt.

I can curl/netcat/telnet lists.wikimedia.org with path /robots.txt perfectly fine. Telnet even tells me the IP address for lists.wikimedia.org (see below). However, when I try to do the same with a Java socket like the following:

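Socket s = new Socket("208.80.154.4", 80);  // IP is same as the IP printed by telnet
// PrintWriter, not BufferedWriter: println/print are PrintWriter methods,
// and a PrintWriter can wrap an OutputStream directly
PrintWriter writer = new PrintWriter(s.getOutputStream());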
writer.println("HEAD /robots.txt HTTP/1.1");
writer.println("Host: lists.wikimedia.org");
writer.println("Connection: Keep-Alive");
writer.flush();

InputStreamReader r = new InputStreamReader(s.getInputStream());
BufferedReader reader = new BufferedReader(r);

String line;
while ((line = reader.readLine()) != null) {
    ...
}

The readLine call blocks until the socket times out...

Does anyone have ANY idea why this might be happening? The same code works fine with most of the other URLs, and interestingly enough this bug only happens for some of the ROBOTS.TXT requests... I'm so confused why this might be happening.

Edit:

Interestingly enough, using the Apache HttpClient library gives me the correct result for lists.wikimedia.org/robots.txt. Is there something else I need to do if I want to do it manually via Socket?

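You are probably missing the additional CRLF that ends the HTTP request header: until the server sees that empty line, it may keep waiting for more header lines and never send a response. I would also write the line endings explicitly, since println uses the platform's line separator, which is not CRLF everywhere. Like so (untested):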

writer.print("HEAD /robots.txt HTTP/1.1\r\n");
writer.print("Host: lists.wikimedia.org\r\n");
writer.print("Connection: Keep-Alive\r\n");
writer.print("\r\n");
writer.flush();
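Putting the pieces together, here is a minimal sketch of the whole exchange. The class and method names (RawHeadRequest, buildHeadRequest, readHeaders) are made up for illustration, and the 10-second timeout is an arbitrary choice. Besides sending the terminating empty line, it also stops reading at the empty line that ends the response headers, because with Connection: Keep-Alive the server may leave the connection open, so readLine would not return null right after the response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RawHeadRequest {

    /** Builds a HEAD request: every line ends in CRLF, and an empty line closes the header block. */
    public static String buildHeadRequest(String host, String path) {
        return "HEAD " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: Keep-Alive\r\n"
             + "\r\n"; // empty line: tells the server the request headers are complete
    }

    /** Reads the status line and headers, stopping at the empty line instead of waiting for EOF. */
    public static List<String> readHeaders(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String host = "lists.wikimedia.org";
        try (Socket socket = new Socket(host, 80)) {
            socket.setSoTimeout(10_000); // fail with SocketTimeoutException instead of blocking forever

            OutputStream out = socket.getOutputStream();
            out.write(buildHeadRequest(host, "/robots.txt").getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            for (String header : readHeaders(reader)) {
                System.out.println(header);
            }
        }
    }
}
```

A response to HEAD carries no body, so stopping at the first empty line is enough here; for GET you would additionally have to honor Content-Length or chunked encoding.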

Also consider using an HttpURLConnection instead of plain sockets; it takes all of these burdens off you:

HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
connection.setRequestMethod("HEAD");
...
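A slightly fuller sketch of the same idea, in case it helps. The class name HeadWithUrlConnection, the method headStatus, and the 5-second timeouts are my own choices for illustration, not something from the question. It uses URI.create(url).toURL() rather than new URL(url) only because the URL constructors are deprecated in recent JDKs.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class HeadWithUrlConnection {

    /** Sends a HEAD request to the given URL and returns the HTTP status code. */
    public static int headStatus(String url) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        connection.setRequestMethod("HEAD");
        connection.setConnectTimeout(5_000); // milliseconds; arbitrary values for this sketch
        connection.setReadTimeout(5_000);
        try {
            // getResponseCode() sends the request and parses the status line for us
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String url = args.length > 0 ? args[0] : "http://lists.wikimedia.org/robots.txt";
        System.out.println(url + " -> " + headStatus(url));
    }
}
```

The library takes care of line endings, the terminating empty line, and parsing the response, so none of the framing details from the raw-socket version apply.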
