How to get only the head section of a web page from a server in Java

Question

Is there any way to get only the head section of a web page from the server, so that the whole document need not be downloaded? HTTP has a Range header, but not all servers support it. Moreover, the size of the head section differs from page to page.

Answer 1

You can open an InputStream to the page you want to parse, buffer it, and parse the content manually as it arrives. Once you have what you need (e.g. you have reached </head>), close the stream. This way you don't download the whole HTML document, apart from whatever has already been buffered.

Code:

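import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// UTF-8 is assumed here; the page may declare a different charset.
try (BufferedReader rdr = new BufferedReader(new InputStreamReader(
        new URL("http://www.website.com/").openStream(), StandardCharsets.UTF_8))) {
    String line;
    // Stop at end of stream (null) or at the first line containing </head>
    while ((line = rdr.readLine()) != null) {
        if (line.toLowerCase().contains("</head>")) {
            break;
        }
        // parse or save the header line
    }
} // try-with-resources closes the stream, so nothing further is read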
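
One caveat with reading line by line: a minified page may have no line breaks at all, in which case the first readLine() call pulls in the whole document. A minimal sketch of a character-based variant follows. The class name HeadSectionReader, the method readUntilHeadEnd and the maxChars cap are made up for illustration and are not part of the answer above. It works on any Reader, so the demo uses an in-memory string instead of a real URL.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class HeadSectionReader {
    private static final String END_TAG = "</head>";

    /**
     * Reads characters until "</head>" (in any letter case) has been seen,
     * the stream ends, or maxChars characters have been read.
     * Returns everything read so far, including the closing tag if found.
     */
    static String readUntilHeadEnd(Reader in, int maxChars) throws IOException {
        StringBuilder sb = new StringBuilder();
        int matched = 0; // how many characters of END_TAG are currently matched
        int c;
        while (sb.length() < maxChars && (c = in.read()) != -1) {
            sb.append((char) c);
            char lower = Character.toLowerCase((char) c);
            if (lower == END_TAG.charAt(matched)) {
                matched++;
                if (matched == END_TAG.length()) {
                    break; // closing tag complete, stop reading
                }
            } else {
                // '<' occurs only at the start of END_TAG, so a mismatch
                // restarts the match at 1 for '<' and at 0 otherwise
                matched = (lower == '<') ? 1 : 0;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><HEAD><title>Demo</title></HEAD><body>large body</body></html>";
        System.out.println(readUntilHeadEnd(new StringReader(page), 64 * 1024));
    }
}
```

With a real page, you would pass a buffered InputStreamReader over the URL's stream instead of the StringReader and close it once the method returns. The cap guards against pages that never contain a closing head tag.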
