
How to get only the head section of a web page from a server in Java

Is there a way to fetch only the head section of a web page from a server, so that the whole document does not have to be downloaded? HTTP has a Range header, but not all servers support it. Moreover, the size of the head section differs from page to page.
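For servers that do honor it, a Range request can be sketched with HttpURLConnection. This is a minimal sketch, assuming Java 11+ (for InputStream.readNBytes(int)); RangePrefix, its Result holder and the maxBytes parameter are hypothetical names for illustration. The status code shows whether the header was honored: 206 Partial Content means it was, while 200 OK means the server ignored it and is sending the whole body, in which case the sketch still stops reading after maxBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: requests only a prefix of a resource via the HTTP Range header.
final class RangePrefix {
    /** The bytes received and whether the server answered 206 Partial Content. */
    static final class Result {
        final byte[] bytes;
        final boolean rangeHonored;

        Result(byte[] bytes, boolean rangeHonored) {
            this.bytes = bytes;
            this.rangeHonored = rangeHonored;
        }
    }

    private RangePrefix() {}

    /**
     * Asks for the first maxBytes bytes of url via a Range header.
     * If the server ignores the header (200 OK), reading still stops after maxBytes.
     */
    static Result fetchPrefix(String url, int maxBytes) throws IOException {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        // HTTP byte ranges are inclusive, so 0..maxBytes-1 is exactly maxBytes bytes.
        conn.setRequestProperty("Range", "bytes=0-" + (maxBytes - 1));
        try {
            int status = conn.getResponseCode();
            try (InputStream in = conn.getInputStream()) {
                byte[] bytes = in.readNBytes(maxBytes); // Java 11+
                return new Result(bytes, status == HttpURLConnection.HTTP_PARTIAL);
            }
        } finally {
            conn.disconnect(); // release the connection's resources
        }
    }
}
```

Because the head size varies from page to page, a fixed byte prefix can still stop short of </head>, so this only helps when a generous prefix is acceptable.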

You can open an InputStream to the page you want to parse, wrap it in a buffered reader, and parse the content yourself as it arrives. Once you are finished (e.g. you have reached </head>), close the streams. This way the rest of the HTML document is not read, although a little data beyond </head> may already have been buffered.

Code:

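import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// The enclosing method must handle IOException.
// try-with-resources closes the reader, and with it the connection stream, when the loop exits.
try (InputStream is = new URL("http://www.website.com/").openStream();
     BufferedReader rdr = new BufferedReader(
             new InputStreamReader(is, StandardCharsets.UTF_8))) { // assumes a UTF-8 page
    String line;
    // readLine() returns null at end of stream, so a page without </head> cannot loop forever.
    while ((line = rdr.readLine()) != null) {
        if (line.toLowerCase(Locale.ROOT).contains("</head>")) {
            break; // end of the head section: stop reading
        }
        // parse or save the header
    }
}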
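One limitation of reading line by line: if the page is minified onto a single line, readLine() returns the whole document before the check can run. A minimal sketch that scans character by character instead, assuming UTF-8 content; HeadReader and readHead are hypothetical names. Both readers still buffer internally, so a few kilobytes past </head> may be pulled from the connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: returns everything up to and including the first </head>.
final class HeadReader {
    private static final String END_TAG = "</head>";

    private HeadReader() {}

    /**
     * Reads characters until the closing head tag is seen (case-insensitive)
     * or the stream ends. Does not close the stream.
     */
    public static String readHead(InputStream in) throws IOException {
        // Assumption: the page is UTF-8 encoded.
        Reader rdr = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        int matched = 0; // how many characters of END_TAG have been matched so far
        int c;
        while ((c = rdr.read()) != -1) {
            char ch = (char) c;
            sb.append(ch);
            char lower = Character.toLowerCase(ch);
            if (lower == END_TAG.charAt(matched)) {
                matched++;
                if (matched == END_TAG.length()) {
                    break; // found </head>: stop reading
                }
            } else {
                // '<' occurs only at the start of END_TAG, so a mismatch can
                // restart the match at most at this character.
                matched = (lower == END_TAG.charAt(0)) ? 1 : 0;
            }
        }
        return sb.toString();
    }
}
```

Usage: open the stream inside try-with-resources, call readHead(is), and let the try block close the connection as soon as it returns.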
