Is there a way to fetch only the head section of a web page from the server, so that the whole document need not be downloaded? HTTP has a Range header, but not all servers support it. Moreover, the size of the head section differs from page to page.
You can open an InputStream to the website you want to parse, wrap it in a BufferedInputStream, and parse the content manually. Once you have what you need (e.g. you have reached </head>), you can close the streams. This way you don't download all of the HTML content.
Code:
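import java.io.*;
import java.net.URL;

InputStream is = new URL("http://www.website.com/").openStream();
BufferedInputStream bis = new BufferedInputStream(is);
// readLine() is defined on BufferedReader, not on Reader
BufferedReader rdr = new BufferedReader(new InputStreamReader(bis));
boolean finished = false;
while (!finished) {
    String line = rdr.readLine();
    // Stop at end of stream (null) or at the line containing </head>
    if (line == null || line.indexOf("</head>") >= 0) {
        finished = true;
    } else {
        // parse or save the header
    }
}
// Closing the reader also closes the wrapped streams
rdr.close();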
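A line-based loop reads more than it needs if the page is minified and has few or no line breaks. As a rough sketch of an alternative, the reader can be consumed one character at a time and stopped as soon as the closing tag appears. The class name HeadReader and its methods are hypothetical names chosen for illustration. The sketch assumes the page is UTF-8 and that the tag is written exactly as </head>, in any letter case. Note also that the network stack may already have buffered some bytes beyond what was read before the stream is closed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name HeadReader is not from the original answer.
final class HeadReader {
    private static final String END_TAG = "</head>";

    // Reads characters until "</head>" (case-insensitive) or end of stream.
    // Returns everything read so far, including the closing tag if it was found.
    static String readHead(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
            int start = sb.length() - END_TAG.length();
            // Only compare the tail when a '>' has just been appended
            if (c == '>' && start >= 0
                    && sb.substring(start).equalsIgnoreCase(END_TAG)) {
                break;
            }
        }
        return sb.toString();
    }

    // Assumption: the page is UTF-8; a real page may declare another charset.
    static String readHeadFromUrl(String url) throws IOException {
        try (Reader in = new BufferedReader(new InputStreamReader(
                new URL(url).openStream(), StandardCharsets.UTF_8))) {
            return readHead(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Offline demo on an in-memory document
        String html = "<html><head><title>T</title></HEAD><body>big body</body></html>";
        System.out.println(readHead(new StringReader(html)));
    }
}
```

Calling HeadReader.readHeadFromUrl("http://www.website.com/") would apply the same logic to a live page. The try-with-resources block closes the connection as soon as the head has been read or the stream ends.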