Using Google App Engine, I'm making a request like this:
URLFetchService service = URLFetchServiceFactory.getURLFetchService();
HTTPResponse response = service.fetch(request);
To detect if it returns HTML or not, I'm just stringifying the response and looking for the presence of HTML tags.
String responseAsString = new String(response.getContent());
if (responseAsString.contains("<html>")) {
    // is html
}
What would be a better way to detect if it's HTML or not?
Also, the input URLs are not necessarily descriptive like example.com/page.html; they might look like example.com/mystery, so the file extension can't be relied on.
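import com.google.appengine.api.urlfetch.HTTPHeader;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
import java.net.URL;
import java.util.List;
import java.util.Locale;

HTTPResponse response = URLFetchServiceFactory.getURLFetchService()
        .fetch(new URL("url_to_fetch"));
List<HTTPHeader> headers = response.getHeaders();
for (HTTPHeader h : headers) {
    // HTTP header names are case-insensitive, so compare with equalsIgnoreCase().
    if (h.getName().equalsIgnoreCase("Content-Type")) {
        // The value may carry parameters, e.g. "text/html; charset=iso-8859-1",
        // and media types are case-insensitive too.
        if (h.getValue().trim().toLowerCase(Locale.ROOT).startsWith("text/html")) {
            // The response is HTML; handle it here.
        }
    }
}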
You can check for other MIME types in the same way.
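If you need this check in more than one place, it may be worth pulling the Content-Type parsing into a small helper. The following is only a minimal sketch: HtmlContentType and isHtml are hypothetical names (not part of the App Engine API), and treating application/xhtml+xml as HTML is an assumption you may not want. It strips parameters such as the charset and compares the bare media type exactly, so a value like text/html5 would not match.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the App Engine API. */
final class HtmlContentType {
    private HtmlContentType() {}

    /** Returns true if a Content-Type header value names an HTML media type. */
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false; // header absent: the headers alone cannot tell us
        }
        // Drop parameters such as "; charset=iso-8859-1".
        int semicolon = contentType.indexOf(';');
        String mediaType =
                (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                        .trim()
                        .toLowerCase(Locale.ROOT);
        // Assumption: XHTML is counted as HTML here as well.
        return mediaType.equals("text/html")
                || mediaType.equals("application/xhtml+xml");
    }
}
```

Inside the header loop you would then call HtmlContentType.isHtml(h.getValue()) once the header name matches Content-Type. If the server sends no Content-Type at all, this returns false, and you would have to fall back to inspecting the body as in the question.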