Java/GAE - Detect if HTTP Response content is HTML?

Using Google App Engine, I'm making a request like this:

URLFetchService service = URLFetchServiceFactory.getURLFetchService();
HTTPResponse response = service.fetch(request);

To detect if it returns HTML or not, I'm just stringifying the response and looking for the presence of HTML tags.

String responseAsString = new String(response.getContent());

if (responseAsString.contains("<html>")){
    // is html
}

What would be a better way to detect if it's HTML or not?

Also, the input URLs are not necessarily descriptive like example.com/page.html; they might look like example.com/mystery, so the URL alone doesn't reveal the content type.

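import com.google.appengine.api.urlfetch.HTTPHeader;
import com.google.appengine.api.urlfetch.HTTPResponse;
import com.google.appengine.api.urlfetch.URLFetchServiceFactory;
import java.net.URL;
import java.util.List;
import java.util.Locale;

HTTPResponse response = URLFetchServiceFactory.getURLFetchService()
        .fetch(new URL("url_to_fetch"));
List<HTTPHeader> headers = response.getHeaders();

for (HTTPHeader h : headers) {
    // HTTP header names are case-insensitive, so compare with equalsIgnoreCase().
    if (h.getName().equalsIgnoreCase("Content-Type")) {
        // The value may carry parameters, e.g. "text/html; charset=iso-8859-1".
        if (h.getValue().toLowerCase(Locale.ROOT).startsWith("text/html")) {
            // The response is HTML; handle it here.
        }
    }
}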

https://developers.google.com/appengine/docs/java/javadoc/com/google/appengine/api/urlfetch/HTTPResponse#getHeaders()

You can check for other MIME types in the same way.
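
If you need a stricter check than startsWith, the header value can be split into its media type and parameters before comparing. The following is only a minimal sketch using the standard library; the class name ContentTypes and the method isHtml are hypothetical helpers, not part of the App Engine API, and treating application/xhtml+xml as HTML is an assumption you may not want.

```java
import java.util.Locale;

// Hypothetical helper, not part of the App Engine URL Fetch API.
final class ContentTypes {
    private ContentTypes() {}

    /** Returns true if a Content-Type header value denotes HTML or XHTML. */
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=iso-8859-1".
        int semicolon = contentType.indexOf(';');
        String mediaType = semicolon >= 0
                ? contentType.substring(0, semicolon)
                : contentType;
        // Media types are case-insensitive, so normalize before comparing.
        mediaType = mediaType.trim().toLowerCase(Locale.ROOT);
        // Assumption: XHTML served as application/xhtml+xml also counts as HTML.
        return mediaType.equals("text/html")
                || mediaType.equals("application/xhtml+xml");
    }
}
```

Inside the header loop you would then call ContentTypes.isHtml(h.getValue()) once the Content-Type header has been found. Unlike a prefix match, an exact comparison on the media type will not accept a value that merely begins with "text/html".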
