How to check the last modified time of a PDF file on a website using Jsoup

Question

I want to check the last modified time of a PDF file on a particular page. The PDF link is http://www.nfib.com/Portals/0/PDF/sbet/sbet201402.pdf

This is what I am trying:

Connection.Response rs2 = Jsoup.connect("http://www.nfib.com/Portals/0/PDF/sbet/sbet201402.pdf").execute();
System.out.println("Header = " + rs2.header("Last-Modified"));

but I get this error:

UnsupportedMimeTypeException

Answer 1

If it doesn't have to be done with Jsoup, you can use the standard URL and URLConnection classes:

URL url = new URL("http://www.nfib.com/Portals/0/PDF/sbet/sbet201402.pdf");
URLConnection connection = url.openConnection();
System.out.println("Header = " + connection.getHeaderField("Last-Modified"));
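
A slightly fuller sketch of the same idea, using only the JDK. It sends a HEAD request so the PDF body is not downloaded, and sets timeouts so a slow server cannot block forever. The class name LastModifiedHead and the method lastModifiedMillis are illustrative names, and it assumes the server answers HEAD requests and includes a Last-Modified header (getLastModified() returns 0 when it does not).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.time.Instant;

class LastModifiedHead {
    /**
     * Returns the Last-Modified header as milliseconds since the epoch,
     * or 0 if the server did not send one.
     */
    static long lastModifiedMillis(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestMethod("HEAD");   // headers only, no response body
        conn.setConnectTimeout(10_000);  // milliseconds
        conn.setReadTimeout(10_000);
        try {
            return conn.getLastModified();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LastModifiedHead.java <url>");
            return;
        }
        long millis = lastModifiedMillis(args[0]);
        System.out.println(millis == 0
                ? "No Last-Modified header"
                : "Last-Modified = " + Instant.ofEpochMilli(millis));
    }
}
```

Pass the PDF URL from the question as the single argument to try it against the real server.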

Keep in mind that Jsoup was designed to parse HTML/XML, so by default it only accepts responses whose content type is

text/*, application/xml, or application/xhtml+xml

and not

application/pdf.

If you take a look at the code that handles this, it looks like

if (contentType != null && !req.ignoreContentType() && (!(contentType.startsWith("text/") || contentType.startsWith("application/xml") || contentType.startsWith("application/xhtml+xml"))))
    throw new UnsupportedMimeTypeException("Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml",
            contentType, req.url().toString());
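
To see which content types pass that test, here is a small standalone sketch that mirrors the quoted condition. ContentTypeCheck and isRejected are hypothetical names used only for illustration; they are not part of Jsoup.

```java
class ContentTypeCheck {
    /**
     * Mirrors the condition quoted above: returns true when a response with
     * this Content-Type would be rejected. Hypothetical helper, not Jsoup API.
     */
    static boolean isRejected(String contentType, boolean ignoreContentType) {
        if (contentType == null || ignoreContentType) {
            return false; // no header at all, or the check is switched off
        }
        boolean supported = contentType.startsWith("text/")
                || contentType.startsWith("application/xml")
                || contentType.startsWith("application/xhtml+xml");
        return !supported;
    }

    public static void main(String[] args) {
        System.out.println(isRejected("text/html; charset=UTF-8", false)); // false
        System.out.println(isRejected("application/pdf", false));          // true
        System.out.println(isRejected("application/pdf", true));           // false
    }
}
```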

But the !req.ignoreContentType() test hints that we can turn off the requirement for purely XML/HTML input. To do so, just add

ignoreContentType(true)

to your connection settings, like

Connection.Response rs2 = Jsoup.connect("http://www.nfib.com/Portals/0/PDF/sbet/sbet201402.pdf")
        .ignoreContentType(true)
        .execute();

and you should be able to read the returned headers:

System.out.println("Header = " + rs2.header("Last-Modified"));

output:

Header = Mon, 10 Feb 2014 22:54:15 GMT
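
The header value is a plain string, so if you want to compare it with an earlier check you will probably want to parse it first. A minimal sketch, assuming the server sends the usual RFC 1123 date format as in the output above; LastModifiedParser and the lastSeen timestamp are made-up names for illustration.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class LastModifiedParser {
    /** Parses an RFC 1123 date such as "Mon, 10 Feb 2014 22:54:15 GMT". */
    static Instant parse(String headerValue) {
        return ZonedDateTime
                .parse(headerValue, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
    }

    public static void main(String[] args) {
        Instant modified = parse("Mon, 10 Feb 2014 22:54:15 GMT");
        System.out.println(modified); // 2014-02-10T22:54:15Z

        // Hypothetical timestamp of the previous check.
        Instant lastSeen = Instant.parse("2014-01-01T00:00:00Z");
        System.out.println(modified.isAfter(lastSeen)); // true
    }
}
```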
