Get the Last Modified date of a URL

I have three pieces of code. This is the first one, in which I get the metadata of a URL; that metadata also includes the Last-Modified date. If I run this class, I get the last modified date of the URL as:

key:- Last-Modified
value:- 2011-10-21T03:18:28Z

First way:

import java.io.IOException;
import java.io.Reader;
import java.net.URL;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

public class App {

    public static void main(String[] args) {
        Tika t = new Tika();
        Metadata md = new Metadata();
        try {
            URL u = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");

            // Extract and print the text of the document.
            String content1 = t.parseToString(u);
            System.out.println("hello" + content1);

            // Parse again, this time collecting the document's own metadata.
            // Closing the reader also closes the underlying stream.
            try (Reader r = t.parse(u.openStream(), md)) {
                for (String name : md.names()) {
                    System.out.println("key:- " + name);
                    System.out.println("value:- " + md.get(name));
                }
            }
        } catch (IOException | TikaException e) {
            // MalformedURLException is an IOException, so it is handled here too.
            e.printStackTrace();
        }
    }
}

But when I run the second example below with the same URL, I get a different Last-Modified date for it. How can I tell which one is right? I tried opening the PDF in the browser, but it opens in Adobe PDF on the computer instead, so I am not able to check the headers through Firebug.

Second way:

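import java.net.URL;
import java.net.URLConnection;

public class LastMod {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");
        System.out.println("URL:- " + url);

        // openConnection() does not send anything yet; the request is made
        // when a header is first read.
        URLConnection connection = url.openConnection();

        // Prints the raw Last-Modified header as the server sent it.
        System.out.println(connection.getHeaderField("Last-Modified"));
    }
}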

For the above one I get the Last-Modified date as:

Thu, 03 Nov 2011 16:59:41 +0000

Third way:

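import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Date;

public class Main {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.xyz.com/documents/files/xyz-china.pdf");
        HttpURLConnection httpCon = (HttpURLConnection) url.openConnection();

        // getLastModified() parses the Last-Modified header into epoch
        // milliseconds and returns 0 when the header is missing.
        long date = httpCon.getLastModified();
        if (date == 0) {
            System.out.println("No last-modified information.");
        } else {
            // Date.toString() formats in the JVM's default time zone.
            System.out.println("Last-Modified: " + new Date(date));
        }
    }
}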

And with the third method I get:

Last-Modified: Thu Nov 03 09:59:41 PDT 2011

I am confused about which one is right. I think the first one is right. Any suggestions will be appreciated.

The best option is the third one, connection.getLastModified(), because it is the easiest to use and has the highest level of abstraction. The others work at lower levels of abstraction: the first reads the raw response, and the second reads the raw header. The third reads the header and converts it to a long.
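To make the "raw header versus converted long" distinction concrete, here is a minimal sketch of the conversion step. It assumes the header arrives in the RFC 1123 form shown in the question; HeaderToMillis and toEpochMillis are hypothetical names used only for illustration, and this is not the JDK's internal implementation of getLastModified().

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the JDK or of the question's code.
class HeaderToMillis {

    // Converts a raw Last-Modified header value to epoch milliseconds.
    // Returns 0 when the header is missing, mirroring getLastModified().
    static long toEpochMillis(String rawHeader) {
        if (rawHeader == null) {
            return 0L;
        }
        return ZonedDateTime.parse(rawHeader, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        // The header value reported by the second way in the question.
        System.out.println(toEpochMillis("Thu, 03 Nov 2011 16:59:41 +0000"));
    }
}
```

The long that comes out carries no time zone at all; a zone only enters the picture when the value is formatted for display.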

The difference between the second and third outputs is due to the time zone. Printing new Date(date) renders the value in the VM default time zone. Prefer Calendar or, better, Joda-Time's DateTime, which support custom time zones.
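A small sketch of that time zone effect, using java.time (Java 8+) in place of the Calendar or Joda-Time classes mentioned above. The class name SameInstantTwoZones and the helper inZone are made up for the example, and "America/Los_Angeles" is only an assumption about which zone produced the PDT output in the question.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo class; the names here are illustrative only.
class SameInstantTwoZones {

    // Renders an epoch-millisecond timestamp in the given time zone.
    static ZonedDateTime inZone(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        // Same moment as the "Thu, 03 Nov 2011 16:59:41 +0000" header.
        long lastModified = Instant.parse("2011-11-03T16:59:41Z").toEpochMilli();

        // Both lines describe the same instant; only the rendering differs.
        System.out.println(inZone(lastModified, "UTC"));
        System.out.println(inZone(lastModified, "America/Los_Angeles"));
    }
}
```

Passing the zone explicitly keeps the printed value independent of whatever default the JVM happens to run with.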

The first piece of code extracts the date from the metadata of the PDF file itself, while the other two get it from the HTTP headers returned by the web server. The first one is probably more accurate if you want to know when the document was created or modified.
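One way to convince yourself that these really are two different timestamps, and not one timestamp shown in two zones, is to compare them as instants. The following is a rough sketch using the two values from the question; MetadataVsHeader and gap are hypothetical names, and it assumes the metadata value is ISO-8601 and the header value is in RFC 1123 form, as they appear above.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical comparison helper; not taken from the question's code.
class MetadataVsHeader {

    // Returns how much later the HTTP header date is than the PDF metadata date.
    static Duration gap(String pdfMetadataDate, String httpHeaderDate) {
        Instant fromPdf = Instant.parse(pdfMetadataDate);
        Instant fromHeader = ZonedDateTime
                .parse(httpHeaderDate, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
        return Duration.between(fromPdf, fromHeader);
    }

    public static void main(String[] args) {
        Duration d = gap("2011-10-21T03:18:28Z", "Thu, 03 Nov 2011 16:59:41 +0000");
        System.out.println("Header is later by " + d.toDays() + " days");
    }
}
```

The gap is roughly two weeks, far more than any time zone offset could explain, which fits the reading that one date was written into the file and the other was reported by the server.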

The last modified date should be in GMT (RFC 2822), so you should get it like this:

HttpURLConnection connection = (HttpURLConnection) url.openConnection();
long dateTime = connection.getLastModified(); // 0 if the header is absent
connection.disconnect();
// Attach the GMT zone explicitly instead of relying on the JVM default.
ZonedDateTime urlLastModified = ZonedDateTime.ofInstant(Instant.ofEpochMilli(dateTime), ZoneId.of("GMT"));
