
Get a page's last modified date using Java

Is there a standard way to tell when a page was last modified? Currently I am doing this:

// Assumes `url` is an already constructed java.net.URL.
URLConnection uCon = url.openConnection();
uCon.setConnectTimeout(5000);   // 5 seconds
String lastMod = uCon.getHeaderField("Last-Modified");   // null if the header is absent
System.out.println("last mod: " + lastMod);

However, it looks like some sites do not send a Last-Modified field.

For example, http://www.cbc.ca returns these header fields:

X-Origin-Server
Connection
Expires
null
Date
Server
Content-Type
Transfer-Encoding
Cache-Control

I could parse the page to try to extract its date, but that seems like a major pain. What is the standard?

(If possible, I would like to stick with URLConnection, because that is what I use to download the webpage.)

There is no standard. Dynamically generated web pages generally do not have a Last-Modified field, and different web pages include dates in different ways. Some sites do not include such a date at all and only show "© <current year>" at the bottom. You could try looking for a date near the bottom or the top of the page, but reliably extracting the date from the page content would have to be site-specific.
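Because the header may simply be missing, any code that reads it has to treat it as optional. The following is a minimal sketch, not a definitive implementation: it relies on URLConnection.getLastModified(), which returns 0 when the value is not known, and wraps the result in an Optional. The class name LastModifiedLookup and the helper names are made up for illustration.

```java
import java.net.URI;
import java.net.URLConnection;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper class; the names are illustrative, not from the original post.
class LastModifiedLookup {

    // URLConnection.getLastModified() reports 0 when the header is absent,
    // so 0 is mapped to "unknown" here. A genuine timestamp of exactly
    // 1970-01-01T00:00:00Z cannot be told apart from a missing header.
    static Optional<Instant> fromEpochMillis(long millis) {
        return millis == 0L ? Optional.empty() : Optional.of(Instant.ofEpochMilli(millis));
    }

    // Reads Last-Modified from the connection (this triggers the request).
    static Optional<Instant> lastModified(URLConnection conn) {
        return fromEpochMillis(conn.getLastModified());
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java LastModifiedLookup <url>");
            return;
        }
        URLConnection conn = URI.create(args[0]).toURL().openConnection();
        conn.setConnectTimeout(5000);   // 5 seconds, as in the question
        conn.setReadTimeout(5000);
        Optional<Instant> modified = lastModified(conn);
        System.out.println(modified.map(Instant::toString).orElse("no Last-Modified header"));
    }
}
```

If the result is empty, there is no header-based answer for that page, and any fallback (such as scraping a date from the page body) remains site-specific.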

From HTTP/1.1: Header Field Definitions:

14.29 Last-Modified

The Last-Modified entity-header field indicates the date and time at which the origin server believes the variant was last modified.

  Last-Modified = "Last-Modified" ":" HTTP-date 

An example of its use is

  Last-Modified: Tue, 15 Nov 1994 12:45:26 GMT 

The exact meaning of this header field depends on the implementation of the origin server and the nature of the original resource. For files, it may be just the file system last-modified time. For entities with dynamically included parts, it may be the most recent of the set of last-modify times for its component parts. For database gateways, it may be the last-update time stamp of the record. For virtual objects, it may be the last time the internal state changed.

An origin server MUST NOT send a Last-Modified date which is later than the server's time of message origination. In such cases, where the resource's last modification would indicate some time in the future, the server MUST replace that date with the message origination date.

An origin server SHOULD obtain the Last-Modified value of the entity as close as possible to the time that it generates the Date value of its response. This allows a recipient to make an accurate assessment of the entity's modification time, especially if the entity changes near the time that the response is generated.

HTTP/1.1 servers SHOULD send Last-Modified whenever feasible.

In other words, Last-Modified is optional (servers SHOULD send it, but are not required to), and its value depends on the nature of the original resource.
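When the header is present, its value is an HTTP-date string such as the one in the example above. As a rough sketch, the string returned by getHeaderField("Last-Modified") could be turned into an Instant as shown here. The class name HttpDateParser is hypothetical, and the code only handles the RFC 1123 form of HTTP-date; the older RFC 850 and asctime forms that HTTP/1.1 also permits are treated as unparseable.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper class; the name is illustrative, not from the original post.
class HttpDateParser {

    // Returns the parsed instant, or empty if the header is missing (null)
    // or is not a valid RFC 1123 date.
    static Optional<Instant> parse(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        try {
            ZonedDateTime parsed =
                    ZonedDateTime.parse(headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            return Optional.of(parsed.toInstant());
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The example value quoted from the HTTP/1.1 specification.
        System.out.println(parse("Tue, 15 Nov 1994 12:45:26 GMT")); // Optional[1994-11-15T12:45:26Z]
        // A missing header, as getHeaderField would report it.
        System.out.println(parse(null));                            // Optional.empty
    }
}
```

Returning an Optional rather than null keeps the "header not sent" case explicit for the caller, which matches the fact that the field is not guaranteed to exist.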


 