简体   繁体   English

HttpWebResponse LastModified

[英]HttpWebResponse LastModified

Is the HttpWebResponse.LastModified accurate? HttpWebResponse.LastModified准确吗? Is it always present? 它总是存在吗? My project is to create a sort of a focused web crawler and I am stucked if I will use the hash value of a resource or just the HttpWebResponse.LastModified property to check the resource's "freshness". 我的项目是创建一种专注的Web爬虫,如果我将使用资源的哈希值或仅使用HttpWebResponse.LastModified属性来检查资源的“新鲜度”,我就会陷入困境。

Using the hash value means streaming the resource every time it's checked. 使用哈希值意味着每次检查时都会流式传输资源。 This has a big impact on overall performance. 这对整体表现有很大影响。

If I will just check the HttpWebResponse.LastModified, is it accurate? 如果我只是检查HttpWebResponse.LastModified,它是否准确?

HttpWebResponse.LastModified returns the value of the HTTP Last-Modified response header. HttpWebResponse.LastModified返回HTTP Last-Modified响应头的值。

HTTP response headers are set by the HTTP server sending the response. HTTP响应标头由发送响应的HTTP服务器设置。 It's completely up to the server if it sets the Last-Modified response header, and whether it sets it to an accurate value or not. 如果它设置了Last-Modified响应头,它是否完全取决于服务器,以及它是否将其设置为准确值。

The Last-Modified response header is part of the Validation Model for Caching in HTTP. Last-Modified响应头是HTTP中缓存验证模型的一部分。 It is usually used in conjunction with the If-Modified-Since request header. 它通常与If-Modified-Since请求标头一起使用。 You might want to read HTTP/1.1, part 6: Caching for the details. 您可能希望阅读HTTP / 1.1,第6部分:缓存细节。

It depends on your purpose. 这取决于你的目的。

Last-Modified will mean that the server is happy for you to keep using an entity that had the same last-modified value (or later by implication, though it would be strange for the server's last-modified to ever go back, but could happen in some multi-server situations). Last-Modified意味着服务器很高兴您继续使用具有相同最后修改值的实体(或稍后暗示,尽管服务器的最后修改后的内容会很奇怪,但可能会发生在一些多服务器情况下)。

E-tag is stronger (all the more if it's not a "weak" e-tag) in that it identifies the specific entity (e-tags for different language versions, different content-type versions, or different content-encoding versions will differ unless they are actually the same entity [which can happen, in a restricted set of circumstances]). 电子标签更强(如果它不是“弱”电子标签,则更多)因为它标识特定实体(不同语言版本的电子标签,不同的内容类型版本或不同的内容编码版本会有所不同除非它们实际上是同一个实体[ 可能会在有限的情况下发生])。

Both can be "loose" in terms of perhaps a server change is considered insignificant; 两者都可能“松散”,或许服务器变更被认为是微不足道的; the server is happy for you to keep using the previous entity because it considers it the same (except "strong" e-tags, which must indicate octet-to-octet identity for use with range requests). 服务器很高兴您继续使用前一个实体,因为它认为它是相同的(“强”电子标签除外,它必须指示用于范围请求的八位字节到八位字节的标识)。

Both can of course just be plain wrong. 两者当然都可能是完全错误的。 Bugs happen. 虫子发生了。 That said, when they are wrong its more often in the other direction, reporting a change when none has happened (a valid behaviour, one is allowed to be over-cautious about freshness; it never damages only makes sub-optimal). 也就是说,当他们错了时,往往会朝另一个方向发生变化,在没有发生变化的情况下报告变化(一个有效的行为,一个人可以对新鲜度过于谨慎;它永远不会损害只会使次优)。

The question then, is whether you need to know that the server considers no change to have been made (most usage) or there really has been a change (pretty much restricted to diagnostic tools). 那么问题是,您是否需要知道服务器认为没有进行任何更改(大多数使用)或者确实存在更改(几乎仅限于诊断工具)。

Unless you've a clear reason not to, trust last-modified and e-tag (but trust e-tag more). 除非你有明确的理由不信任,否则请相信最后修改和电子标签(但更多信任电子标签)。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM