

How to generate an MD5 hash for a file located at an HTTP URL?

I am writing a web crawler that searches for files and downloads them. My problem is that I do not want to download files that have already been downloaded to the local drive. I know I could compare MD5 hashes, but how can I do that for a file at an HTTP URL without first downloading it to the local disk?

If this approach is wrong, please advise on a better solution.

Unless the web server offers some kind of service that publishes the MD5, then no.

Computing a file hash requires every byte of the file. That is also why changing a single byte changes the hash, which is what makes hashes useful for detecting modified files.
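A small Python illustration of that point (Python is chosen only for illustration, and the byte strings are made up): two inputs that differ in exactly one byte produce unrelated MD5 digests, so no shortcut that skips part of the file can reproduce the hash.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of an in-memory byte string."""
    return hashlib.md5(data).hexdigest()

original = b"crawler payload v1"
modified = b"crawler payload v2"  # differs from `original` in one byte

print(md5_hex(original))
print(md5_hex(modified))
print(md5_hex(original) == md5_hex(modified))  # False
```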

To generate a hash you need the data (i.e., you will have to download it somehow).
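"Downloading" does not have to mean saving to disk, though. A minimal sketch, assuming any readable binary stream: feed the bytes into the hash chunk by chunk, so only one chunk is in memory at a time. The function name `md5_of_stream` and the 64 KiB chunk size are illustrative choices, not part of any answer here.

```python
import hashlib
import io
from typing import BinaryIO

def md5_of_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Hash a readable binary stream chunk by chunk.

    Nothing is written to disk, but every byte still has to be read.
    """
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()

# io.BytesIO stands in for a network stream in this example.
print(md5_of_stream(io.BytesIO(b"abc")))  # 900150983cd24fb0d6963f7d28e17f72
```

The response object returned by `urllib.request.urlopen(url)` also has a `read(n)` method, so it could be passed in directly; the file is then transferred over the network but never stored locally.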

I would suggest investigating the If-Modified-Since HTTP header instead (or ETag / If-None-Match, if the particular server provides it).
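A rough sketch of such a conditional request with Python's standard library, assuming the crawler saved the `ETag` and/or `Last-Modified` response headers from its previous download of the same URL (how and where it stores them is left open). The helper names `conditional_headers` and `fetch_if_changed` are hypothetical. If the server honours the validators it answers 304 Not Modified with no body; `urlopen` surfaces that 304 as an `HTTPError`. A server that ignores the headers simply returns 200 with the full file.

```python
import urllib.error
import urllib.request
from typing import Optional

def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> dict:
    """Build validator headers from values saved on a previous download."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def fetch_if_changed(url: str, etag: Optional[str] = None,
                     last_modified: Optional[str] = None) -> Optional[bytes]:
    """Return the body if the resource changed, or None on 304 Not Modified."""
    req = urllib.request.Request(url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            # Store resp.headers.get("ETag") and resp.headers.get("Last-Modified")
            # so the next crawl can send them back.
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged since the last download
            return None
        raise
```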

The only comparison you can perform on a remote file without downloading it is a size comparison (for example, using the Content-Length header returned by a HEAD request). Unfortunately, this is probably not enough to determine whether the contents are identical.
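A minimal sketch of that size check, assuming the server reports `Content-Length` for a HEAD request (not all servers do). `remote_size` and `could_be_same` are hypothetical helper names. A size mismatch proves the files differ; a match only means they *might* be the same.

```python
import urllib.request
from typing import Optional

def remote_size(url: str) -> Optional[int]:
    """Ask the server for Content-Length with a HEAD request (no body is sent)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        value = resp.headers.get("Content-Length")
    return int(value) if value is not None else None

def could_be_same(remote_length: Optional[int], local_size: int) -> bool:
    """Different sizes prove the files differ; equal sizes prove nothing more."""
    if remote_length is None:  # server did not report a size
        return True
    return remote_length == local_size
```

Typical use would be `could_be_same(remote_size(url), os.path.getsize(local_path))`, downloading only when it returns `True` and a stronger check is still needed.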

Old question, but PowerShell 5+ can get the MD5 of a file at a remote URL in one step: it reads the file as a stream of bytes and computes the MD5 directly from that stream, so nothing is saved to the local disk (the bytes are still transferred over the network):

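$wc = [System.Net.WebClient]::new()
$pkgurl = 'http://www.remoteurl/file.zip'
$stream = $wc.OpenRead($pkgurl)   # response bytes are streamed, not written to disk
try { $FileHash = Get-FileHash -Algorithm MD5 -InputStream $stream }
finally { $stream.Dispose(); $wc.Dispose() }   # release the connection
Write-Host $FileHash.Hash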
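However the remote hash is obtained (the PowerShell snippet above or a streamed read in another language), the crawler still has to compare it with the files already on the local drive. A rough Python sketch, assuming all previous downloads live under one folder; `md5_of_file`, `local_md5_index` and `already_downloaded` are hypothetical names. `Get-FileHash` reports uppercase hex while Python's `hexdigest()` is lowercase, so the comparison normalises case.

```python
import hashlib
import os
from typing import Set

def md5_of_file(path: str, chunk_size: int = 64 * 1024) -> str:
    """MD5 of a local file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # lowercase hex

def local_md5_index(folder: str) -> Set[str]:
    """Hash every file under the crawler's download folder, recursively."""
    index: Set[str] = set()
    for root, _dirs, files in os.walk(folder):
        for name in files:
            index.add(md5_of_file(os.path.join(root, name)))
    return index

def already_downloaded(remote_hash: str, index: Set[str]) -> bool:
    """True if a local file has the same MD5 (case-insensitive comparison)."""
    return remote_hash.lower() in index
```

Building the index once at start-up and adding each new download's hash to it avoids re-hashing the whole folder on every check.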

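Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.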

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM