How to generate an MD5 hash for a file located at an HTTP URL?

Question

I am writing a web crawler that searches for files and downloads them. I do not want to download files that have already been saved to the local drive. I know MD5 hashes can be used to compare files, but how can I compute one for a file at an HTTP URL without downloading it to the local disk?

If this approach is wrong, please advise on a better solution.

Answer 1

No, unless the web server offers some service that publishes the MD5.

Computing a file hash requires every byte of the file. That is the point of a hash: changing a single byte changes the result, which is what lets you detect modified files.
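
As a rough illustration of the point above, here is a minimal Python sketch that hashes a remote file as its bytes arrive. Every byte still crosses the network; the only saving is that nothing is written to disk. The function names `md5_of_stream` and `md5_of_url`, the chunk size, and the timeout are assumptions made for this example, not part of the original answer.

```python
import hashlib
import urllib.request


def md5_of_stream(stream, chunk_size=64 * 1024):
    """Return the hex MD5 of everything readable from a binary file-like object."""
    digest = hashlib.md5()
    # Read in fixed-size chunks so the whole file is never held in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_url(url, timeout=30):
    """Download the body of `url` and hash it on the fly; nothing is saved to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return md5_of_stream(response)
```

A crawler could compare the value returned by `md5_of_url(...)` against hashes stored for local files, but the bandwidth cost is the same as a normal download.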

Answer 2

To generate a hash you are going to need the data (i.e., you will have to download it somehow).

I would suggest investigating the If-Modified-Since HTTP header instead (or ETag / If-None-Match, if the particular server provides it).
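
A conditional GET along these lines might look like the following Python sketch. It assumes the crawler saved the `ETag` and `Last-Modified` values from an earlier download; the function names and the tuple return shape are illustrative choices. Whether a server honours these validators varies, so treat this as a starting point rather than a guarantee.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build validators for a conditional GET from values saved on a previous download."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=30):
    """Return (body, etag, last_modified), or None if the server answers 304 Not Modified."""
    request = urllib.request.Request(url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 as an HTTPError
            return None
        raise
```

When the server replies 304, no body is transferred, so an unchanged file costs only one round trip.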

Answer 3

Without downloading the file, about the only direct comparison you can make against it is a size comparison. Unfortunately, matching sizes are probably not enough to determine whether the contents are identical.
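
For what it is worth, a size check can be sketched in Python with a HEAD request, which asks for the headers only. The names `remote_content_length` and `sizes_differ` are hypothetical, and the sketch assumes the server reports `Content-Length` for HEAD requests, which not all servers do.

```python
import urllib.request


def remote_content_length(url, timeout=30):
    """Send a HEAD request and return Content-Length as an int, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None


def sizes_differ(remote_size, local_size):
    """True only when both sizes are known and unequal.

    Equal sizes prove nothing about the contents, so a False result
    means "unknown", not "identical".
    """
    if remote_size is None or local_size is None:
        return False
    return remote_size != local_size
```

A differing size is a cheap signal that the file changed; an equal size still leaves the question open.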

Answer 4

This is an old question, but PowerShell 5+ can compute the MD5 of a file at a remote URL in one step by opening it as a byte stream and hashing it as it downloads. The file is still transferred, just not saved to disk:

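$wc = [System.Net.WebClient]::new()
$pkgurl = 'http://www.remoteurl/file.zip'
# OpenRead returns a readable stream over the response body
$stream = $wc.OpenRead($pkgurl)
$FileHash = Get-FileHash -Algorithm MD5 -InputStream $stream
# Release the network stream and the client once hashing is done
$stream.Dispose(); $wc.Dispose()
Write-Host $FileHash.Hash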
