Possible to calculate MD5 (or other) hash with buffered reads?

I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:

    // Requires: using System.IO; using System.Security.Cryptography;
    private byte[] calcHash(string file)
    {
        // Dispose both the hash object and the file handle, even if ComputeHash throws.
        using (HashAlgorithm ha = MD5.Create())
        using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
        {
            return ha.ComputeHash(fs);
        }
    }

However, the files are normally written just beforehand in a buffered manner (say, 32 MB at a time). I'm fairly sure I once saw an overload of a hash function that let me calculate an MD5 (or other) hash at the same time as writing, i.e. calculate the hash of one buffer, then feed that resulting hash into the next iteration.

Something like this (pseudocode-ish):

    byte [] hash = new byte [] { 0,0,0,0,0,0,0,0 };
    while(!eof)
    {
       buffer = readFromSourceFile();
       writefile(buffer);
       hash = calchash(buffer, hash);
    }

hash is now the same as what calcHash would return for the entire file.

Now, I can't find any overloads like that in the .NET 3.5 Framework. Am I dreaming? Has it never existed, or am I just lousy at searching? The reason for doing the writing and the checksum calculation at once is the file size: the data is already in memory while it is being written, so hashing it then avoids reading gigabytes back from disk.

I like the answer above, but for the sake of completeness, and as a more general solution, refer to the CryptoStream class. If you are already handling streams, it is easy to wrap your stream in a CryptoStream, passing a HashAlgorithm as the ICryptoTransform parameter.

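    // Requires: using System; using System.IO; using System.Security.Cryptography;
    // notDoneYet and Get32MB() are placeholders for your own loop condition and data source.
    using (var file = new FileStream("foo.txt", FileMode.Create, FileAccess.Write))
    using (var md5 = MD5.Create())
    using (var cs = new CryptoStream(file, md5, CryptoStreamMode.Write))
    {
        while (notDoneYet)
        {
            byte[] buffer = Get32MB();
            // The data is hashed and then written, unchanged, to the file.
            cs.Write(buffer, 0, buffer.Length);
        }
        // Processes the final block; md5.Hash throws until this has been done.
        cs.FlushFinalBlock();
        System.Console.WriteLine(BitConverter.ToString(md5.Hash));
    }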

You have to call FlushFinalBlock (or close the CryptoStream) before getting the hash, so the HashAlgorithm knows the data is complete; reading md5.Hash before that throws an exception.
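Applied to the question's scenario (copy a source to a destination while hashing it), the CryptoStream approach could look like the sketch below. It assumes both streams are already open; `CopyAndHash` is a hypothetical helper name, and the `leaveOpen` constructor overload assumes .NET Core 3.0 / .NET 5 or later.

```csharp
using System.IO;
using System.Security.Cryptography;

static class CryptoStreamHashing
{
    // Hypothetical helper: copies source into destination and returns the MD5 of the copied bytes.
    // leaveOpen: true stops the CryptoStream from disposing destination.
    public static byte[] CopyAndHash(Stream source, Stream destination, int bufferSize)
    {
        using (var md5 = MD5.Create())
        using (var cs = new CryptoStream(destination, md5, CryptoStreamMode.Write, leaveOpen: true))
        {
            // Each chunk is hashed by md5, then written unchanged to destination.
            source.CopyTo(cs, bufferSize);
            // Hashes the final block; md5.Hash would throw before this call.
            cs.FlushFinalBlock();
            return md5.Hash;
        }
    }
}
```

For the question's case, `source` and `destination` would be FileStreams (for example from `File.OpenRead` and `File.Create`) and `bufferSize` would be `32 * 1024 * 1024`.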

You use the TransformBlock and TransformFinalBlock methods to process the data in chunks.

    // Init
    MD5 md5 = MD5.Create();

    // For each block (the data does not need to be copied anywhere, so outputBuffer is null):
    md5.TransformBlock(block, 0, block.Length, null, 0);

    // For the last block (if you reuse one buffer, pass the number of bytes actually read):
    md5.TransformFinalBlock(block, 0, block.Length);

    // Get the hash code
    byte[] hash = md5.Hash;

Note: it also works (at least with the MD5 provider) to send all blocks to TransformBlock and then send an empty block to TransformFinalBlock to finalise the process.
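Mapped onto the question's pseudocode, the TransformBlock approach might look like this sketch. No previous hash is passed back in: the HashAlgorithm instance keeps the running state between calls. `WriteAndHash` is a hypothetical name, and the two streams stand in for the question's `readFromSourceFile()` and `writefile()`.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class BlockHashing
{
    // Hypothetical helper mirroring the question's loop: read a buffer, write it, hash it.
    public static byte[] WriteAndHash(Stream source, Stream destination, int bufferSize)
    {
        var buffer = new byte[bufferSize];
        using (var md5 = MD5.Create())
        {
            int read;
            // Read can return fewer bytes than requested, so only `read` bytes are used.
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                destination.Write(buffer, 0, read);
                // Updates md5's internal state; no previous hash is fed back in.
                md5.TransformBlock(buffer, 0, read, null, 0);
            }
            // An empty final block completes the calculation.
            md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
            return md5.Hash;
        }
    }
}
```

The hash obtained this way equals what `calcHash` returns for the finished file, without reading the file a second time.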

It seems you can use TransformBlock / TransformFinalBlock, as shown in this example: "Displaying progress updates when hashing large files".

Hash algorithms are expected to handle this situation and are typically implemented with three functions:

hash_init() - Called to allocate resources and begin the hash.
hash_update() - Called with new data as it arrives.
hash_final() - Complete the calculation and free resources.

Look at http://www.openssl.org/docs/crypto/md5.html or http://www.openssl.org/docs/crypto/sha.html for good, standard examples in C; I'm sure there are similar libraries for your platform.
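On .NET, the same init/update/final shape is exposed directly by the `IncrementalHash` class. It is available in .NET Core and newer .NET Framework versions, but not in the .NET 3.5 the question targets. A minimal sketch, where `HashChunks` is a hypothetical helper name:

```csharp
using System.Collections.Generic;
using System.Security.Cryptography;

static class IncrementalHashing
{
    // Hypothetical helper: hashes a sequence of chunks as if they were one contiguous input.
    public static byte[] HashChunks(IEnumerable<byte[]> chunks)
    {
        // "init": create the hash object.
        using (var hash = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            // "update": feed each chunk as it arrives.
            foreach (var chunk in chunks)
            {
                hash.AppendData(chunk, 0, chunk.Length);
            }
            // "final": finish the calculation (this also resets the object for reuse).
            return hash.GetHashAndReset();
        }
    }
}
```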

I've just had to do something similar, but wanted to read the file asynchronously. It uses TransformBlock and TransformFinalBlock and gives me answers consistent with Azure, so I think it is correct!

    // Requires: using System; using System.Buffers; using System.IO;
    //           using System.Security.Cryptography; using System.Threading.Tasks;
    private static async Task<string> CalculateMD5Async(string fullFileName)
    {
        var block = ArrayPool<byte>.Shared.Rent(8192);
        try
        {
            using (var md5 = MD5.Create())
            {
                using (var stream = new FileStream(fullFileName, FileMode.Open, FileAccess.Read, FileShare.Read, 8192, true))
                {
                    int length;
                    while ((length = await stream.ReadAsync(block, 0, block.Length).ConfigureAwait(false)) > 0)
                    {
                        md5.TransformBlock(block, 0, length, null, 0);
                    }
                    md5.TransformFinalBlock(block, 0, 0);
                }
                var hash = md5.Hash;
                return Convert.ToBase64String(hash);
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(block);
        }
    }
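The async method returns the hash as Base64, presumably because Azure Blob Storage reports a blob's Content-MD5 in that form, whereas the CryptoStream answer prints hex via BitConverter.ToString. If you need to compare the two formats, a small conversion like this sketch is enough (`Base64ToHex` is a hypothetical name):

```csharp
using System;

static class HashFormats
{
    // Hypothetical helper: turns a Base64-encoded hash into lowercase hex.
    public static string Base64ToHex(string base64)
    {
        byte[] hash = Convert.FromBase64String(base64);
        // BitConverter.ToString yields "AB-CD-..."; strip the dashes and lowercase it.
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }
}
```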

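Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.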

 
© 2020-2024 STACKOOM.COM