Zlib Compressing and MD5 Hashing in C#

I have a large source file (assume 10 GB) that I need to read in chunks, compress, and hash. In the end there are two outputs: the hash of the file as a hex string (md5HashHex) and the compressed file as a byte array (destData). Before compressing, I also need to add a header to the destination (destData) and hash it. After that, I open the source file and read it chunk by chunk, compressing and hashing each chunk. I found that the hash I get when processing the file chunk by chunk differs from the hash I get when hashing everything in one go. Here is my code; I would appreciate any help with it. I would also like to know whether I am doing the compression correctly. Thank you.

    // Needs: System, System.IO, System.Linq, System.Security.Cryptography, System.Text.
    // ZlibStream is assumed to come from Ionic.Zlib (DotNetZip), judging by the constructor used.
    public static void CompresingHashing(string inputFile)
    {
        MD5 md5 = MD5.Create();
        int byteCount = 0;
        var length = 8192;
        var chunk = new byte[length];
        byte[] destData = new byte[0]; // destination: header followed by the compressed chunks
        byte[] compressedData;
        byte[] header = Encoding.ASCII.GetBytes("HEADER");
        string md5HashHex;

        // Hash the header first, then append it to the destination.
        md5.TransformBlock(header, 0, header.Length, null, 0);
        destData = AppendingArrays(destData, header);

        using (FileStream sourceFile = File.OpenRead(inputFile))
        {
            while ((byteCount = sourceFile.Read(chunk, 0, length)) > 0)
            {
                using (var ms = new MemoryStream())
                {
                    // Each chunk is compressed as its own zlib stream.
                    using (ZlibStream result = new ZlibStream(ms, CompressionMode.Compress, CompressionLevel.Default))
                    {
                        // Write only the bytes actually read, not the whole buffer.
                        result.Write(chunk, 0, byteCount);
                    }
                    // Read the compressed bytes while ms is still in scope.
                    compressedData = ms.ToArray();
                }
                md5.TransformBlock(compressedData, 0, compressedData.Length, null, 0);
                destData = AppendingArrays(destData, compressedData);
            }
            md5.TransformFinalBlock(chunk, 0, 0);
            byte[] md5Hash = md5.Hash;
            md5HashHex = string.Join(string.Empty, md5Hash.Select(b => b.ToString("x2")));
        }
        Console.WriteLine("Hash : " + md5HashHex);
    }

    public static byte[] AppendingArrays(byte[] existingArray, byte[] ArrayToAdd)
    {
        byte[] newArray = new byte[existingArray.Length + ArrayToAdd.Length];
        existingArray.CopyTo(newArray, 0);
        ArrayToAdd.CopyTo(newArray, existingArray.Length);
        return newArray;
    }

But if I hash destData (the header plus the data from the source file) in one go, I get a different result (to save space, I have not repeated the rest of the code):

    .
    .
    .
    destData = AppendingArrays(destData, compressedData);
    byte[] md5Hash = md5.ComputeHash(destData);
    .
    .
    .

It looks like you are processing the last chunk twice on the md5. Simply call TransformFinalBlock with a byte[0] and a length and offset of 0.
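
To make the suggestion concrete, here is a minimal sketch of hashing a header and a few chunks incrementally, finishing with an empty final block, and comparing the result against hashing the concatenated bytes in one call. The class and method names (IncrementalMd5Demo, HashIncrementally, HashInOneGo) and the sample bytes are made up for illustration and are not from the question. The one-shot hash deliberately uses its own MD5 instance, so that no data fed earlier through TransformBlock can end up in it.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical demo class; names are illustrative, not from the question.
public static class IncrementalMd5Demo
{
    // Feeds each part to MD5 with TransformBlock, then finalizes with an empty block.
    public static string HashIncrementally(params byte[][] parts)
    {
        using (MD5 md5 = MD5.Create())
        {
            foreach (byte[] part in parts)
            {
                md5.TransformBlock(part, 0, part.Length, null, 0);
            }
            // The empty final block adds no data; it only completes the hash.
            md5.TransformFinalBlock(new byte[0], 0, 0);
            return ToHex(md5.Hash);
        }
    }

    // Hashes the concatenation of all parts in one call on a fresh MD5 instance.
    public static string HashInOneGo(params byte[][] parts)
    {
        byte[] all = parts.SelectMany(p => p).ToArray();
        using (MD5 md5 = MD5.Create())
        {
            return ToHex(md5.ComputeHash(all));
        }
    }

    private static string ToHex(byte[] hash)
    {
        return string.Concat(hash.Select(b => b.ToString("x2")));
    }

    public static void Main()
    {
        byte[] header = Encoding.ASCII.GetBytes("HEADER");
        byte[] chunk1 = { 1, 2, 3 };
        byte[] chunk2 = { 4, 5 };

        string incremental = HashIncrementally(header, chunk1, chunk2);
        string oneGo = HashInOneGo(header, chunk1, chunk2);

        // Both approaches see the same byte sequence, so this prints True.
        Console.WriteLine(incremental == oneGo);
    }
}
```

If the two values differ in real code, a likely cause is that the two paths did not see exactly the same bytes, for example because the one-shot hash was computed on an instance that had already received blocks, or before destData was complete.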
