Zlib compression and MD5 hashing in C#

Question

I have a large source file (assume 10 GB) that I need to read in chunks, compress, and hash. In the end there are two outputs: the hash of the file as a hex string (md5HashHex) and the compressed file as bytes (destData). Before compressing, I also need to add a header to the destination (destData) and hash it. After that, I open the source file and read it chunk by chunk, compressing and hashing each chunk. I found that the hash is different when I read the file chunk by chunk compared with hashing everything in one go. Here is my code; I would appreciate any help. I would also like to know whether I am doing the compression correctly. Thank you.

    // Assumes: using System; System.IO; System.Linq; System.Security.Cryptography; System.Text; Ionic.Zlib
    public static void CompresingHashing(string inputFile)
    {
        MD5 md5 = MD5.Create();
        int byteCount = 0;
        var length = 8192;
        var chunk = new byte[length];
        byte[] destData = new byte[0]; // must be initialized before it can be appended to
        byte[] compressedData;
        byte[] header;
        string md5HashHex;
        header = Encoding.ASCII.GetBytes("HEADER");
        md5.TransformBlock(header, 0, header.Length, null, 0);
        destData = AppendingArrays(destData, header); // destination

        using (FileStream sourceFile = File.OpenRead(inputFile))
        {
            while ((byteCount = sourceFile.Read(chunk, 0, length)) > 0)
            {
                using (var ms = new MemoryStream())
                {
                    using (ZlibStream result = new ZlibStream(ms, CompressionMode.Compress, CompressionLevel.Default))
                    {
                        // Write only the bytes actually read, not the whole buffer.
                        result.Write(chunk, 0, byteCount);
                    }
                    // Read the compressed bytes after the ZlibStream has flushed on dispose.
                    compressedData = ms.ToArray();
                }
                md5.TransformBlock(compressedData, 0, compressedData.Length, null, 0);
                destData = AppendingArrays(destData, compressedData);
            }
            md5.TransformFinalBlock(chunk, 0, 0);
            byte[] md5Hash = md5.Hash;
            md5HashHex = string.Join(string.Empty, md5Hash.Select(b => b.ToString("x2")));
        }
        Console.WriteLine("Hash : " + md5HashHex);
    }

    public static byte[] AppendingArrays(byte[] existingArray, byte[] ArrayToAdd)
    {
        byte[] newArray = new byte[existingArray.Length + ArrayToAdd.Length];
        existingArray.CopyTo(newArray, 0);
        ArrayToAdd.CopyTo(newArray, existingArray.Length);
        return newArray;
    }

But if I hash destData (which is the source file + the header) in one go, I get a different result (to save space, I have not repeated the code):

    ...
    destData = AppendingArrays(destData, compressedData);
    byte[] md5Hash = md5.ComputeHash(destData);
    ...

Answer 1

It looks like you are processing the last chunk twice on the md5. Simply call TransformFinalBlock with a byte[0] and a length and offset of 0.
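
To illustrate the suggestion, here is a minimal sketch of incremental hashing that finalizes with an empty buffer and can be compared against a one-shot hash. The class name ChunkedMd5 and the methods HashInChunks, HashAtOnce and ToHex are hypothetical names chosen for this example, not part of the original code. It uses only System.Security.Cryptography and leaves compression out so that the hashing step can be checked on its own.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper class for illustration only.
public static class ChunkedMd5
{
    // Hashes header + payload incrementally, feeding the payload in fixed-size chunks.
    public static string HashInChunks(byte[] header, Stream payload, int chunkSize = 8192)
    {
        using (MD5 md5 = MD5.Create())
        {
            md5.TransformBlock(header, 0, header.Length, null, 0);

            var chunk = new byte[chunkSize];
            int byteCount;
            while ((byteCount = payload.Read(chunk, 0, chunk.Length)) > 0)
            {
                // Feed only the bytes actually read; the last chunk is usually short.
                md5.TransformBlock(chunk, 0, byteCount, null, 0);
            }

            // Finalize with an empty buffer so that no data is hashed a second time.
            md5.TransformFinalBlock(new byte[0], 0, 0);
            return ToHex(md5.Hash);
        }
    }

    // Hashes header + payload in a single call, for comparison.
    public static string HashAtOnce(byte[] header, byte[] payload)
    {
        using (MD5 md5 = MD5.Create())
        {
            return ToHex(md5.ComputeHash(header.Concat(payload).ToArray()));
        }
    }

    private static string ToHex(byte[] hash)
    {
        return string.Concat(hash.Select(b => b.ToString("x2")));
    }
}
```

Calling HashInChunks(header, new MemoryStream(data)) and HashAtOnce(header, data) on the same bytes should return the same hex string. If the two values differ in your own code, the bytes being fed to the two hashes are probably not identical, for example because the whole buffer is written instead of only the bytes read.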

Solution 1
0 2020-03-30 01:32:34