Zlib Compression and MD5 Hashing in C#
I have a large source file (say 10 GB) that I need to read in chunks, compress, and hash. In the end there are two outputs: the hash of the file as a string (md5HashHex) and the compressed file as a byte array (destData). Before compressing, I also need to add a header to the destination (destData) and hash it. After that, I open the source file, read it chunk by chunk, and compress and hash each chunk.

I found that the hash I get when processing the file chunk by chunk differs from the hash I get when hashing it in one go. Here is my code; I would appreciate any help. I would also like to know whether I am doing the compression correctly. Thank you.
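using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using Ionic.Zlib; // DotNetZip: ZlibStream, CompressionMode, CompressionLevel

public static class ZlibHashing
{
    public static void CompresingHashing(string inputFile)
    {
        const int length = 8192;
        var chunk = new byte[length];
        int byteCount;

        using (MD5 md5 = MD5.Create())
        {
            // The destination starts with the header, and the header is hashed first.
            byte[] header = Encoding.ASCII.GetBytes("HEADER");
            md5.TransformBlock(header, 0, header.Length, null, 0);
            byte[] destData = header; //destination

            using (FileStream sourceFile = File.OpenRead(inputFile))
            {
                while ((byteCount = sourceFile.Read(chunk, 0, length)) > 0)
                {
                    byte[] compressedData;
                    using (var ms = new MemoryStream())
                    {
                        // Note: each chunk becomes its own complete zlib stream (header + data + checksum).
                        using (var result = new ZlibStream(ms, CompressionMode.Compress, CompressionLevel.Default))
                        {
                            // Write only the bytes actually read; the last chunk is usually shorter.
                            result.Write(chunk, 0, byteCount);
                        }
                        // Read ms after the ZlibStream is disposed so the zlib trailer is included;
                        // MemoryStream.ToArray works even after the stream has been closed.
                        compressedData = ms.ToArray();
                    }
                    md5.TransformBlock(compressedData, 0, compressedData.Length, null, 0);
                    destData = AppendingArrays(destData, compressedData);
                }
            }

            // Finish the hash with an empty block so no data is hashed twice.
            md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
            string md5HashHex = string.Join(string.Empty, md5.Hash.Select(b => b.ToString("x2")));
            Console.WriteLine("Hash : " + md5HashHex);
        }
    }

    // Copies both arrays into a new one on every call: fine for small inputs, but quadratic overall,
    // and a single byte[] cannot exceed about 2 GB, so a 10 GB input should go to a stream instead.
    public static byte[] AppendingArrays(byte[] existingArray, byte[] ArrayToAdd)
    {
        byte[] newArray = new byte[existingArray.Length + ArrayToAdd.Length];
        existingArray.CopyTo(newArray, 0);
        ArrayToAdd.CopyTo(newArray, existingArray.Length);
        return newArray;
    }
}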
But if I hash destData (the header followed by the compressed data) in one go, I get a different result (to save space I haven't repeated the code):
...
destData = AppendingArrays(destData, compressedData);
byte[] md5Hash = md5.ComputeHash(destData);
...
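A quick way to check the two approaches against each other, sketched under the assumption of a recent .NET (C# 8 or later); the HashCheck class and its Incremental and OneShot methods are hypothetical names. Incremental TransformBlock calls and a single ComputeHash over the concatenated bytes should produce the same MD5 as long as both see exactly the same bytes in the same order and the one-shot hash uses a fresh MD5 instance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class HashCheck
{
    // Feeds each piece to one MD5 instance, then finalizes with an empty block
    // so that no piece is hashed twice.
    public static byte[] Incremental(IEnumerable<byte[]> pieces)
    {
        using var md5 = MD5.Create();
        foreach (byte[] piece in pieces)
            md5.TransformBlock(piece, 0, piece.Length, null, 0);
        md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
        return md5.Hash!;
    }

    // Hashes the concatenation in one call, using a fresh instance so that
    // no earlier TransformBlock state is mixed in.
    public static byte[] OneShot(IEnumerable<byte[]> pieces)
    {
        using var md5 = MD5.Create();
        return md5.ComputeHash(pieces.SelectMany(p => p).ToArray());
    }
}
```

If the two results still differ in real code, the byte sequences differ. Two typical causes are an extra or missing piece (such as the header) on one side, or calling ComputeHash on an MD5 instance that still holds unfinished TransformBlock state, since ComputeHash does not reset that state before hashing.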
Looks like you are processing the last chunk twice on the MD5. Simply call TransformFinalBlock with a byte[0] and an offset and length of 0, i.e. md5.TransformFinalBlock(new byte[0], 0, 0).
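On the compression question: the code above turns each 8 KB chunk into a separate zlib stream, so destData is a sequence of independent zlib streams rather than one zlib stream covering the whole file, and all of it is held in memory. A minimal streaming sketch, assuming .NET 6 or later and swapping in the built-in System.IO.Compression.ZLibStream in place of DotNetZip's ZlibStream; the StreamingZlibMd5 class and CompressAndHash method are hypothetical names. The header and a single zlib stream are written to a destination stream, and a CryptoStream hashes every byte on its way out.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

public static class StreamingZlibMd5
{
    // Writes "HEADER" followed by ONE zlib stream of the whole input to 'destination'
    // and returns the lowercase hex MD5 of every byte written to 'destination'.
    public static string CompressAndHash(Stream source, Stream destination)
    {
        using var md5 = MD5.Create();
        string hex;
        // In Write mode, CryptoStream passes each byte through md5 and on to 'destination'.
        using (var hashing = new CryptoStream(destination, md5, CryptoStreamMode.Write, leaveOpen: true))
        {
            byte[] header = Encoding.ASCII.GetBytes("HEADER");
            hashing.Write(header, 0, header.Length);

            // leaveOpen: true so disposing the zlib stream does not close 'hashing' early.
            using (var zlib = new ZLibStream(hashing, CompressionLevel.Optimal, leaveOpen: true))
            {
                source.CopyTo(zlib, 81920); // reads and compresses in chunks
            }

            hashing.FlushFinalBlock(); // finalizes md5 after the zlib trailer is written
            hex = Convert.ToHexString(md5.Hash!).ToLowerInvariant();
        }
        return hex;
    }
}
```

Called with File.OpenRead(inputFile) and File.Create(outputFile), memory use stays roughly constant regardless of file size, and the returned hash matches an MD5 of the finished output file.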
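Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.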