
Computing the SHA1 hash of large files (over 2 GB) in C#

I'm looking for a solution for hashing the content of large files (files may be over 2 GB on a 32-bit OS). Is there any easy solution for that? Or do I just have to read the file part by part and load each part into a buffer?

Driis's solution sounds more flexible, but HashAlgorithm.ComputeHash will also accept a Stream as a parameter.
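A minimal sketch of the Stream overload, assuming you just want a hex string for a file on disk. The class name StreamHashExample and the 1 MB FileStream buffer size are illustrative choices, not part of the original answer.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class StreamHashExample {
    // ComputeHash(Stream) reads the stream in small chunks until Read returns 0,
    // so the file is never loaded into memory as a whole.
    public static string Sha1OfFile(string path) {
        using (var sha1 = SHA1.Create())
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize: 1 << 20)) {
            byte[] hash = sha1.ComputeHash(stream);
            // BitConverter produces "A9-99-3E-..."; strip the dashes for a plain hex string.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

Usage would be something like `Console.WriteLine(StreamHashExample.Sha1OfFile(path));`. FileStream tracks its position as a long, so a file larger than 2 GB is not a problem even in a 32-bit process.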

Use TransformBlock and TransformFinalBlock to calculate the hash block by block, so you won't need to read the entire file into memory. (There is a nice example in the first link, and another one in this previous question.)
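A minimal sketch of the block-by-block approach, assuming the data comes from any readable Stream (for a large file, a FileStream). BlockHashExample, Sha1ByBlocks and the 1 MB default buffer are hypothetical names and values chosen for illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class BlockHashExample {
    // Feeds the stream to SHA1 one buffer at a time; only `buffer` is held in
    // memory, so the total size of the stream does not matter.
    public static byte[] Sha1ByBlocks(Stream stream, int bufferSize = 1 << 20) {
        if (bufferSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(bufferSize));

        using (var sha1 = SHA1.Create()) {
            var buffer = new byte[bufferSize];
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0) {
                // outputBuffer is null: we only want the hash, not a copy of the input.
                sha1.TransformBlock(buffer, 0, read, null, 0);
            }
            // TransformFinalBlock must be called once to finish the hash;
            // an empty final block is allowed.
            sha1.TransformFinalBlock(buffer, 0, 0);
            return sha1.Hash;
        }
    }
}
```

Compared with ComputeHash(Stream), the explicit loop gives you a place to report progress or check for cancellation between blocks.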

If you choose to use TransformBlock, you can safely set outputBuffer to null and ignore the last parameter (outputOffset). TransformBlock merely copies the input to the output array, and there is no good reason to copy those bytes if all you want is the hash.
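A small check of that claim, relying only on the documented TransformBlock behavior: it returns the number of input bytes processed and, when outputBuffer is non-null, copies the input there. OutputBufferDemo is a hypothetical name used for illustration.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

public static class OutputBufferDemo {
    // Returns true if the output array received an exact copy of the input
    // and the resulting hash is identical to hashing with outputBuffer = null.
    public static bool OutputIsJustACopy() {
        byte[] data = { 0x61, 0x62, 0x63 }; // "abc"
        byte[] copy = new byte[data.Length];
        using (var withCopy = SHA1.Create())
        using (var withoutCopy = SHA1.Create()) {
            int n1 = withCopy.TransformBlock(data, 0, data.Length, copy, 0);
            int n2 = withoutCopy.TransformBlock(data, 0, data.Length, null, 0);
            withCopy.TransformFinalBlock(data, 0, 0);
            withoutCopy.TransformFinalBlock(data, 0, 0);
            return n1 == data.Length && n2 == data.Length
                && copy.SequenceEqual(data)
                && withCopy.Hash.SequenceEqual(withoutCopy.Hash);
        }
    }
}
```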

Furthermore, all mscorlib HashAlgorithms work as you might expect, i.e. the block size doesn't seem to affect the hash output, and it doesn't matter whether you pass the data in one array and hash it in chunks by changing inputOffset, or hash it by passing smaller, separate arrays. I verified this using the following code:

(This is slightly long; it's just here so people can verify for themselves that the HashAlgorithm implementations are sane.)

using System;
using System.Linq;
using System.Reflection;
using System.Security.Cryptography;
using System.Text;

public static class Program {
    public static void Main() {
        RandomNumberGenerator rnd = RandomNumberGenerator.Create();
        byte[] input = new byte[20];
        rnd.GetBytes(input);
        Console.WriteLine("Input Data: " + BytesToStr(input));

        // All concrete HashAlgorithm types in the assembly that defines HashAlgorithm
        // (mscorlib on .NET Framework). Types without a public parameterless
        // constructor are skipped because Activator.CreateInstance can't build them.
        var hashAlgoTypes = Assembly.GetAssembly(typeof(HashAlgorithm)).GetTypes()
            .Where(t => typeof(HashAlgorithm).IsAssignableFrom(t) && !t.IsAbstract
                && t.GetConstructor(Type.EmptyTypes) != null);

        foreach (var hashType in hashAlgoTypes)
            new AlgoTester(hashType).AssertOkFor(input.ToArray());
    }

    public static string BytesToStr(byte[] bytes) {
        StringBuilder str = new StringBuilder();

        for (int i = 0; i < bytes.Length; i++)
            str.AppendFormat("{0:X2}", bytes[i]);

        return str.ToString();
    }
}

public class AlgoTester {
    readonly byte[] key;
    readonly Type type;

    public AlgoTester(Type type) {
        this.type = type;
        // Keyed algorithms (HMAC*, etc.) generate a random key per instance;
        // remember one key so every instance below hashes with the same key.
        if (typeof(KeyedHashAlgorithm).IsAssignableFrom(type))
            using (var algo = (KeyedHashAlgorithm)Activator.CreateInstance(type))
                key = algo.Key.ToArray();
    }

    public HashAlgorithm MakeAlgo() {
        HashAlgorithm algo = (HashAlgorithm)Activator.CreateInstance(type);
        if (key != null)
            ((KeyedHashAlgorithm)algo).Key = key;
        return algo;
    }

    // Reference result: hash the whole array in a single call.
    public byte[] GetHash(byte[] input) {
        using (HashAlgorithm sha = MakeAlgo())
            return sha.ComputeHash(input);
    }

    // The whole array passed to TransformFinalBlock at once.
    public byte[] GetHashOneBlock(byte[] input) {
        using (HashAlgorithm sha = MakeAlgo()) {
            sha.TransformFinalBlock(input, 0, input.Length);
            return sha.Hash;
        }
    }

    // Chunks selected by moving inputOffset through one array (as in the MSDN example).
    // input is also passed as outputBuffer at the same offset, so nothing is copied.
    public byte[] GetHashMultiBlock(byte[] input, int size) {
        using (HashAlgorithm sha = MakeAlgo()) {
            int offset = 0;
            while (input.Length - offset >= size)
                offset += sha.TransformBlock(input, offset, size, input, offset);
            sha.TransformFinalBlock(input, offset, input.Length - offset);
            return sha.Hash;
        }
    }

    // Each chunk copied into its own separate array. outputBuffer is null,
    // so the deliberately nonsensical outputOffset is ignored.
    public byte[] GetHashMultiBlockInChunks(byte[] input, int size) {
        using (HashAlgorithm sha = MakeAlgo()) {
            int offset = 0;
            while (input.Length - offset >= size)
                offset += sha.TransformBlock(input.Skip(offset).Take(size).ToArray()
                    , 0, size, null, -24124512);
            sha.TransformFinalBlock(input.Skip(offset).ToArray(), 0
                , input.Length - offset);
            return sha.Hash;
        }
    }

    public void AssertOkFor(byte[] data) {
        var direct = GetHash(data);
        var indirect = GetHashOneBlock(data);
        // Collect every variant whose hash differs from the single-call reference.
        var outcomes =
            new[] { 1, 2, 3, 5, 10, 11, 19, 20, 21 }.SelectMany(i =>
                new[]{
                    new{ Hash=GetHashMultiBlock(data,i), Name="ByMSDN"+i},
                    new{ Hash=GetHashMultiBlockInChunks(data,i), Name="InChunks"+i}
                }).Concat(new[] { new { Hash = indirect, Name = "OneBlock" } })
            .Where(result => !result.Hash.SequenceEqual(direct)).ToArray();
        Console.Write("Testing: " + type);

        if (outcomes.Any()) {
            Console.WriteLine(" not OK.");
            Console.WriteLine(type.Name + " direct was: " + Program.BytesToStr(direct));
        } else Console.WriteLine(" OK.");

        foreach (var outcome in outcomes)
            Console.WriteLine(type.Name + " differs with: " + outcome.Name + " "
                + Program.BytesToStr(outcome.Hash));
    }
}
