How to compare two huge byte[] arrays using Object.GetHashCode() in C#?

I don't quite understand why Object.GetHashCode() returns different values for two identical byte arrays, but equal values for value-type objects that are not IEnumerable. For example:

byte e = 123;
Console.WriteLine(e.GetHashCode());

byte f = 123;
Console.WriteLine(f.GetHashCode());

The output is:

123
123

But for:

byte[] a = new byte[3] { 1, 2, 3 };
Console.WriteLine(a.GetHashCode());

byte[] b = new byte[3] { 1, 2, 3 };
Console.WriteLine(b.GetHashCode());

the output is:

46104728
12289376

Why is that, and how can I quickly compare two huge arrays without comparing every element?

Array types do not override GetHashCode, so you have to implement your own hash algorithm.

The value you see is based on the underlying reference (the object's identity), not on the contents, so two arrays with identical contents will in general have different hash codes unless they are the same reference.
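As a rough illustration of "your own hash algorithm", here is a minimal sketch of a content-based hash. It uses 32-bit FNV-1a, which is my choice for the example and not something the framework provides; the class and method names (ByteArrayHash, Fnv1a) are made up. Note that it still has to read every byte once.

```csharp
using System;

public static class ByteArrayHash
{
    // 32-bit FNV-1a computed over the array contents, not the reference.
    // Hypothetical helper for illustration; not part of the .NET API.
    public static int Fnv1a(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        unchecked
        {
            uint hash = 2166136261u;      // FNV-1a 32-bit offset basis
            foreach (byte b in data)
            {
                hash ^= b;                // mix in the next byte
                hash *= 16777619u;        // multiply by the FNV-1a 32-bit prime
            }
            return (int)hash;
        }
    }
}
```

With this, ByteArrayHash.Fnv1a(a) and ByteArrayHash.Fnv1a(b) from the question return the same value, because both arrays contain 1, 2, 3. Different contents can still collide, so equal hashes are only a hint.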

For integral types of 32 bits or fewer, the hash code is derived directly from the value (for byte and Int32 it is simply the value as a 32-bit integer). For the 64-bit integral type, Int64, the upper 32 bits are shifted down and XORed with the lower 32 bits to produce the hash code.
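The Int64 folding can be sketched like this. Treat it as a description of what current runtimes appear to do, not as a documented contract; IntegralHashDemo and its methods are hypothetical names for this example only.

```csharp
using System;

public static class IntegralHashDemo
{
    // XOR of the low 32 bits with the high 32 bits (shifted down).
    public static int FoldInt64(long value)
    {
        return unchecked((int)value ^ (int)(value >> 32));
    }

    // Compares the sketch above with what the runtime actually returns.
    // The result depends on the runtime's implementation of Int64.GetHashCode.
    public static bool MatchesRuntime(long value)
    {
        return value.GetHashCode() == FoldInt64(value);
    }
}
```

For example, FoldInt64(0x0000000100000002L) XORs the low half (2) with the high half (1).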

So when it comes to comparing two arrays 'quickly', you have to do it yourself.

You can start with cheap logic checks: the lengths are equal, the arrays start and end with the same byte value, and so on. Then you have a choice: either compare byte by byte (or read 4 or 8 bytes at a time and use BitConverter to convert blocks of bytes to Int32 or Int64, giving a single pair of values that might be quicker to check for equality), or use a general-purpose hash function to get a good guess at equality.
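A minimal sketch of the first option: cheap checks first, then 8 bytes at a time via BitConverter.ToInt64, then whatever bytes are left over. The names ByteArrayComparer and ContentEquals are made up for the example, and whether the block-wise loop is actually faster than a plain byte loop is something you would have to measure.

```csharp
using System;

public static class ByteArrayComparer
{
    // Hypothetical helper: true when both arrays hold the same bytes.
    public static bool ContentEquals(byte[] a, byte[] b)
    {
        // Cheap checks before touching the contents.
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        if (a.Length != b.Length) return false;

        // Compare 8 bytes per step while a full block remains.
        int i = 0;
        int lastBlock = a.Length - sizeof(long);
        for (; i <= lastBlock; i += sizeof(long))
        {
            if (BitConverter.ToInt64(a, i) != BitConverter.ToInt64(b, i)) return false;
        }

        // Compare the remaining tail (fewer than 8 bytes) one by one.
        for (; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

Unlike a hash comparison, this gives an exact answer, and it stops at the first difference.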

For this purpose you can use an MD5 hash, which is very quick to compute. See: How do I generate a hashcode from a byte array in C#?

Getting two identical hash values from such a hash function does not guarantee equality, but in general, if you are comparing arrays of bytes within the same data 'space', you shouldn't get a collision. By that I mean that different data of the same type should nearly always produce different hashes. There is a lot more on this around the net than I am qualified to explain.
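A minimal sketch of the hash approach with MD5, assuming both arrays are non-null. DigestComparer, Md5Of and ProbablyEqual are hypothetical names; MD5.Create and ComputeHash are the standard System.Security.Cryptography calls.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

public static class DigestComparer
{
    // Returns the 16-byte MD5 digest of the array contents.
    public static byte[] Md5Of(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            return md5.ComputeHash(data);
        }
    }

    // Equal digests mean "almost certainly equal", not "guaranteed equal".
    public static bool ProbablyEqual(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        return Md5Of(a).SequenceEqual(Md5Of(b));
    }
}
```

Keep in mind that computing the digest reads every byte of both arrays, so for a one-off comparison this does no less work than comparing the bytes directly. It pays off mainly when you can keep the digest of an array around and reuse it for many comparisons.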

Try the SHA1CryptoServiceProvider.ComputeHash method. It takes a byte array and returns a SHA-1 hash, which is identical for byte arrays with identical contents. Performance is good.

// Requires: using System; using System.Security.Cryptography;
string byte1hash;
string byte2hash;
using (SHA1CryptoServiceProvider sha1 = new SHA1CryptoServiceProvider())
{
    byte1hash = Convert.ToBase64String(sha1.ComputeHash(byteArray1));
    byte2hash = Convert.ToBase64String(sha1.ComputeHash(byteArray2));
}
if (string.Equals(byte1hash, byte2hash))
{
    // Equal hashes: treat the byte arrays as the same.
}

If you are not worried about security, you can use MD5 instead.

For reference types, GetHashCode by default computes the hash code from the reference, not from the contents of the object.

I think you are out of luck: to calculate the hash code of an array, you need to go over its contents at least once.
