
Why do "int" and "sbyte" GetHashCode functions generate different values?

We have the following code:

int i = 1;
Console.WriteLine(i.GetHashCode());  // outputs => 1

This makes sense, and the same happens with all integral types in C# except sbyte and short. That is:

sbyte i = 1;
Console.WriteLine(i.GetHashCode());   //  outputs => 257

Why is this?

Because the source of that method (SByte.GetHashCode) is:

public override int GetHashCode()
{
    return (int)this ^ ((int)this << 8);
}
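To see where 257 comes from, the formula can be worked through by hand. The sketch below copies the quoted expression into a hypothetical helper named Hash (an assumption for illustration, not the runtime's own method), so the result does not depend on which runtime version you run it on.

```csharp
using System;

public static class SByteHashSketch
{
    // Hypothetical helper: mirrors the quoted formula, it is not the runtime's method.
    public static int Hash(sbyte value)
    {
        return (int)value ^ ((int)value << 8);
    }

    public static void Main()
    {
        // 1 is 0x001; shifted left by 8 bits it becomes 0x100 (256).
        // 0x001 ^ 0x100 = 0x101 = 257.
        Console.WriteLine(Hash(1));   // 257
        Console.WriteLine(Hash(2));   // 2 ^ 512 = 514
        // -1 sign-extends to 0xFFFFFFFF; XOR with 0xFFFFFF00 leaves 0xFF.
        Console.WriteLine(Hash(-1));  // 255
    }
}
```

For non-negative inputs the two XOR operands never share a set bit, so the XOR behaves like an addition: value + value * 256.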

As for why, well, someone at Microsoft knows that.

Yes, it's all about value distribution. Because GetHashCode returns an int, the hash codes for non-negative sbyte values are spaced in intervals of 257. Also because the return type is int, the long type will have collisions, since a 64-bit long has more possible values than a 32-bit hash code can represent.
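A rough sketch of both points follows. SByteHash repeats the sbyte formula quoted earlier, and FoldLong is an assumed folding scheme (XOR of the low and high 32-bit halves) used here only to illustrate a collision. Both helper names are hypothetical.

```csharp
using System;

public static class HashSpacingSketch
{
    // Hypothetical helper: the sbyte formula quoted earlier in this thread.
    public static int SByteHash(sbyte value) => (int)value ^ ((int)value << 8);

    // Assumed folding scheme for long: XOR the low and high 32-bit halves.
    public static int FoldLong(long value) => unchecked((int)value ^ (int)(value >> 32));

    public static void Main()
    {
        // Non-negative sbyte values land on multiples of 257.
        for (sbyte v = 0; v < 4; v++)
            Console.WriteLine($"{v} -> {SByteHash(v)}");  // 0, 257, 514, 771

        // Two different long values fold to the same 32-bit result.
        Console.WriteLine(FoldLong(0L));   // 0
        Console.WriteLine(FoldLong(-1L));  // 0
    }
}
```

Whatever folding scheme is used, mapping 2^64 inputs onto 2^32 outputs cannot avoid collisions; the helper only makes one such pair visible.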

The reason is probably to avoid clustering of hash values.

As the GetHashCode documentation says:

For the best performance, a hash function must generate a random distribution for all input. Providing a good hash function on a class can significantly affect the performance of adding those objects to a hash table. In a hash table with a good implementation of a hash function, searching for an element takes constant time (for example, an O(1) operation).

Also, as this excellent article explains:

Guideline: the distribution of hash codes must be "random"

By a "random distribution" I mean that if there are commonalities in the objects being hashed, there should not be similar commonalities in the hash codes produced. Suppose for example you are hashing an object that represents the latitude and longitude of a point. A set of such locations is highly likely to be "clustered"; odds are good that your set of locations is, say, mostly houses in the same city, or mostly valves in the same oil field, or whatever. If clustered data produces clustered hash values then that might decrease the number of buckets used and cause a performance problem when the bucket gets really big.
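As a toy illustration of how a shared pattern in hash codes can shrink the number of buckets used, the sketch below counts occupied buckets under a simple "hash modulo bucket count" scheme. That scheme and the BucketsUsed helper are assumptions for illustration only and are not meant to describe how any particular .NET collection is implemented.

```csharp
using System;
using System.Collections.Generic;

public static class BucketSketch
{
    // Hypothetical bucket scheme: index = non-negative hash modulo bucket count.
    public static int BucketsUsed(IEnumerable<int> hashCodes, int bucketCount)
    {
        var used = new HashSet<int>();
        foreach (int h in hashCodes)
            used.Add((h & int.MaxValue) % bucketCount);
        return used.Count;
    }

    public static void Main()
    {
        var patterned = new List<int>();
        var varied = new List<int>();
        for (int i = 0; i < 16; i++)
        {
            patterned.Add(i * 16);  // every code is a multiple of 16
            varied.Add(i * 17);     // codes cover every remainder modulo 16
        }

        Console.WriteLine(BucketsUsed(patterned, 16));  // 1
        Console.WriteLine(BucketsUsed(varied, 16));     // 16
    }
}
```

With 16 buckets, the patterned codes all fall into bucket 0, so every lookup has to scan one long bucket, while the varied codes occupy all 16.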
