
Can I be sure the built-in hash for a given string is always the same?

I am getting a string hash like this:

string content = "a very long string";
int contentHash = content.GetHashCode();

I then store the hash in a dictionary as the key, mapping it to another ID. This is useful because I don't have to compare big strings during the default dictionary hash computation; I can just fish the ID out of the dictionary by key.

Can I be sure that the hash for a given string ("a very long string") will always be the same?

Can I be sure that two different strings won't have the same hash?

Also, if possible, how likely is it to get the same hash for different strings?

Yes, it will be consistent since strings are immutable. However, I think you're misusing the dictionary. You should let the dictionary take the hash of the string for you by using the string as the key. Hashes are not guaranteed to be unique, so you may overwrite one key with another.
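To make the overwrite risk concrete, here is a minimal sketch. `WeakHash` is a deliberately poor stand-in for GetHashCode (an assumption for illustration only, so that a collision is easy to trigger), and the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class KeyChoiceSketch
{
    // Deliberately weak stand-in for GetHashCode so a collision is easy to see.
    // (Assumption for illustration; string.GetHashCode collides far less often.)
    public static int WeakHash(string s)
    {
        return s.Length;
    }

    // Risky pattern: the hash is the key, so colliding strings share one slot.
    public static int CountEntriesKeyedByHash(IEnumerable<string> contents)
    {
        var idsByHash = new Dictionary<int, int>();
        int nextId = 0;
        foreach (string content in contents)
        {
            idsByHash[WeakHash(content)] = nextId++; // silently overwrites on collision
        }
        return idsByHash.Count;
    }

    // Safer pattern: the string is the key. The dictionary hashes it internally
    // and falls back to an equality check when two hashes collide.
    public static int CountEntriesKeyedByString(IEnumerable<string> contents)
    {
        var idsByContent = new Dictionary<string, int>();
        int nextId = 0;
        foreach (string content in contents)
        {
            idsByContent[content] = nextId++;
        }
        return idsByContent.Count;
    }
}
```

With the two inputs "abc" and "xyz", the hash-keyed dictionary ends up with one entry because both strings have length 3, while the string-keyed dictionary keeps both.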

Just to add some detail as to where the idea of a changing hash code may have come from.

As the other answers have rightly said, the hash code for a specific string will always be the same for a specific runtime version. There is no guarantee that a newer runtime will use the same algorithm; it might switch to a different one, perhaps for performance reasons. (On .NET Core and later, string hash codes are also randomized per process, so they can differ between two runs of the same program.)

The String class overrides the default GetHashCode implementation in object.

The default implementation for a reference type in .NET is to allocate a sequential ID (held internally by .NET) and assign it to the object. The object's heap storage has a slot for storing this hash code, and it is only assigned on the first call to GetHashCode for that object.

Hence creating an instance of a class, assigning it some values and then retrieving the hash code, followed by doing the exact same sequence with the same set of values, will yield different hash codes. This may be the reason why some have been led to believe that hash codes can change. In fact, it is the instance of a class that is allocated a hash code, and once allocated, that hash code does not change for that instance.
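The difference between the default per-instance hash and the content-based string hash can be sketched as follows. `Point` and the helper names are hypothetical, and the comparison between string hashes only holds within a single process:

```csharp
using System;

public sealed class Point
{
    public int X;
    public int Y;
    // No Equals/GetHashCode override: Object's default implementation is used.
}

public static class DefaultHashSketch
{
    // The default hash belongs to the instance, not to its field values.
    public static bool HashIsStablePerInstance()
    {
        var p = new Point { X = 1, Y = 2 };
        int first = p.GetHashCode();
        p.X = 99; // mutating a field does not affect the default hash code
        return first == p.GetHashCode();
    }

    // String overrides GetHashCode, so two distinct instances with the same
    // content produce the same hash code (within one process).
    public static bool StringHashDependsOnContent()
    {
        string a = new string('x', 3);
        string b = new string('x', 3);
        return !ReferenceEquals(a, b) && a.GetHashCode() == b.GetHashCode();
    }
}
```

Two separate `Point` instances with identical field values will usually report different hash codes, but that is not guaranteed, so the sketch does not assert it.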

Edit: I've just noticed that none of the answers directly addresses each of your questions (although I think the answers to them are clear), so just to tidy up:

Can I be sure that the hash for a given string ("a very long string") will always be the same?

In your usage, yes.

Can I be sure that two different strings won't have the same hash?

No. Two different strings may have the same hash.

Also, if possible, how likely is it to get the same hash for different strings?

The probability is quite low for any given pair of strings: the resulting hash is fairly random over a domain of 2^32 (about 4 billion) values.
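As a rough estimate, and assuming the hash behaves like a uniformly random function over 2^32 values (an idealization; string.GetHashCode makes no such promise), the chance of at least one collision among n distinct strings follows the birthday bound:

```latex
p(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2 \cdot 2^{32}}\right)
% n = 10{,}000: \quad p \approx 1 - e^{-0.0116} \approx 1.2\%
% n = 77{,}163: \quad p \approx 1 - e^{-0.693} \approx 50\%
```

So a single pair of strings is very unlikely to collide, but a dictionary holding tens of thousands of hash-keyed entries has a realistic chance of containing at least one collision.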

Yes it will, that's the purpose of a hash code! It's not guaranteed to be the same between different versions of the runtime, though. More info on MSDN.

As others pointed out, the hash will remain constant over time. But why are you hashing a string and then putting it as the key in a Dictionary? Hashes are not guaranteed to be unique, so your comparisons might be incorrect. Let the Dictionary do its job. I think the most appropriate collection for this case is a HashSet.

As many others have said, the implementation depends on the version of the framework, but it also depends on the architecture. The implementation of string.GetHashCode() is different in the x86 and x64 versions of the framework, even if they have the same version number.

For example, if you are writing a client/server or .NET remoting type of architecture and want to use a string hash code to avoid downloading a large resource, you can only do this if both sides have the same version and bitness. Otherwise you should use a different hash: MD5, SHA, etc. will work correctly.
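A minimal sketch of that approach, using SHA-256 over the UTF-8 bytes of the string. The helper name `Sha256Hex` and the choice of UTF-8 and hex encoding are assumptions; both sides only need to agree on them:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableDigestSketch
{
    // Hypothetical helper: lowercase hex SHA-256 of the string's UTF-8 bytes.
    // The result depends only on the content, not on runtime version or bitness.
    public static string Sha256Hex(string content)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(content));
            var sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
            {
                sb.Append(b.ToString("x2"));
            }
            return sb.ToString();
        }
    }
}
```

The client can send the digest of its cached copy, and the server skips the download when the digest matches that of the current resource.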

The documentation for Object.GetHashCode states:

If two objects compare as equal, the GetHashCode method for each object must return the same value.

So you are guaranteed that the hash code will be the same for a given string. However, you aren't guaranteed that it will be unique (there may be other strings that have the same hash code).

You don't have to guess about runtimes or versions; just use this CaseInsensitiveStringComparer class that I made in my spare time (you can pass it to the constructor of the dictionary or, if you are using .NET 3.5, a HashSet):

/// <summary>
/// StringComparer that is basically the same as StringComparer.OrdinalIgnoreCase, except that the hash code function is improved and guaranteed not to change.
/// </summary>
public class CaseInsensitiveStringComparer : StringComparer
{
    /// <summary>
    /// Compares two strings, ignoring case
    /// </summary>
    /// <param name="x">First string</param>
    /// <param name="y">Second string</param>
    /// <returns>Compare result</returns>
    public override int Compare(string x, string y)
    {
        return StringComparer.OrdinalIgnoreCase.Compare(x, y);
    }

    /// <summary>
    /// Checks if two strings are equal, ignoring case
    /// </summary>
    /// <param name="x">First string</param>
    /// <param name="y">Second string</param>
    /// <returns>True if strings are equal, false if not</returns>
    public override bool Equals(string x, string y)
    {
        return Compare(x, y) == 0;
    }

    /// <summary>
    /// Gets a hash code for a string, ignoring case
    /// </summary>
    /// <param name="obj">String to get hash code for</param>
    /// <returns>Hash code</returns>
    public override int GetHashCode(string obj)
    {
        if (obj == null)
        {
            return 0;
        }
        int hashCode = 5381;
        char c;
        for (int i = 0; i < obj.Length; i++)
        {
            c = obj[i];
            if (char.IsLower(c))
            {
                c = char.ToUpperInvariant(c);
            }
            // hashCode * 33 + c; unchecked so overflow wraps even in a checked build
            hashCode = unchecked(((hashCode << 5) + hashCode) + c);
        }
        return hashCode;
    }
}

Strings are hashed based on their content, so if you use the default GetHashCode the hash should stay the same.

As has already been mentioned, you can be sure that the hash for a particular string will be the same, because strings are hashed based on their content. However, you cannot be sure that a particular string will hash to the same value in later versions of the .NET Framework, as is mentioned here.

So I would say that this method is fine if it is being used internally to an application. If you are persisting the value to a data store, then it is probably best to roll your own function to ensure that it remains consistent across versions.
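One way to roll your own is a small, fully specified algorithm such as 32-bit FNV-1a. This is a sketch of that swapped-in technique, not something the answer prescribes; the helper name `Fnv1a32` and the choice to hash UTF-8 bytes are assumptions:

```csharp
using System;
using System.Text;

public static class StableHashSketch
{
    // 32-bit FNV-1a over the string's UTF-8 bytes. The constants are the
    // published FNV offset basis and prime, so the result does not depend
    // on the runtime version or architecture.
    public static uint Fnv1a32(string content)
    {
        const uint offsetBasis = 2166136261u;
        const uint prime = 16777619u;

        uint hash = offsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(content))
        {
            hash ^= b;
            hash = unchecked(hash * prime); // wrap on overflow by design
        }
        return hash;
    }
}
```

Like any 32-bit hash, this can still collide, so a persisted value identifies a string only probabilistically.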

Can I be sure that the hash for a given string ("a very long string") will always be the same?

Yes

Can I be sure that two different strings won't have the same hash?

No

Given that there are an infinite number of different strings, it's just not possible to allocate a different int (32 bits, which can represent about 4 billion values) to each one.

With just 8 characters of 16 bits each, there are 2^128 different strings. This is vastly larger than 2^32. Naturally, the hash codes of some of these strings must clash.

Two objects with the same hash code do not have to be equal. To know for sure, use the equals method. This is basically the strategy used by a hash map to determine if keys are equal.

Map.get(String key)

  • Calculate the hash code of the key.
  • Use modulo to figure out which bucket the key belongs to.
  • Loop through all the entries in that bucket, attempting to find a matching key.
  • When a key match is found, return that entry's value.
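The steps above can be sketched as a minimal chained hash map. This is written in C# to match the rest of the thread rather than Java; `TinyMap` is a hypothetical toy with a fixed bucket count, not how Dictionary or HashMap is actually implemented:

```csharp
using System;
using System.Collections.Generic;

// Minimal chained hash map sketch (fixed bucket count, no removal).
public sealed class TinyMap
{
    private readonly List<KeyValuePair<string, int>>[] buckets;

    public TinyMap(int bucketCount)
    {
        buckets = new List<KeyValuePair<string, int>>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
        {
            buckets[i] = new List<KeyValuePair<string, int>>();
        }
    }

    // Steps 1 and 2: hash the key, then use modulo to pick a bucket.
    private int BucketIndex(string key)
    {
        // Mask off the sign bit so the modulo result is never negative.
        return (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
    }

    public void Put(string key, int value)
    {
        List<KeyValuePair<string, int>> bucket = buckets[BucketIndex(key)];
        for (int i = 0; i < bucket.Count; i++)
        {
            if (bucket[i].Key.Equals(key))
            {
                bucket[i] = new KeyValuePair<string, int>(key, value);
                return;
            }
        }
        bucket.Add(new KeyValuePair<string, int>(key, value));
    }

    // Steps 3 and 4: scan the bucket and compare keys with Equals,
    // because different keys can land in the same bucket.
    public bool TryGet(string key, out int value)
    {
        foreach (KeyValuePair<string, int> entry in buckets[BucketIndex(key)])
        {
            if (entry.Key.Equals(key))
            {
                value = entry.Value;
                return true;
            }
        }
        value = 0;
        return false;
    }
}
```

Constructing the map with a single bucket forces every key to collide, which shows that the equality check, not the hash, is what finally distinguishes the keys.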

As a side note, as a map gains more and more elements it will create more buckets and place all the old entries into the new buckets. This helps prevent the bucket entry lists from growing into really long lists. A map wants many buckets with short lists.

The javadoc for Object.equals makes for interesting reading. I've pasted a snippet below.

 The equals method implements an equivalence relation:

* It is reflexive: for any reference value x, x.equals(x) should return true.
* It is symmetric: for any reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
* It is transitive: for any reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
* It is consistent: for any reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the object is modified.
* For any non-null reference value x, x.equals(null) should return false. 

The equals method for class Object implements the most discriminating possible equivalence relation on objects; that is, for any reference values x and y, this method returns true if and only if x and y refer to the same object (x==y has the value true).

This is a great example of the evils of premature optimization.

Do you have the output of a profiler or benchmark that tells you string comparison between entries in the same hash bucket is actually causing a performance problem?

Didn't think so. Just use the string itself as a key in the Dictionary. That's how you're supposed to use it.

BTW, there are far, far more different strings than there are different ints, so basic logic tells you that it's impossible to have a different hash code for each different string.
