简体   繁体   中英

Generate Unique Hash code based on String

I have the following two strings:

var string1 = "MHH2016-05-20MASTECH HOLDINGS, INC. Financialshttp://finance.yahoo.com/q/is?s=mhhEDGAR Online FinancialsHeadlines";

var string2 = "CVEO2016-06-22Civeo upgraded by Scotia Howard Weilhttp://finance.yahoo.com/q/ud?s=CVEOBriefing.comHeadlines";

At first glance these two strings are different however their hashcode is the same using the GetHashCode method .

var hash = 0;
var total = 0;
foreach (var x in string1) //string2
{
    //hash = x * 7;
    hash = x.GetHashCode();
    Console.WriteLine("Char: " +  x + " hash: " + hash + " hashed: " + (int) x);
    total += hash;
}

Total ends up being 620438779 for both strings. Is there another method that will return a more unique hash code? I need the hashcode to be unique based on the characters in the string. Although both strings are different and the code works properly, these two strings so happen add up to being the same. How can I improve this code to make them more unique?

string.GetHashCode is indeed inappropriate for real hashing:

Warning

A hash code is intended for efficient insertion and lookup in collections that are based on a hash table. A hash code is not a permanent value. For this reason:

  • Do not serialize hash code values or store them in databases.
  • Do not use the hash code as the key to retrieve an object from a keyed collection.
  • Do not use the hash code instead of a value returned by a cryptographic hashing function. For cryptographic hashes, use a class derived from the System.Security.Cryptography.HashAlgorithm or System.Security.Cryptography.KeyedHashAlgorithm class.
  • Do not test for equality of hash codes to determine whether two objects are equal. (Unequal objects can have identical hash codes.) To test for equality, call the ReferenceEquals or Equals method.

and has high possibility of duplicates .

Consider HashAlgorithm.ComputeHash . The sample is slightly changed to use SHA256 instead of MD5, as @zaph suggested:

static string GetSha256Hash(SHA256 shaHash, string input)
{
    // Convert the input string to a byte array and compute the hash.
    byte[] data = shaHash.ComputeHash(Encoding.UTF8.GetBytes(input));

    // Create a new Stringbuilder to collect the bytes
    // and create a string.
    StringBuilder sBuilder = new StringBuilder();

    // Loop through each byte of the hashed data 
    // and format each one as a hexadecimal string.
    for (int i = 0; i < data.Length; i++)
    {
        sBuilder.Append(data[i].ToString("x2"));
    }

    // Return the hexadecimal string.
    return sBuilder.ToString();
}
using System.Security.Cryptography;
string data="test";
byte[] hash;
using (MD5 md5 = MD5.Create())
{
    md5.Initialize();
    md5.ComputeHash(Encoding.UTF8.GetBytes(data));
    hash = md5.Hash;
}

hash is a 16 byte array, which in turn you could covert to some hex-string or base64 encoded string for storage.

EDIT:

What's the purpose of that hash code?

From hash(x) != hash(y) you can derive x!=y , but

from hash(x) == hash(y) you canNOT derive x==y in general!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM