
Convert a string Id to unique Guid (or from md5 to Guid)?

I would like to build a system that converts an existing id (an integer id or a custom string id) into a Guid.

I would like to create a helper or extension method that generates a Guid from any int, long or string value. The idea is to migrate to a new database while keeping a link back to the ids in my old one. Each time I convert a string id such as "O-2019-10-15", the system must generate the same unique Guid. Let's focus on strings here.

    public static Guid GenerateGuid(string input)
    {
        // Conversion
        byte[] _byteIds = Encoding.UTF8.GetBytes(input);

        //What about using MD5?
        MD5CryptoServiceProvider _md5 = new MD5CryptoServiceProvider();
        byte[] _checksum = _md5.ComputeHash(_byteIds);

        // Convert ?
        string part1 = /* ??? */;
        string part2 = /* ??? */;
        string part3 = /* ??? */;
        string part4 = /* ??? */;
        string part5 = /* ??? */;

        // Concatenate these 5 parts into one string
        return Guid.Parse(string.Format("{0}-{1}-{2}-{3}-{4}", part1, part2, part3, part4, part5));
    }

What do you think? Is MD5 a reasonable starting point? Are there any rules governing the Guid representation?

The idea behind MD5 is that it reduces any input to a 16-byte digest, which is the same size as a Guid. From there I just need to convert it to a Guid, but I don't know the details of the Guid format. Are there existing rules, such as positions reserved for specific data or other information?

I wouldn't do it like this.

I would use Guid.NewGuid() for each new id and keep the old id alongside it (or in a translation table).

The next time I need the new id, I would look up the old id and check whether it already has a Guid.

If it is critical to keep a single id, which I don't recommend, I would store it as $"{guid}+{oldid}" .
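
A minimal sketch of the translation-table idea, assuming an in-memory dictionary stands in for the real table. The class name `IdTranslationTable` and the method `GetOrCreate` are hypothetical, not part of any library. In a real migration the mapping would live in a database table with a unique constraint on the old id.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a translation table (old id -> new Guid).
public sealed class IdTranslationTable
{
    private readonly Dictionary<string, Guid> _map =
        new Dictionary<string, Guid>(StringComparer.Ordinal);

    // Returns the Guid already assigned to oldId, or assigns a fresh random one.
    public Guid GetOrCreate(string oldId)
    {
        if (oldId == null) throw new ArgumentNullException(nameof(oldId));

        if (!_map.TryGetValue(oldId, out Guid id))
        {
            id = Guid.NewGuid();   // random Guid, unrelated to the content of oldId
            _map[oldId] = id;
        }
        return id;
    }
}
```

Calling `GetOrCreate("O-2019-10-15")` twice on the same instance returns the same Guid, but a different instance (or a wiped table) would hand out a different one. That is the trade-off compared with deriving the Guid from the string itself.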

For the moment, I did this:

using System;
using System.Security.Cryptography;
using System.Text;

Guid GenerateGuid(string input)
{
    byte[] _byteIds = Encoding.UTF8.GetBytes(input);

    // MD5 always yields 16 bytes, the same size as a Guid
    byte[] _checksum;
    using (MD5 _md5 = MD5.Create())
    {
        _checksum = _md5.ComputeHash(_byteIds);
    }

    // Render the 16-byte checksum as five hex groups (8-4-4-4-12 digits)
    string part1 = BitConverter.ToString(_checksum, 0, 4).Replace("-", string.Empty);
    string part2 = BitConverter.ToString(_checksum, 4, 2).Replace("-", string.Empty);
    string part3 = BitConverter.ToString(_checksum, 6, 2).Replace("-", string.Empty);
    string part4 = BitConverter.ToString(_checksum, 8, 2).Replace("-", string.Empty);
    string part5 = BitConverter.ToString(_checksum, 10, 6).Replace("-", string.Empty);

    return Guid.Parse($"{part1}-{part2}-{part3}-{part4}-{part5}");
}

To avoid collisions, the input itself must also be unique within my environment, so I will prefix it with a namespace.
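
On the question of reserved positions: RFC 4122 reserves the first hex digit of the third group for the version and the top bits of the fourth group for the variant. A raw MD5 digest leaves both to chance. The sketch below only reads those fields from the canonical string form so a generated Guid can be checked. `GuidInspector` and its method names are hypothetical.

```csharp
using System;

// Hypothetical helper for inspecting the reserved fields of a Guid.
public static class GuidInspector
{
    // Canonical "D" format: xxxxxxxx-xxxx-Vxxx-Nxxx-xxxxxxxxxxxx
    // V (index 14) is the version digit, N (index 19) carries the variant bits.

    public static int GetVersion(Guid g)
    {
        string s = g.ToString("D");
        return Convert.ToInt32(s.Substring(14, 1), 16);
    }

    // True when the two top bits of N are "10", the RFC 4122 variant.
    public static bool IsRfc4122Variant(Guid g)
    {
        string s = g.ToString("D");
        int nibble = Convert.ToInt32(s.Substring(19, 1), 16);
        return (nibble & 0xC) == 0x8;
    }
}
```

For the standard DNS namespace UUID `6ba7b810-9dad-11d1-80b4-00c04fd430c8`, `GetVersion` returns 1 and `IsRfc4122Variant` returns true. A Guid built straight from an MD5 digest will usually fail one or both checks, which matters only if other tools rely on those fields.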

Creating deterministic UUIDs from a name within an existing namespace is exactly what UUIDv3 (MD5-based) and UUIDv5 (SHA-1-based) are intended for. However, first you will need a namespace UUID.

A convenient (and valid) way to accomplish this is hierarchical namespaces. First, use the standard DNS namespace UUID plus your domain name to generate your root namespace:

Guid nsDNS = new Guid("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

Guid nsRoot = Guid.Create(nsDNS, "myapp.example.com", 5);

Then create a namespace UUID for your string:

Guid nsFoo = Guid.Create(nsRoot, "Foo", 5);

Now you're ready to use your new Foo namespace UUID with individual names:

Guid bar = Guid.Create(nsFoo, "Bar", 5);

The benefit of this is that anyone else will get completely different UUIDs than you, even if their strings (other than the domain, obviously) are identical to yours, preventing collisions if your data sets are ever merged, yet it's completely deterministic, logical and self-documenting.

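(Note: I've never actually used C#, so if I got the syntax slightly wrong, feel free to edit. In particular, treat Guid.Create(namespace, name, version) as pseudocode for a name-based UUID generator; as far as I know, System.Guid has no such built-in method. I think the pattern is clear regardless.)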
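
A minimal sketch of a helper that could play the role of `Guid.Create(..., 5)`, assuming RFC 4122 version 5 semantics (SHA-1 over the namespace bytes followed by the UTF-8 name). The class `NameBasedGuid` and the method `CreateV5` are hypothetical names, not framework APIs. The byte swapping is there because `Guid.ToByteArray()` stores the first three fields little-endian, while the RFC hashes them in big-endian order.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; not part of System.Guid.
public static class NameBasedGuid
{
    // Predefined namespace for domain names (RFC 4122, Appendix C).
    public static readonly Guid DnsNamespace =
        new Guid("6ba7b810-9dad-11d1-80b4-00c04fd430c8");

    // Builds a version 5 (SHA-1, name-based) UUID from a namespace and a name.
    public static Guid CreateV5(Guid namespaceId, string name)
    {
        if (name == null) throw new ArgumentNullException(nameof(name));

        byte[] ns = namespaceId.ToByteArray();
        SwapByteOrder(ns);                                  // .NET layout -> big-endian
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        // Hash input is namespace bytes followed by name bytes.
        byte[] data = new byte[ns.Length + nameBytes.Length];
        Buffer.BlockCopy(ns, 0, data, 0, ns.Length);
        Buffer.BlockCopy(nameBytes, 0, data, ns.Length, nameBytes.Length);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
        {
            hash = sha1.ComputeHash(data);                  // 20 bytes, only 16 are used
        }

        byte[] result = new byte[16];
        Array.Copy(hash, result, 16);
        result[6] = (byte)((result[6] & 0x0F) | 0x50);      // set version to 5
        result[8] = (byte)((result[8] & 0x3F) | 0x80);      // set RFC 4122 variant
        SwapByteOrder(result);                              // big-endian -> .NET layout
        return new Guid(result);
    }

    // Reverses the three leading fields that .NET stores little-endian.
    private static void SwapByteOrder(byte[] g)
    {
        Array.Reverse(g, 0, 4);
        Array.Reverse(g, 4, 2);
        Array.Reverse(g, 6, 2);
    }
}
```

With such a helper, the hierarchy above would read `nsRoot = NameBasedGuid.CreateV5(NameBasedGuid.DnsNamespace, "myapp.example.com")`, then `nsFoo = NameBasedGuid.CreateV5(nsRoot, "Foo")`, then `bar = NameBasedGuid.CreateV5(nsFoo, "Bar")`. The same inputs always produce the same Guid, which is the property the question asks for.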
