I'm porting a C# script into Spark (Scala) and I'm running into an issue with UUID generation in Scala vs GUID generation in C#.
Is there any way to generate a UUID in Java that is identical to the one generated in C#?
I'm generating the primary key for a database by creating a Guid from the MD5 hash of a string. Ultimately, I'd like to generate UUIDs in Java/Scala that match those from the C# script, so the existing data in the database that used the C# implementation for hashing doesn't need to be rehashed.
C# to port:
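using System;
using System.Security.Cryptography;
using System.Text;
class Program {
    static void Main() {
        String ex = "Hello World";
        Console.WriteLine("String to Hash: {0}", ex);
        byte[] md5 = GetMD5Hash(ex);
        Console.WriteLine("Hash: {0}", BitConverter.ToString(md5));
        // new Guid(byte[]) reads bytes 0-3, 4-5 and 6-7 as little-endian numbers.
        Guid guid = new Guid(md5);
        Console.WriteLine("Guid: {0}", guid);
    }
    // MD5 hash of the UTF-8 bytes of s.
    private static byte[] GetMD5Hash(string s) {
        using (MD5 md5 = MD5.Create())
            return md5.ComputeHash(Encoding.UTF8.GetBytes(s));
    }
}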
Scala ported code:
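import java.nio.charset.StandardCharsets
import java.security.MessageDigest
import java.util.UUID
val to_encode = "Hello World"
// Hash the UTF-8 bytes, as the C# code does (getBytes() with no charset uses the platform default).
val md5hash = MessageDigest.getInstance("MD5")
  .digest(to_encode.getBytes(StandardCharsets.UTF_8))
val md5string = md5hash.map("%02x".format(_)).mkString("-")
// Builds a version-3 (name-based) UUID from the MD5 of the input bytes.
val uuid_bytes = UUID.nameUUIDFromBytes(to_encode.getBytes(StandardCharsets.UTF_8))
printf("String to encode: %s\n", to_encode)
printf("MD5: %s\n", md5string)
printf("UUID: %s\n", uuid_bytes.toString)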
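To see how the Scala UUID relates to the raw digest: per its Javadoc, UUID.nameUUIDFromBytes returns a type 3 (name-based) UUID, so it MD5-hashes the bytes it is given and then forces the version nibble in byte 6 and the variant bits in byte 8. The sketch below (class and method names are hypothetical) reapplies those two masks to the raw digest and compares the result with the JDK's output. Even with the masks removed, java.util.UUID prints the bytes big-endian, which is not the order C# uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper class for illustration only.
final class NameUuidCheck {
    // Raw MD5 digest of the UTF-8 bytes (what the C# code passes to new Guid(byte[])).
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Apply the same masks UUID.nameUUIDFromBytes applies to its digest.
    static byte[] withVersion3Bits(byte[] digest) {
        byte[] b = digest.clone();
        b[6] = (byte) ((b[6] & 0x0f) | 0x30); // version nibble := 3
        b[8] = (byte) ((b[8] & 0x3f) | 0x80); // variant bits := 10 (IETF)
        return b;
    }

    // Interpret 16 bytes big-endian, the layout UUID.toString() prints.
    static UUID bigEndianUuid(byte[] b) {
        long msb = 0, lsb = 0;
        for (int i = 0; i < 8; i++) msb = (msb << 8) | (b[i] & 0xff);
        for (int i = 8; i < 16; i++) lsb = (lsb << 8) | (b[i] & 0xff);
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        String input = "Hello World";
        UUID jdk = UUID.nameUUIDFromBytes(input.getBytes(StandardCharsets.UTF_8));
        System.out.println(jdk.equals(bigEndianUuid(withVersion3Bits(md5(input))))); // true
        System.out.println("raw digest as UUID: " + bigEndianUuid(md5(input)));
        System.out.println("nameUUIDFromBytes:  " + jdk);
    }
}
```

Every byte other than 6 and 8 is left exactly as MD5 produced it, so the two printed UUIDs differ only in those positions.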
[Result from C#, result from Scala, and notes on what works and what doesn't: not preserved.]
Short of manipulating bytes, is there any other way to fix this?
If you want your C# and your Java to act exactly the same way (and you are happy with the existing C# behaviour), you'll need to manually reorder some of the MD5 bytes before building the UUID (i.e., swap the entries you identified as out of order).
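Here is a minimal Java sketch of that reordering, assuming the input is the 16-byte MD5 digest of the UTF-8 string, as in the C# code. The class and method names are hypothetical. It reverses bytes 0-3, 4-5 and 6-7, which C#'s new Guid(byte[]) reads as little-endian numbers, and leaves bytes 8-15 alone. The resulting UUID's toString() should then match C#'s Guid.ToString().

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

// Hypothetical helper class for illustration only.
final class CSharpGuid {
    // C# reads bytes 0-3 as a little-endian int and bytes 4-5 and 6-7 as little-endian
    // shorts; bytes 8-15 are used in order. Reversing those three groups yields the
    // big-endian layout java.util.UUID expects.
    static byte[] toJavaOrder(byte[] guidBytes) {
        if (guidBytes.length != 16) throw new IllegalArgumentException("need 16 bytes");
        byte[] b = guidBytes.clone(); // do not mutate the caller's array
        swap(b, 0, 3); swap(b, 1, 2); // 4-byte group
        swap(b, 4, 5);                // first 2-byte group
        swap(b, 6, 7);                // second 2-byte group
        return b;
    }

    private static void swap(byte[] b, int i, int j) {
        byte t = b[i]; b[i] = b[j]; b[j] = t;
    }

    // UUID whose toString() should equal C#'s new Guid(guidBytes).ToString().
    static UUID fromGuidBytes(byte[] guidBytes) {
        byte[] b = toJavaOrder(guidBytes);
        long msb = 0, lsb = 0;
        for (int i = 0; i < 8; i++) msb = (msb << 8) | (b[i] & 0xff);
        for (int i = 8; i < 16; i++) lsb = (lsb << 8) | (b[i] & 0xff);
        return new UUID(msb, lsb);
    }

    // MD5 of the UTF-8 bytes, mirroring Encoding.UTF8.GetBytes + MD5.ComputeHash in C#.
    static UUID fromMd5Of(String s) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return fromGuidBytes(md5);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Printing CSharpGuid.fromMd5Of("Hello World") gives a string that can be compared directly with the "Guid:" line of the C# program. Only the first three groups are reordered; the last two groups come straight from the digest.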
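Additionally, you should not use UUID.nameUUIDFromBytes: it MD5-hashes its argument itself and then overwrites the version and variant bits to produce a version-3 (name-based) UUID, so it will not reproduce the C# Guid. Instead, compute the MD5 digest yourself and convert the 16 bytes with: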
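import java.nio.ByteBuffer;
import java.util.UUID;

// Reads both 8-byte halves big-endian (ByteBuffer's default byte order).
public static String getGuidFromByteArray(byte[] bytes) {
    ByteBuffer bb = ByteBuffer.wrap(bytes);
    long high = bb.getLong(); // bytes 0-7
    long low = bb.getLong();  // bytes 8-15
    UUID uuid = new UUID(high, low);
    return uuid.toString();
}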
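Because getGuidFromByteArray reads everything big-endian, passing it the raw digest prints the digest's hex digits in their original order. That matches C# only after the first three groups have been reordered. A hypothetical alternative that skips the manual swapping reads those three fields little-endian, as C# does, using ByteBuffer.order; the class and method names below are illustrative only.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

// Hypothetical helper class for illustration only.
final class GuidFromByteBuffer {
    // Read fields 1-3 little-endian (as C#'s new Guid(byte[]) does) and the last 8 bytes
    // big-endian, then build the UUID from the two 64-bit halves.
    static UUID fromCSharpGuidBytes(byte[] bytes) {
        ByteBuffer le = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long a = le.getInt(0) & 0xffffffffL; // bytes 0-3, masked to avoid sign extension
        long b = le.getShort(4) & 0xffffL;   // bytes 4-5
        long c = le.getShort(6) & 0xffffL;   // bytes 6-7
        long high = (a << 32) | (b << 16) | c;
        long low = ByteBuffer.wrap(bytes).getLong(8); // bytes 8-15, default BIG_ENDIAN
        return new UUID(high, low);
    }
}
```

The masks matter: getInt and getShort return signed values, and without them a byte of 0x80 or above would sign-extend into the neighbouring fields.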
Shamelessly stolen from https://stackoverflow.com/a/24409153/34092 :)
In case you weren't aware, the .NET documentation says this about C#'s GUIDs:
Note that the order of bytes in the returned byte array is different from the string representation of a Guid value. The order of the beginning four-byte group and the next two two-byte groups is reversed, whereas the order of the last two-byte group and the closing six-byte group is the same. The example provides an illustration.
And:
The order of hexadecimal strings returned by the ToString method depends on whether the computer architecture is little-endian or big-endian.
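Going the other way can help when checking values that C# wrote out with ToByteArray. The hypothetical helper below computes the 16 bytes that Guid.ToByteArray() would return for a Guid whose string form equals a given java.util.UUID. As the first quoted passage describes, it reverses the four-byte group and the next two two-byte groups and keeps the last eight bytes in order.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

// Hypothetical helper class for illustration only.
final class GuidByteOrder {
    // Bytes C#'s Guid.ToByteArray() would produce for a Guid with the same string form.
    static byte[] toCSharpByteArray(UUID uuid) {
        long high = uuid.getMostSignificantBits();
        ByteBuffer out = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        out.putInt((int) (high >>> 32));     // 4-byte group, byte-reversed
        out.putShort((short) (high >>> 16)); // 2-byte group, byte-reversed
        out.putShort((short) high);          // 2-byte group, byte-reversed
        out.order(ByteOrder.BIG_ENDIAN);     // remaining bytes keep string order
        out.putLong(uuid.getLeastSignificantBits());
        return out.array();
    }
}
```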
In your C#, rather than using:
Console.WriteLine("Guid: {0}", guid);
you may want to consider using:
Console.WriteLine(BitConverter.ToString(guid.ToByteArray()));
Your existing code calls guid.ToString() behind the scenes. Alas, ToString and ToByteArray do not return the bytes in the same order.
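Since new Guid(md5).ToByteArray() gives back the same 16 bytes the Guid was built from, the suggested C# line prints the raw MD5 digest. To compare it with the Java or Scala side, you can print the digest in the format BitConverter.ToString uses: uppercase hex pairs separated by hyphens. This is a minimal sketch; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.StringJoiner;

// Hypothetical helper class for illustration only.
final class BitConverterFormat {
    // Format bytes like C#'s BitConverter.ToString(byte[]): "B1-0A-..." style.
    static String toBitConverterString(byte[] bytes) {
        StringJoiner sj = new StringJoiner("-");
        for (byte b : bytes) sj.add(String.format("%02X", b & 0xff));
        return sj.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] md5 = MessageDigest.getInstance("MD5")
                .digest("Hello World".getBytes(StandardCharsets.UTF_8));
        // Compare with the C# output of BitConverter.ToString(guid.ToByteArray()).
        System.out.println(toBitConverterString(md5));
    }
}
```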