
Represent MD5 hash as an integer

In my user database table, I use the MD5 hash of a user's email address as the id.

Example: email(example@example.org) = id(d41d8cd98f00b204e9800998ecf8427e)

Unfortunately, I now have to represent the ids as integer values in order to use an API where the id can only be an integer.

Now I'm looking for a way to encode the id into an integer for sending and decode it again when receiving. How could I do this?

My ideas so far:

  1. convert_uuencode() and convert_uudecode() for the MD5 hash
  2. replace every character of the MD5 hash by its ord() value

Which approach is better? Do you know even better ways to do this?

I hope you can help me. Thank you very much in advance!

Be careful. Converting the MD5s to an integer will require support for big (128-bit) integers. Chances are the API you're using will only support 32-bit integers, or worse, might be handling the number as floating point. Either way, your ID will get munged. If this is the case, just assigning a second ID arbitrarily is a much better way to deal with things than trying to convert the MD5 into an integer.

However, if you are sure that the API can deal with arbitrarily large integers without trouble, you can just convert the MD5 from hexadecimal to an integer. PHP's built-in integer type cannot hold this, however: it is 32 or 64 bits wide depending on the platform, and anything larger is converted to floating point. You'll probably need to use the PHP GMP library for it.
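As a rough illustration of the GMP route, here is a minimal sketch. It assumes the GMP extension is installed; the helper names md5_to_decimal() and decimal_to_md5() are made up for this example and are not part of PHP or GMP. The decimal value is kept as a string throughout, since no native PHP number can hold it.

```php
<?php
// Sketch only: requires the GMP extension (ext-gmp).
// md5_to_decimal() and decimal_to_md5() are hypothetical helper names.

function md5_to_decimal(string $md5): string
{
    // Parse the 32 hex digits as one big integer and print it in base 10.
    return gmp_strval(gmp_init($md5, 16), 10);
}

function decimal_to_md5(string $decimal): string
{
    // Convert back to hex and restore any leading zeros.
    $hex = gmp_strval(gmp_init($decimal, 10), 16);
    return str_pad($hex, 32, '0', STR_PAD_LEFT);
}

$md5 = md5('example@example.com');
$dec = md5_to_decimal($md5);
var_dump(decimal_to_md5($dec) === $md5); // round trip check
```

The result can be up to 39 decimal digits long, so this only helps if the receiving API really accepts integers of that size.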

There are good reasons, stated by others, for doing it a different way.

But if what you want to do is convert an md5 hash into a string of decimal digits (which is what I think you really mean by "represent by an integer", since an md5 is already an integer in string form), and transform it back into the same md5 string:

function md5_hex_to_dec($hex_str)
{
    // Each group of 4 hex digits (0..65535) becomes 5 zero-padded decimal digits.
    $dec = [];
    foreach (str_split($hex_str, 4) as $grp) {
        $dec[] = str_pad((string) hexdec($grp), 5, '0', STR_PAD_LEFT);
    }
    return implode('', $dec);
}

function md5_dec_to_hex($dec_str)
{
    // Reverse mapping: each 5-digit decimal group goes back to 4 hex digits.
    $hex = [];
    foreach (str_split($dec_str, 5) as $grp) {
        $hex[] = str_pad(dechex((int) $grp), 4, '0', STR_PAD_LEFT);
    }
    return implode('', $hex);
}

Demo:

$md5 = md5('example@example.com');
echo $md5 . '<br />';  // 23463b99b62a72f26ed677cc556c44e8
$dec = md5_hex_to_dec($md5);
echo $dec . '<br />';  // 0903015257466342942628374306682186817640
$hex = md5_dec_to_hex($dec);
echo $hex;             // 23463b99b62a72f26ed677cc556c44e8

Of course, you'd have to be careful using either string: use them only as string type to avoid losing leading zeros, make sure the strings have the correct lengths (32 hex characters, 40 decimal digits), etc.

A simple solution could use intval() by specifying base 16 as the second argument.

Systems that can accommodate 64-bit ints can split the 128-bit/16-byte md5() hash into two 8-byte sections and then convert each. Be aware that PHP integers are signed: intval() clamps any chunk above PHP_INT_MAX (that is, any chunk whose top bit is set), so the result cannot always be converted back. Each hex pair represents 1 byte, so use 16-character chunks in this case:

$hash = md5($value);
$inthash1 = intval(substr($hash, 0, 16), 16);
$inthash2 = intval(substr($hash, 16, 16), 16);

For 32-bit ints, the hex chunks are 8 characters:

foreach (str_split($hash, 8) as $chunk) {
    $int_hashes[] = intval($chunk, 16);
}
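Because PHP integers are signed and intval() clamps values above PHP_INT_MAX, a chunk only survives a round trip if it fits in the positive range. On 64-bit PHP every 8-character chunk does. A minimal sketch under that assumption, with made-up helper names md5_to_ints() and ints_to_md5(), might look like this:

```php
<?php
// Sketch only: assumes 64-bit PHP (PHP_INT_SIZE === 8).
// md5_to_ints() and ints_to_md5() are hypothetical helper names.

function md5_to_ints(string $md5): array
{
    $ints = [];
    foreach (str_split($md5, 8) as $chunk) {
        // 8 hex digits are at most 0xffffffff, well below PHP_INT_MAX.
        $ints[] = intval($chunk, 16);
    }
    return $ints;
}

function ints_to_md5(array $ints): string
{
    $hex = '';
    foreach ($ints as $int) {
        // %08x restores the leading zeros of each chunk.
        $hex .= sprintf('%08x', $int);
    }
    return $hex;
}

$md5 = md5('example@example.com');
var_dump(ints_to_md5(md5_to_ints($md5)) === $md5); // round trip check
```

This yields four integers per hash rather than one, so it only fits an API that can carry several integer fields per id.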

Why ord()? md5 produces an ordinary 16-byte value, presented to you in hex for better readability. You can't convert a 16-byte value to a 4- or 8-byte integer without loss, so you must change some part of your algorithm to use this as an id.

You can use hexdec to parse the hex string and store the number in the database.

Couldn't you add another field that is an auto-increment int?

What about:

$float = hexdec(md5('string'));

or

$int = (integer) (substr(hexdec(md5('string')),0,9)*100000000);

The chance of collision is definitely higher, but isn't it still good enough to use instead of the hash in the DB?
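For reference, hexdec() returns a float once the value exceeds the platform's integer range, and a float carries only about 53 bits of precision, so most of a 128-bit hash is dropped. A small sketch illustrating this, using the id from the question and a variant that differs only in the last hex digit:

```php
<?php
// Two 32-digit hex strings that differ only in the final digit.
$a = 'd41d8cd98f00b204e9800998ecf8427e';
$b = 'd41d8cd98f00b204e9800998ecf8427f';

$fa = hexdec($a);
$fb = hexdec($b);

var_dump(is_float($fa)); // too large for an int, so a float is returned
var_dump($fa === $fb);   // the low-order difference is lost in the float
```

Both checks come out true, which means distinct hashes can map to the same number and the original hash cannot be recovered from it.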

Use the email address as the file name of a blank, temporary file in a shared folder, like /var/myprocess/example@example.org

Then, call ftok on the file name. ftok will return an integer ID.

It isn't guaranteed to be unique, but it will probably suffice for your API.
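A minimal sketch of that idea follows. The directory (the system temp folder here rather than /var/myprocess) and the project id 'a' are assumptions made for the example:

```php
<?php
// Sketch only: the directory and the project id 'a' are assumptions.
$dir  = sys_get_temp_dir();
$path = $dir . '/example@example.org';

touch($path);            // create the blank marker file
$id = ftok($path, 'a');  // second argument must be a single character

var_dump($id);           // an int, or -1 if ftok() failed
unlink($path);
```

The key comes from file system metadata rather than from the name itself, so it may change if the file is deleted and recreated, and there is no way to get the email address back from the integer.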

Add these two columns to your table.

`email_md5_l` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(left(md5(`email`),16),16,10)) STORED,
`email_md5_r` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(right(md5(`email`),16),16,10)) STORED,

It might or might not help to create a PK on these two columns, though, as the engine probably concatenates the two string representations and hashes the result. That would somewhat defeat the purpose, and a full scan might be quicker, but that depends on the number of columns and records. Don't try to read these bigints in PHP, as it doesn't have unsigned integers; just stay in SQL and do something like:

select email into result from `address`
    where email_md5_l=conv(left(md5(the_email),16),16,10)
      and email_md5_r=conv(right(md5(the_email),16),16,10) limit 1;

MD5 does collide, by the way.
