Converting a partial MD5 hash code into a long

I'm using the MD5 algorithm to hash the key for an on-disk hash table. (I know it's questionable whether this is the best algorithm for the job, but I'm going with it for now. The problem generalizes to any algorithm that produces a byte array.) My problem is this:

The size of the hash code determines the number of combinations (buckets) in the hash table. Since MD5 is 128 bits, there are a huge number of combinations (~3.4e38), which is way too many for my purpose. So what I want to do is pick off the first n bits of the byte array that MD5 produces and convert those into a long (or ulong) value. Since MD5 produces a byte array, this would be easy if I wanted a whole number of bytes, but each extra byte multiplies the number of combinations by 256, which is too big a jump. I'm finding the single-bit version to be a lot trickier.

Goal:

n = 10  // I.e. I want 2^10 combinations
long pos = someFcn(byte[] key, n)

where key is the value being hashed, and n is the number of bits of the MD5 result I want to use. pos, then, will be an integer from 0 to 1023 (in the case of n = 10). If n = 11, pos will be from 0 to 2^11 - 1 = 2047, and so on. It has to be reasonably fast and efficient.

Doesn't seem that hard, but it's eluding me. Any help would be much appreciated. Thanks.

First, convert the first four bytes into an integer with BitConverter.ToInt32. This always reads four bytes, but that probably won't make it measurably slower, since you're working with 32-bit registers for the rest of the calculation anyway, and special cases like "if n < 16, do this with only the first two bytes" would just make the code more complicated.

Then, given that integer, take the lowest N bits. If you really want a specific number of bits (a power-of-two number of buckets) that is not known at compile time, ~((-1) << N) is a nice trick to get 2^N - 1; AND the integer with that mask.
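The two steps above can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names first_u32_le and low_bits are made up for this example, the hash is assumed to be at least four bytes long, and the bytes are read in little-endian order (BitConverter uses the machine's byte order, which is little-endian on most platforms). An unsigned type is used so that the shift and mask are well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read the first four hash bytes as a
   little-endian 32-bit unsigned value. */
static uint32_t first_u32_le(const unsigned char *hash)
{
    return (uint32_t)hash[0]
         | ((uint32_t)hash[1] << 8)
         | ((uint32_t)hash[2] << 16)
         | ((uint32_t)hash[3] << 24);
}

/* Hypothetical helper: keep the lowest n bits, for 1 <= n <= 31. */
static uint32_t low_bits(const unsigned char *hash, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t mask = ~(~(uint32_t)0 << n);   /* equals 2^n - 1 */
    return first_u32_le(hash) & mask;
}
```

With n = 10 the result always lies in the range 0 to 1023, whatever the hash bytes are.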

Or you could simply use ToUInt32 instead and take the result modulo a prime number. (In this case it might be slightly better to convert to UInt64 instead, since you then have fully half of the hash's bits to start with.)
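A rough C sketch of the modulo variant, assuming the hash is at least eight bytes long and read in little-endian order. The names first_u64_le and bucket_mod are hypothetical, and the bucket count is whatever prime you choose (1021, for instance, is the largest prime below 1024).

```c
#include <stdint.h>

/* Hypothetical helper: read the first eight hash bytes as a
   little-endian 64-bit unsigned value. */
static uint64_t first_u64_le(const unsigned char *hash)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | hash[i];
    return v;
}

/* Hypothetical helper: map the hash to a bucket in [0, bucket_count).
   bucket_count must be non-zero; a prime is suggested above. */
static uint64_t bucket_mod(const unsigned char *hash, uint64_t bucket_count)
{
    return first_u64_le(hash) % bucket_count;
}
```

Unlike the masking approach, this allows any number of buckets, not just powers of two, at the cost of a division.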

To get the first 10 bits, for example:

int result = ((int)key[0] << 2) | (((int)key[1] >> 6) & 0x03);
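The same idea can be extended to an arbitrary n, as in the C sketch below. It is one possible approach rather than part of the answer above: it assumes the hash has at least eight bytes (MD5 has sixteen) and that 1 <= n <= 64, and the name top_bits is made up for this example. It loads the first eight bytes with the first byte as the most significant, then shifts away everything except the leading n bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return the first n bits of the hash
   (most significant bit of hash[0] first), for 1 <= n <= 64. */
static uint64_t top_bits(const unsigned char *hash, unsigned n)
{
    assert(n >= 1 && n <= 64);

    /* Big-endian load of the first eight bytes. */
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v = (v << 8) | hash[i];

    /* Discard the 64 - n trailing bits. */
    return v >> (64 - n);
}
```

For n = 10 this produces the same value as the expression above, (key[0] << 2) | (key[1] >> 6).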

If you have an array like this,

unsigned char data[2000];

then you can just scrape off the first n bits into an integer like so:

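#include <stddef.h>

typedef unsigned long long int MyInt;

/* Returns the first n bits of data as an integer, most significant bit first. */
MyInt scrape(size_t n, const unsigned char * data)
{
    MyInt result = 0;
    size_t b;

    /* Consume the whole bytes first. */
    for (b = 0; b < n / 8; ++b)
    {
       result <<= 8;
       result += data[b];
    }

    /* Append the top bits of the next byte. Skipping this when n is a
       multiple of 8 avoids reading one byte past the bits requested. */
    const size_t remaining_bits = n % 8;
    if (remaining_bits != 0)
    {
        result <<= remaining_bits;
        result += (data[b] >> (8 - remaining_bits));
    }

    return result;
}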

I'm assuming that CHAR_BIT == 8; feel free to generalize the code if you like. Also, the size of the array times 8 must be at least n, and n must not exceed the number of bits in MyInt.
