
Converting a partial MD5 hash code into a long

I'm using the MD5 algorithm to hash the key for an on-disk hash table (I know it's questionable whether this is the best algorithm to use for this, but I'm going with it for now. The problem is generalizable to any algorithm that produces a byte array). My problem is this:

The size of the hash code determines the number of combinations (buckets) in the hash table. Since MD5 is 128 bit, there are a huge number of combinations (~ 3.4e38) which is way too big for my purpose. So what I want to do is pick off the first n bits of the byte array that MD5 produces, and convert those into a long (or ulong) value. Since MD5 produces a byte array, it would be easy to do if I wanted an integral number of bytes, but this leads to too big a jump in the number of combinations. I'm finding the single bit version to be a lot trickier.

Goal:

n = 10  // I.e. I want 2^10 combinations
long pos = someFcn(byte[] key, n)

where key is the value being hashed, and n is the number of bits of the MD5 result I want to use. pos, then, will be an integer from 0 to 1023 (in the case of n = 10). If n = 11, the code will be from 0 to 2^11-1 = 2047, etc. It has to be reasonably fast and efficient.

Doesn't seem that hard but it's eluding me. Any help would be much appreciated. Thanks.

First, convert the first four bytes into an integer with BitConverter.ToInt32. This always reads four bytes, but that probably won't make it measurably slower, since the rest of the calculation works in 32-bit registers anyway, and special cases like "if n is < 16 then do this with the first two bytes" would just make it more complicated.

Then, given that integer, take the lowest N bits. If you really want a specific number of bits [a power of two number of buckets] not known at compile time, ~((-1)<<N) is a nice trick to get 2^N-1.
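As a rough illustration of the two steps above, here is a minimal sketch in C rather than C#. The helper name low_bits is made up for this example, and the bytes are combined little-endian on the assumption that this matches what BitConverter.ToInt32 returns on a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bucket index taken from the lowest n bits of the
   first four hash bytes. The bytes are combined little-endian (assumed to
   mirror BitConverter.ToInt32 on a little-endian machine). */
uint32_t low_bits(const unsigned char *hash, unsigned n)
{
    assert(n >= 1 && n <= 32);

    uint32_t v = (uint32_t)hash[0]
               | ((uint32_t)hash[1] << 8)
               | ((uint32_t)hash[2] << 16)
               | ((uint32_t)hash[3] << 24);

    /* ~(all-ones << n) equals 2^n - 1. Shifting a 32-bit value by 32 is
       undefined in C, so n == 32 is handled separately. */
    uint32_t mask = (n == 32) ? UINT32_MAX : ~(UINT32_MAX << n);
    return v & mask;
}
```

With n = 10 the mask is 0x3FF, so the result always lands in 0..1023. An unsigned type is used here so that the shift and complement have no sign-bit surprises.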

Or you could simply use ToUInt32 instead and modulo a prime number [it might be slightly better to convert to UInt64 instead, then you've got fully half the bits to start with, in this case]
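The modulo variant might look something like the following sketch, again in C. The name bucket_mod and the choice of a little-endian 64-bit read are assumptions for illustration; the bucket count is whatever prime you pick for the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bucket index from the first eight hash bytes,
   reduced modulo the number of buckets (ideally a prime). */
uint64_t bucket_mod(const unsigned char *hash, uint64_t bucket_count)
{
    assert(bucket_count != 0);

    /* Combine eight bytes little-endian into one 64-bit value. */
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | hash[i];

    return v % bucket_count;
}
```

Unlike the bit-mask approach, this allows any table size, not just powers of two, at the cost of one integer division per lookup.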

To get the first 10 bits, for example:

int result = ((int)key[0] << 2) | (((int)key[1] >> 6) & 0x03);

If you have an array like this,

unsigned char data[2000];

then you can just scrape off the first n bits into an integer like so:

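#include <stddef.h>

typedef unsigned long long int MyInt;

MyInt scrape(size_t n, const unsigned char * data)
{
    MyInt result = 0;
    size_t b;

    /* Copy the whole bytes first, most significant byte first. */
    for (b = 0; b < n / 8; ++b)
    {
       result <<= 8;
       result += data[b];
    }

    /* Then append the top bits of the next byte, if any are left over.
       The check also avoids reading data[b] when n is a multiple of 8. */
    const size_t remaining_bits = n % 8;
    if (remaining_bits != 0)
    {
        result <<= remaining_bits;
        result += (data[b] >> (8 - remaining_bits));
    }

    return result;
}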

I'm assuming that CHAR_BIT == 8; feel free to generalize the code if you like. Also, the size of the array times 8 must be at least n, and n must not exceed the number of bits in MyInt.

