简体   繁体   中英

How can I calculate the impact on collision probability when truncating a hash?

I'd like to reduce an MD5 digest from 32 characters down to, ideally closer to 16. I'll be using this as a database key to retrieve a set of (public) user-defined parameters. I'm expecting the number of unique "IDs" to eventually exceed 10,000. Collisions are undesirable but not the end of the world.

I'd like to understand the viability of a naive truncation of the MD5 digest to achieve a shorter key. But I'm having trouble digging up a formula that I can understand (given I have a limited Math background), let alone use to determine the impact on collision probability that truncating the hash would have.

The shorter the better, within reason. I feel there must be a simple formula, but I'd rather have a definitive answer than do my own guesswork cobbled together from bits and pieces I have read around the web.

You can calculate the chance of collisions with this formula:

chance of collision = 1 - e^(-n^2 / (2 * d))

Where n is the number of messages, d is the number of possibilities, and e is the constant e (2.718281828...).

@mypetition's answer is great.

I found a few other equations that are more-or-less accurate and/or simplified here , along with a great explanation and a handy comparison of real-world probabilities:

...where k is the number of ID's you'll be generating (the "messages") and N is the largest number that can be produced by the hash digest or the largest number that your truncated hexadecimal number could represent (technically + 1, to account for 0).


A bit more about "N"

If your original hash is, for example, "38BF05A71DDFB28A504AFB083C29D037" (32 hex chars), and you truncate it down to, say, 12 hex chars (eg: "38BF05A71DDF"), the largest number you could produce in hexadecimal is "0xFFFFFFFFFFFF" (281474976710655 - which is 16^12-1 (or 256^6 if you prefer to think in terms of bytes). But since "0" itself counts as one of the numbers you could theoretically produce, you add back that 1, which leaves you simply with 16^12.

So you can think of N as 16 ^ (numberOfHexDigits).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM