简体   繁体   中英

What is a propper way of calculating the size of a hash table

I am bullding a hash table using double hashing to solve collision. How can I know what is the propper size? I know it has to prime to minmize the number of collisions.

No, there is no point in making the size be prime and that adds a lot of extra work for you. Just make the size be a power of two and double it whenever the number of objects in the hash table reaches some threshold, like 50% or 25% of the size.

The easiest way to implement hash tables is to use a power-of-2 sized hash tables.

The reason is that if N = 2 M , then calculating H % N is as simple as calculating H & (N - 1) .

With fast hash functions such as MurmurHash3_32, the slowest part of using the hash table is actually calculating the modulo. H & (N - 1) does not calculate modulo but bitwise AND which is much faster (and it's the same as modulo if N is a power of 2).

Somebody could validly claim that MurmurHash suffers from seed-independent multicollisions and therefore is susceptible to a hash collision denial of service attack. That's true, but you shouldn't use linked lists to resolve hash collisions. You should use only hash tables where the keys are sortable by some comparison function (larger than, equal, smaller than) and then you can use red-black trees (or AVL trees) to resolve hash collisions. If there's no natural comparison functions (such as for complex numbers), you can invent one.

Using a red-black tree that almost always is just a single root element with MurmurHash is much faster than trying to be "secure" by using SipHash and then stupidly using linked lists to resolve hash collisions (which caused the need for the absurdly slow SipHash in the first place).

In theory, with non-power-of-2-sized hash tables where the size is rarely varying, you could use the "fast division by invariant integers using multiplication" trick but that's slower than power-of-2-sizing and bitwise AND.

The prime sizing is just for really poor hash functions. MurmurHash, although it suffers from seed-independent multicollisions, does not suffer from collisions with reasonable (non-attacker-generated) keys if the table size is a power of 2.

If you are asking about the current size, you may use the sizeof(table)/sizeof(element) function since you are using the double hashing method. If you are asking about the new size of the hash table once full (passed a certain criterion), then the most common is to add 10 new slots. This should be based on what are you using your table for. The default setting on most built in tables in other languages is if 0.75 full, then add 10 slots. If it's about something else, then please modify your question, so it's more descriptive. Edit: I just noticed the answer above me and I think that using the 2^p method is very common too in exponentially increasing tables and is very helpful with double hashing.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM