
Role of load factor when calculating space consumption of a hash table entry

I'm reading the article "Rationale for Adding Hash Tables to the C++ Standard Template Library", and I don't understand this seemingly simple statement:

With hash tables, the amount of extra memory required depends on the organization of the table and on the load factor (whose definition also depends on the organization). The simplest case is the organization called open addressing, in which all entries are stored in a single random-access table. [...] In this case the amount of memory used per entry is M/α.

*M is the number of bytes required for the key and associated value, α is the load factor.

Why is it M/α? Why isn't it simply M+(amount of memory for each bucket * total buckets)?

In open addressing, you have a fixed-sized array of slots into which the elements are distributed. This is just a plain array with space for elements and (optionally) some control bits thrown in to mark which slots are full and which are empty.

Let's say that we have a table with s slots and that we want to distribute n elements into it. Then α = n / s, the number of elements divided by the number of slots. The whole table occupies sM bytes, because there are s slots and each slot uses M bytes (ignoring any control bits). The memory used per element is therefore sM / n = M / (n / s) = M / α, which is where the formula comes from.

Intuitively, this makes sense. If the table holds a single element, the load factor is 1 / s, and the total memory (sM) divided by the number of elements (1) is sM. On the other hand, if the table is fully loaded (n = s), then α = 1, and the total memory (sM) divided by the number of elements (s) is M.
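To make the derivation concrete, here is a minimal Python sketch that computes the per-element cost both ways, as total bytes divided by n and as M / α, and shows that they agree. The function names and the sample values (M = 16 bytes, s = 8 slots) are made up for illustration, and control bits are ignored.

```python
from fractions import Fraction

def load_factor(n, s):
    """alpha = n / s: stored elements divided by total slots."""
    return Fraction(n, s)

def bytes_per_entry(M, n, s):
    """Total table size (s slots of M bytes each) divided by n elements."""
    total_bytes = s * M
    return Fraction(total_bytes, n)

# Hypothetical sizes: 16-byte entries in an 8-slot table.
M, s = 16, 8
for n in (1, 4, 8):
    alpha = load_factor(n, s)
    # Both expressions give the same per-element cost.
    print(n, alpha, bytes_per_entry(M, n, s), M / alpha)
```

With one element the cost is the whole table (sM bytes); with a full table it drops to M bytes per element, matching the two extremes described above.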

You're on the right track in looking at the amount of memory per bucket and multiplying it by the number of buckets. If you treat M as the size per element and s as the number of slots, you end up with a total space usage of sM bytes. There's no need to add the M term in, and doing so actually gives you the wrong units: M has units of "bytes per element" and sM has units of "bytes", so they shouldn't be added together. Dividing that total by the number of elements n is what gives the per-element figure, M / α.
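As a rough illustration of the difference between the total and the per-element figure, the sketch below computes the table's total size and then divides by the number of stored elements. The helper names and the numbers (24-byte entries, 100 slots, 75 elements) are assumptions chosen only for the example, and control bits are again ignored.

```python
def total_table_bytes(M, s):
    """Bytes occupied by the whole table: s slots of M bytes each."""
    return M * s

def per_entry_bytes(M, s, n):
    """Total bytes divided by the number of stored elements (equals M / alpha)."""
    return total_table_bytes(M, s) / n

def empty_slot_bytes_per_entry(M, s, n):
    """Share of empty-slot space charged to each stored element."""
    return per_entry_bytes(M, s, n) - M

# Hypothetical table: 24-byte entries, 100 slots, 75 of them occupied.
M, s, n = 24, 100, 75
print(total_table_bytes(M, s))              # total, in bytes
print(per_entry_bytes(M, s, n))             # bytes per element
print(empty_slot_bytes_per_entry(M, s, n))  # extra bytes per element
```

Here α = 0.75, so each element is charged 24 / 0.75 = 32 bytes: its own 24 bytes plus 8 bytes for its share of the empty slots.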
