Hash computation vs bucket walkthrough

I have a nested, R-tree-like data structure in Python (a list of lists). The key is a large integer (about 10 digits). Each level holds about x items (e.g., 10) in a list, and each of those items is itself a list of x items, and so on. The height of the tree is h levels (e.g., 5). Each entry on a level also records the range of keys it covers (as in an R-tree).
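A minimal sketch of such a structure, assuming each node is a Python list of `(lo, hi, child)` tuples with half-open key ranges `[lo, hi)`, that the tree is complete, and that keys are dense integers starting at some `base`. The function name `build_tree` and the `"leaf-<key>"` payload strings are hypothetical placeholders, not something given in the question.

```python
def build_tree(base: int, x: int, h: int) -> list:
    """Build a complete tree of height h covering keys [base, base + x**h).

    Hypothetical layout: every node is a list of x tuples (lo, hi, child),
    where [lo, hi) is the half-open key range covered by that child.
    """
    span = x ** (h - 1)  # number of keys covered by each child of this node
    node = []
    for i in range(x):
        lo = base + i * span
        hi = lo + span
        if h == 1:
            child = f"leaf-{lo}"  # placeholder payload stored at the leaf
        else:
            child = build_tree(lo, x, h - 1)  # recurse one level down
        node.append((lo, hi, child))
    return node


tree = build_tree(1_000_000_000, x=10, h=3)
print(len(tree))    # 10
print(tree[0][:2])  # (1000000000, 1000000100)
```

With the question's x = 10 and h = 5 this builds 10^5 leaves; real R-tree ranges are usually uneven, which this uniform sketch ignores.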

For a given key, I need to locate the corresponding entry in the tree. This can be done trivially by scanning each level and checking whether the key lies within an entry's range; if it does, step into that entry and recurse until reaching a leaf.
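The range-scan lookup can be sketched like this, assuming a hypothetical layout where every node is a list of `(lo, hi, child)` tuples with half-open ranges and leaves are plain strings (the `build_tree` helper is included so the block runs on its own). Each visited entry costs one range check, so a lookup needs at most x checks per level.

```python
def build_tree(base: int, x: int, h: int) -> list:
    """Hypothetical complete tree: nodes are lists of (lo, hi, child) tuples."""
    span = x ** (h - 1)
    node = []
    for i in range(x):
        lo = base + i * span
        child = f"leaf-{lo}" if h == 1 else build_tree(lo, x, h - 1)
        node.append((lo, lo + span, child))
    return node


def find_by_scan(node, key: int):
    """Descend level by level, scanning each list until a range contains key."""
    while isinstance(node, list):  # internal node: keep descending
        for lo, hi, child in node:
            if lo <= key < hi:     # one range check per visited entry
                node = child
                break
        else:                      # no entry on this level covers the key
            raise KeyError(key)
    return node                    # leaf payload


tree = build_tree(0, x=10, h=3)
print(find_by_scan(tree, 537))  # leaf-537
```

Because it only compares against the stored ranges, this version also works when the ranges are uneven or the tree is not complete.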

It can also be done arithmetically: successively dividing the key by x yields one list index per level (the key's base-x digits).
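The division approach, sketched under stricter assumptions: the tree (a hypothetical layout of lists of `(lo, hi, child)` tuples, helper included) must be complete and every range must be split into x equal parts, so the base-x digits of `key - base` are exactly the list indices. `find_by_division` and its parameters are hypothetical. Each `divmod` call is one of the h divisions: the remainder is the index for one level and the quotient feeds the next division.

```python
def build_tree(base: int, x: int, h: int) -> list:
    """Hypothetical complete tree: nodes are lists of (lo, hi, child) tuples."""
    span = x ** (h - 1)
    node = []
    for i in range(x):
        lo = base + i * span
        child = f"leaf-{lo}" if h == 1 else build_tree(lo, x, h - 1)
        node.append((lo, lo + span, child))
    return node


def find_by_division(tree: list, key: int, base: int, x: int, h: int):
    """Index straight into the lists using the base-x digits of (key - base).

    Only valid for a complete tree whose ranges are split evenly.
    """
    offset = key - base
    if not 0 <= offset < x ** h:
        raise KeyError(key)
    digits = []
    for _ in range(h):                  # exactly h divisions
        offset, digit = divmod(offset, x)
        digits.append(digit)            # least significant digit first
    node = tree
    for digit in reversed(digits):      # most significant digit picks the top-level entry
        node = node[digit][2]           # [2] is the child slot of (lo, hi, child)
    return node


tree = build_tree(1_000_000_000, x=10, h=3)
print(find_by_division(tree, 1_000_000_537, 1_000_000_000, 10, 3))  # leaf-1000000537
```

If the ranges are uneven, as they usually are in a real R-tree, this version returns the wrong entry or raises `IndexError`, so it only applies when the key layout is this regular.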

So the question is: which is more efficient, walking through the lists sequentially (complexity = h * x range checks, e.g. 50) or successively dividing the large number by x to get the list indices directly (complexity = h divisions, e.g. 5)?

That is: 50 range checks or 5 divisions?
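The 50 in the question is the worst case. Assuming uniformly distributed keys, the matching entry sits at a uniformly random position 1..x on each level, so the scan averages h·(x+1)/2 range checks. The sketch below (a hypothetical layout of lists of `(lo, hi, child)` tuples; helper names are illustrative) confirms this by enumerating every key of a small tree with x = 10 and h = 3.

```python
def build_tree(base: int, x: int, h: int) -> list:
    """Hypothetical complete tree: nodes are lists of (lo, hi, child) tuples."""
    span = x ** (h - 1)
    node = []
    for i in range(x):
        lo = base + i * span
        child = f"leaf-{lo}" if h == 1 else build_tree(lo, x, h - 1)
        node.append((lo, lo + span, child))
    return node


def count_range_checks(node, key: int) -> int:
    """Same descent as a sequential scan, but return how many checks it took."""
    checks = 0
    while isinstance(node, list):
        for lo, hi, child in node:
            checks += 1
            if lo <= key < hi:
                node = child
                break
        else:
            raise KeyError(key)
    return checks


x, h = 10, 3
tree = build_tree(0, x, h)
counts = [count_range_checks(tree, key) for key in range(x ** h)]
print(max(counts), sum(counts) / len(counts))  # 30 16.5
```

For x = 10 and h = 5 the same formula gives 27.5 checks on average and 50 at worst, against exactly 5 divisions; neither count by itself predicts wall-clock time.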

This needs to scale. If this code is accessed in the cloud by a very large number of users, which approach is more efficient? Maybe division is more expensive at scale than range checks?

You need to benchmark the code in a reasonably realistic scenario.

The reason it's so hard to say is that you are not just comparing the cost of a division against the cost of a comparison (by the way, modern compilers replace many divisions, for example division by a constant, with cheaper multiply-and-shift sequences). Modern CPUs have large caches, so the list will likely fit into L2 or L3, which cuts the run time dramatically. There are also vector (SIMD) instructions that might be used to speed up all the checks in the linear case.

I would guess that walking through the list sequentially will be faster; in addition, the code will be simpler.

But don't take my word for it: take a real example, benchmark both versions, and choose based on the results, especially if this is critical to your system's performance.
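A minimal benchmark harness along those lines might look like this. It is a sketch, assuming a hypothetical complete tree stored as lists of `(lo, hi, child)` tuples, uniformly random keys, and CPython; `benchmark` and its default parameters are illustrative placeholders, not measurements. In CPython the per-operation interpreter overhead often outweighs the raw cost of one integer division, so timings from this harness may not carry over to a compiled implementation where the compiler and SIMD effects mentioned above come into play.

```python
import random
import timeit


def build_tree(base: int, x: int, h: int) -> list:
    """Hypothetical complete tree: nodes are lists of (lo, hi, child) tuples."""
    span = x ** (h - 1)
    node = []
    for i in range(x):
        lo = base + i * span
        child = f"leaf-{lo}" if h == 1 else build_tree(lo, x, h - 1)
        node.append((lo, lo + span, child))
    return node


def find_by_scan(node, key):
    """Sequential range checks on every level."""
    while isinstance(node, list):
        for lo, hi, child in node:
            if lo <= key < hi:
                node = child
                break
        else:
            raise KeyError(key)
    return node


def find_by_division(tree, key, base, x, h):
    """h divmod calls produce the base-x digits used as list indices."""
    offset = key - base
    if not 0 <= offset < x ** h:
        raise KeyError(key)
    digits = []
    for _ in range(h):
        offset, digit = divmod(offset, x)
        digits.append(digit)
    node = tree
    for digit in reversed(digits):
        node = node[digit][2]
    return node


def benchmark(x=10, h=5, n_keys=1_000, number=20, base=1_000_000_000, seed=0):
    """Return (scan_seconds, division_seconds) for the same random keys."""
    tree = build_tree(base, x, h)
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    keys = [base + rng.randrange(x ** h) for _ in range(n_keys)]
    # Both strategies must agree before their timings are worth comparing.
    for key in keys:
        if find_by_scan(tree, key) != find_by_division(tree, key, base, x, h):
            raise RuntimeError(f"lookups disagree for key {key}")
    t_scan = timeit.timeit(lambda: [find_by_scan(tree, k) for k in keys], number=number)
    t_div = timeit.timeit(
        lambda: [find_by_division(tree, k, base, x, h) for k in keys], number=number
    )
    return t_scan, t_div


if __name__ == "__main__":
    t_scan, t_div = benchmark()
    print(f"range scan: {t_scan:.4f} s   division: {t_div:.4f} s")
```

Swap in the real tree and a realistic key distribution before drawing conclusions; the defaults here only exercise the two code paths.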
