
Best approach to find a key in multiple std::maps or std::sets?

I'll use a std::map example with minimal data.
I have two maps, as below:

map<string, Object*> map_ShortKey; // keys are single English words
map<string, Object*> map_LongKey; // keys are concatenated English words

map_ShortKey is populated at the start of the program with around 50 elements and stays constant throughout. map_LongKey, however, grows continuously while the program runs and may reach 1,000-10,000 elements.

When I want to search for a word in these maps, what is the best approach?

(1) Search map_ShortKey first; if the key is not found, search map_LongKey.
(2) Add the contents of map_ShortKey into map_LongKey, then search that single map.
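A minimal C++ sketch of the two options, for reference. It assumes a placeholder `Object` type (the question doesn't show it), and the helper names `find_either`, `merge_into` and `find_merged` are hypothetical, not from the question.

```cpp
#include <map>
#include <string>

// Placeholder for the question's Object type (assumption: its real contents are unknown).
struct Object {
    int id;
};

using ObjMap = std::map<std::string, Object*>;

// Approach (1): search the small, constant map first, then the large one.
// Returns nullptr when the key is in neither map.
Object* find_either(const ObjMap& short_map, const ObjMap& long_map,
                    const std::string& key) {
    auto s = short_map.find(key);
    if (s != short_map.end()) return s->second;
    auto l = long_map.find(key);
    return l != long_map.end() ? l->second : nullptr;
}

// Approach (2), step 1: copy every entry of `src` into `dst` once.
// The range form of std::map::insert leaves keys already present in `dst` untouched.
void merge_into(ObjMap& dst, const ObjMap& src) {
    dst.insert(src.begin(), src.end());
}

// Approach (2), step 2: a single find() in the merged map.
Object* find_merged(const ObjMap& merged, const std::string& key) {
    auto it = merged.find(key);
    return it != merged.end() ? it->second : nullptr;
}
```

With approach (2) you would call `merge_into(map_LongKey, map_ShortKey)` once, after map_ShortKey is filled, and keep inserting new long keys into the same map. One behavioural difference: if a key exists in both maps, approach (1) returns the map_ShortKey value, while the merge keeps the map_LongKey value.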

Do you mean searching for a word, or for a key?

If map_LongKey contains concatenated words, then searching for the first word of a concatenation will be unsuccessful.
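To illustrate: `find()` only matches whole keys, so asking for "cat" will not find "catdog". If the first-word case does matter, one option (not part of the original answer) is a prefix scan: keys sharing a prefix are contiguous in a std::map's sorted order, so the scan can start at `lower_bound`. `keys_with_prefix` is a hypothetical helper sketched under that assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Collect every key in `m` that starts with `prefix`, e.g. "cat" matches "catdog".
// Keys starting with `prefix` sort at or after `prefix` and are contiguous, so we
// start at lower_bound(prefix) and stop at the first key that no longer matches.
template <typename V>
std::vector<std::string> keys_with_prefix(const std::map<std::string, V>& m,
                                          const std::string& prefix) {
    std::vector<std::string> out;
    for (auto it = m.lower_bound(prefix);
         it != m.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
        out.push_back(it->first);
    }
    return out;
}
```

This only helps with the first word of a concatenation; a word in the middle of a key still needs a different structure or a full scan.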

If, however, you are searching for something that is actually a key in one of the maps, then whether (2) is the better choice depends on many things; more information is needed.

If speed is your concern, then search first in whichever map is most likely to contain the key.

If speed is not your concern, then structure your code for clarity - whether this involves merging the maps together or otherwise will depend on your situation.

It depends on the likelihood of a successful find in map_ShortKey. If that is quite likely, you only spend about 6 "steps" on the search [log2(50) ≈ 5.6], whereas a search in map_LongKey takes about 10-13 "steps" [log2(1000) ≈ 10, log2(10000) ≈ 13.3].

If, on the other hand, you are unlikely to find the key in map_ShortKey, then adding another 50 elements to the large map isn't going to make much difference to the cost of searching it.

Since we don't know the statistics of success, it's hard to say which is the better approach.
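The trade-off can still be put into rough numbers once you have an estimate of the hit rate. This sketch uses the same crude model as above (one "step" per tree level, i.e. log2 of the map size) plus a probability `p` that the key is in map_ShortKey, which you would have to estimate for your own workload. The function names are hypothetical.

```cpp
#include <cmath>

// Crude model: a search in a map of size s costs about log2(s) "steps".
// n = size of the short map, m = size of the long map,
// p = assumed probability that the key is found in the short map.

// Approach (1): always pay for the short-map search; pay for the long-map
// search only when the short-map search misses.
double expected_steps_separate(double n, double m, double p) {
    return std::log2(n) + (1.0 - p) * std::log2(m);
}

// Approach (2): one search in the merged map of n + m entries.
double expected_steps_merged(double n, double m) {
    return std::log2(n + m);
}

// Hit rate above which approach (1) needs fewer steps on average than approach (2),
// obtained by setting the two expressions above equal and solving for p.
double break_even_hit_rate(double n, double m) {
    return 1.0 - (std::log2(n + m) - std::log2(n)) / std::log2(m);
}
```

For n = 50 and m = 10000, `break_even_hit_rate(50, 10000)` comes out at roughly 0.42: under this model, approach (1) is cheaper on average only if more than about 42% of searches hit map_ShortKey.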

If you care about worst-case complexity and know nothing about your searches (e.g. whether the key is more likely to be found in one map than in the other), then I would go for approach 1).

Lookup in a std::map has logarithmic worst-case complexity, so in the first case you end up with a worst-case cost of log(n) + log(m) comparisons (assuming your maps have n and m elements respectively). Thus, k lookups will cost you k * (log(n) + log(m)).

Insertion in a map also has logarithmic complexity, so in the second case you force n insertions (copying the n entries of map_ShortKey into map_LongKey, each costing at most log(n + m)) followed by a lookup in a single map with n + m elements. Thus, for k lookups (provided you do the insertion only once), you get n * log(n + m) + k * log(n + m) worst-case complexity.

So, if you care about worst-case complexity, approach 1) is preferable as long as:

k * (log(n) + log(m)) < n * log(n + m) + k * log(n + m)

You can estimate k based on your workload, n and m based on the size of the input, and do the math to figure out what is best (and then double-check this by measuring).
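For the measuring step, a rough harness along these lines could work. It assumes `std::map<std::string, int>` stands in for the question's maps, and `hits_separate`, `hits_merged` and `time_us` are hypothetical helpers; timings from a single run are noisy, so repeat them on realistic keys and hit rates.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Dict = std::map<std::string, int>;

// Approach (1): count how many queries are found in either map,
// checking the short map first.
std::size_t hits_separate(const Dict& short_map, const Dict& long_map,
                          const std::vector<std::string>& queries) {
    std::size_t hits = 0;
    for (const auto& q : queries)
        if (short_map.count(q) || long_map.count(q)) ++hits;
    return hits;
}

// Approach (2): the same queries against a single merged map.
std::size_t hits_merged(const Dict& merged, const std::vector<std::string>& queries) {
    std::size_t hits = 0;
    for (const auto& q : queries)
        if (merged.count(q)) ++hits;
    return hits;
}

// Wall-clock time of one call to f(), in microseconds, using a monotonic clock.
template <typename F>
long long time_us(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Wrap each approach in a lambda that stores its result, e.g. `time_us([&] { h = hits_merged(merged, queries); })`, so the compiler cannot discard the work. Checking that both approaches return the same hit count is a cheap sanity check before comparing times.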
