
std::map with std::vector as key -- complexity of lookup function

I have a set of N customers, indexed 0,...,N-1. Periodically, for some subset S of customers, I need to evaluate a function f(S). Computing f(S) is of linear complexity in |S|. The set S of customers is represented as an object of type std::vector<int>. The subsets that come up for evaluation can be of a different size each time. [Since the order of customers in S does not matter, the set can as well be represented as an object of type std::set<int> or std::unordered_set<int>.]

In the underlying application, the same subset S of customers may come up multiple times for evaluation of f(S). Instead of incurring the needless linear cost each time, I would like to know whether a computationally cheaper lookup would help.

I am considering a map of key-value pairs where the key is directly the vector of customers, std::vector<int> S, and the value mapped to this key is f(S). That way, I hope I can first check whether a key already exists in the map and, if it does, look up the value without having to compute f(.) again.
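The caching idea described above could look roughly like the following. This is only a minimal sketch: `f` here is a hypothetical stand-in (it just sums the indices), and `SubsetCache` is an illustrative name, not something from the application. The sketch sorts the vector first so that the same subset always produces the same key.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the expensive f(S): the sum of the indices.
double f(const std::vector<int>& S) {
    double total = 0.0;
    for (int c : S) total += c;
    return total;
}

// Illustrative cache keyed directly by the (sorted) customer vector.
class SubsetCache {
public:
    double evaluate(std::vector<int> S) {
        // Sort so that {2, 0, 5} and {0, 2, 5} map to the same key.
        std::sort(S.begin(), S.end());
        auto it = cache_.find(S);
        if (it != cache_.end()) {
            ++hits_;  // value was computed before; reuse it
            return it->second;
        }
        double value = f(S);
        cache_.emplace(std::move(S), value);
        return value;
    }
    std::size_t hits() const { return hits_; }
    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::vector<int>, double> cache_;
    std::size_t hits_ = 0;
};
```

Note that sorting costs O(|S| log |S|), which already exceeds the linear cost of f(S) itself, so a cache like this is only likely to pay off if S arrives in a canonical order anyway or if f is linear with a large constant factor.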

Having a std::map with std::vector as keys is well-defined. See, e.g., here.

CPPReference indicates that map lookup time is logarithmic. But I suppose this is logarithmic in the number of keys, where each key is of constant length, such as an int or a double. How is the complexity affected when the key itself need not be of constant length and can be of arbitrary length up to size N?

Since the keys can themselves be of different sizes (the subset of customers that comes up for evaluation could be different each time), does this introduce any additional complexity in computing a hash function or the compare operation for the std::map? Is there any benefit to maintaining the key as a binary array of fixed length N? This binary array is such that B_S[i]=1 if the i-th customer is in set S, and 0 otherwise. Does this make the lookup any easier?
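On the hash-function side of the question: std::map never hashes its keys, but std::unordered_map does, and the standard library provides no std::hash specialization for std::vector<int>, so a hash functor has to be supplied. The sketch below uses a hypothetical `VectorHash` with a Boost-style combine step (the names and the mixing constant are illustrative choices). It shows that hashing a variable-length key visits every element and is therefore linear in |S|.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical hash functor for std::vector<int> keys.
struct VectorHash {
    std::size_t operator()(const std::vector<int>& v) const {
        std::size_t seed = v.size();
        for (int x : v) {
            // Boost-style hash_combine step; one step per element, so O(|S|).
            seed ^= std::hash<int>{}(x) + 0x9e3779b9u + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};

// Hash-based cache: expected O(|S|) per lookup (hash plus one equality check),
// assuming few collisions.
using HashedCache = std::unordered_map<std::vector<int>, double, VectorHash>;
```

As with the ordered map, the keys need a canonical element order, since {0, 2} and {2, 0} are different vectors and would hash differently here.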

I am aware that ultimately the design choice between reevaluating f(S) each time and using std::map has to be based on actual profiling of my application. However, before implementing both ideas (the std::map route is more difficult to code in my underlying application), I would like to know if there are any known best practices or benchmarks.

Lookup in a std::map takes O(log N) key comparisons; that is, roughly log N comparisons are needed when there are N elements in the map (N here counts the keys in the map, not the customers). The cost of each comparison multiplies that. Vectors are compared lexicographically, so comparing two keys of K elements each inspects up to K elements, and a single lookup costs O(K*log N) in the worst case. Looking up M such vectors therefore costs O(M*K*log N) in total.
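One way to see the two factors separately is to count key comparisons with a custom comparator. The following is an illustrative sketch only; `CountingLess` and `comparisons_for_one_lookup` are made-up names. The keys are built to share a long common prefix, which is the bad case where each comparison has to walk almost the whole vector.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Comparator that counts how often the map compares two keys.
struct CountingLess {
    std::size_t* calls;  // shared counter, owned by the caller
    bool operator()(const std::vector<int>& a, const std::vector<int>& b) const {
        ++*calls;
        return a < b;  // lexicographic; worst case O(min(|a|, |b|))
    }
};

// Build a map with n keys of length k (k >= 1) that differ only in the last
// element, then return the number of key comparisons one find() performs.
std::size_t comparisons_for_one_lookup(int n, int k) {
    std::size_t calls = 0;
    std::map<std::vector<int>, double, CountingLess> m(CountingLess{&calls});
    for (int i = 0; i < n; ++i) {
        std::vector<int> key(k, 0);
        key.back() = i;
        m.emplace(std::move(key), 0.0);
    }
    calls = 0;  // ignore comparisons made during insertion
    std::vector<int> probe(k, 0);
    probe.back() = n / 2;
    m.find(probe);
    return calls;
}
```

With n = 1024 the count should stay within a small multiple of log2(1024) = 10, independent of k; the key length k only affects how expensive each of those comparisons is.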

However, asymptotic complexity is only that: asymptotic complexity. When there are only a small number of elements in the map, lower-order terms and constant factors might outweigh the log N term, which dominates only for large N. Consequently, the actual runtime depends on your specific application and you need to measure to be sure.

Moreover, you shouldn't use vectors as keys in the first place. It's a waste of memory. The subsets of a set with n elements can be enumerated with an n-bit integer (simply set the i-th bit when the i-th element is in the subset). Comparing a single integer (or a fixed-size bitset) is cheaper than comparing vectors of integers element by element.
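A minimal sketch of the bitmask idea, under the assumption that N <= 64 so that one std::uint64_t holds the whole subset (`to_mask` and `MaskCache` are illustrative names):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Encode a subset of customers {0, ..., 63} as a 64-bit mask:
// bit i is set exactly when customer i is in S.
std::uint64_t to_mask(const std::vector<int>& S) {
    std::uint64_t mask = 0;
    for (int c : S) {
        assert(c >= 0 && c < 64);  // this sketch assumes N <= 64
        mask |= std::uint64_t{1} << c;
    }
    return mask;
}

// Each key comparison is now a single integer comparison.
using MaskCache = std::map<std::uint64_t, double>;
```

The mask does not depend on the order of elements in S, so no sorting is needed, but building it from a vector is still linear in |S|; the saving is in the comparisons during lookup. For larger N, std::bitset (size fixed at compile time) works as a std::unordered_map key through its std::hash specialization but has no operator<, so std::map would need a custom comparator; std::vector<bool> supports both.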
