
std::map with std::vector as key: complexity of lookup function

Original post: 2022-01-06 08:29:50 | Tags: c++, algorithm, stdvector, stdmap

I have a set of N customers, indexed 0,...,N-1. Periodically, for some subset S of customers, I need to evaluate a function f(S). Computing f(S) is of linear complexity in |S|. The set S of customers is represented as an object of type std::vector<int>. The subsets that come up for evaluation can be of a different size each time. [Since the order of customers in S does not matter, the set can just as well be represented as an object of type std::set<int> or std::unordered_set<int>.]

In the underlying application, the same subset S of customers may come up multiple times for evaluation of f(S). Instead of incurring the needless linear cost each time, I would like to know whether a computationally cheaper lookup would help.

I am considering a map of key-value pairs where the key is the vector of customers itself, std::vector<int> S, and the value mapped to this key is f(S). That way, I hope to first check whether a key already exists in the map and, if it does, look up the value without having to compute f(.) again.
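The caching idea described above could be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the body of f (a plain sum of indices), the class name SubsetCache, and the choice to sort the vector before using it as a key are all assumptions made for the example. Sorting is one way to make two vectors that hold the same customers in a different order map to the same key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for f(S): linear in |S|. Here it only sums the indices.
double f(const std::vector<int>& S) {
    double total = 0.0;
    for (int customer : S) {
        assert(customer >= 0);  // customers are indexed 0,...,N-1
        total += customer;
    }
    return total;
}

// Hypothetical cache keyed directly by the (sorted) customer vector.
class SubsetCache {
public:
    // Takes S by value because it is sorted and then moved into the map.
    double evaluate(std::vector<int> S) {
        // Sort so that {2, 0, 1} and {0, 1, 2} produce the same key.
        std::sort(S.begin(), S.end());
        auto it = cache_.find(S);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;  // already computed, skip f
        }
        const double value = f(S);
        cache_.emplace(std::move(S), value);
        return value;
    }

    std::size_t hits() const { return hits_; }
    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::vector<int>, double> cache_;
    std::size_t hits_ = 0;
};
```

Calling `evaluate({2, 0, 1})` and then `evaluate({0, 1, 2})` on the same SubsetCache computes f once and serves the second call from the map. Note that the sort and the key comparisons are themselves not free, so a cache like this only pays off when f is noticeably more expensive than they are.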

Having an std::map with std::vector as keys is well-defined. See, e.g., here.

CPPReference indicates that map lookup time is logarithmic. But I suppose this is logarithmic in the number of keys, where each key is of constant length (such as an int or a double). How is the complexity affected when the key itself need not be of constant length and can be of arbitrary length up to size N?

Since the keys can themselves be of different sizes (the subset of customers that comes up for evaluation could be different each time), does this introduce any additional complexity in computing a hash function or the compare operation for the std::map? Is there any benefit to maintaining the key as a binary array of fixed length N? This binary array is such that B_S[i]=1 if the i-th customer is in set S, and 0 otherwise. Does this make the lookup any easier?

I am aware that ultimately the design choice between reevaluating f(S) each time and using std::map would have to be made based on actual profiling of my application. However, before implementing both ideas (the std::map route is more difficult to code in my underlying application), I would like to know if there are any known best practices or benchmarks.

1 Answer

Complexity of lookup in a map is O(log N). That is, roughly log N comparisons are needed when there are N elements in the map (N here is the number of keys in the map, not the number of customers). The cost of each comparison multiplies that count. For example, when you look up M vectors with K elements each, every lookup needs roughly log N comparisons, each comparing up to K vector elements, i.e. O(K*log N) per lookup and O(M*K*log N) in total.
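To make the estimate above concrete, the following sketch counts the work done by a single std::map::find when the keys are vectors. Everything here is illustrative: the names Counters, CountingLess and count_lookup_work are made up for this example, and the keys are deliberately built to share a long common prefix so that every comparison has to walk all K elements (the worst case for lexicographic comparison).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical counters for the work done by the comparator.
struct Counters {
    std::size_t key_comparisons = 0;  // calls to the comparator
    std::size_t element_steps = 0;    // individual int-vs-int comparisons
};

// Lexicographic less-than on vectors that records how much work it does.
struct CountingLess {
    Counters* counters;
    bool operator()(const std::vector<int>& a, const std::vector<int>& b) const {
        ++counters->key_comparisons;
        const std::size_t n = std::min(a.size(), b.size());
        for (std::size_t i = 0; i < n; ++i) {
            ++counters->element_steps;
            if (a[i] != b[i]) return a[i] < b[i];
        }
        return a.size() < b.size();  // a shorter prefix orders first
    }
};

// Builds a map with num_keys keys of length key_length and measures one find().
Counters count_lookup_work(std::size_t num_keys, std::size_t key_length) {
    assert(key_length > 0);
    Counters counters;
    std::map<std::vector<int>, int, CountingLess> m(CountingLess{&counters});
    for (std::size_t i = 0; i < num_keys; ++i) {
        // All keys agree on the first key_length - 1 elements (worst case).
        std::vector<int> key(key_length, 0);
        key.back() = static_cast<int>(i);
        m.emplace(std::move(key), 0);
    }
    counters = Counters{};  // discard the work done during insertion
    std::vector<int> probe(key_length, 0);
    probe.back() = static_cast<int>(num_keys / 2);
    m.find(probe);
    return counters;
}
```

With 1024 keys of length 64, `count_lookup_work(1024, 64)` should report a number of key comparisons on the order of log2(1024) = 10 (the exact count depends on the tree shape of the standard library implementation), and exactly 64 element steps per key comparison because of how the keys were constructed. Keys that differ early would be much cheaper to compare, which is one reason measured runtimes can be far from the worst-case bound.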

However, asymptotic complexity is only that: asymptotic complexity. When there are only a small number of elements in the map, lower-order factors might outweigh the log N term, which only dominates for large N. Consequently, the actual runtime depends on your specific application and you need to measure to be sure.

Moreover, you shouldn't use vectors as keys in the first place. It's a waste of memory. Subsets of S can be enumerated with an n-bit integer when S has n elements (simply set the i-th bit when the i-th element of S is in the subset). Comparing a single integer (or bitset) is surely more efficient than comparing vectors of integers.
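A bitset-keyed cache along these lines might look like the sketch below. It rests on several assumptions that are not in the question: the number of customers has a compile-time upper bound (kMaxCustomers, chosen arbitrarily as 128), the function evaluate_f is a placeholder for the real f, and the container is std::unordered_map rather than std::map. The last choice is because std::bitset has a std::hash specialization but no operator<, so it works as an unordered_map key out of the box, whereas std::map would need a custom comparator.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Assumed compile-time upper bound on the number of customers N.
constexpr std::size_t kMaxCustomers = 128;
using SubsetMask = std::bitset<kMaxCustomers>;

// Sets bit i for every customer i in S; the order of S does not matter.
SubsetMask to_mask(const std::vector<int>& S) {
    SubsetMask mask;
    for (int customer : S) {
        assert(customer >= 0 &&
               static_cast<std::size_t>(customer) < kMaxCustomers);
        mask.set(static_cast<std::size_t>(customer));
    }
    return mask;
}

// Hypothetical stand-in for f(S): linear in |S|. Here it only sums the indices.
double evaluate_f(const std::vector<int>& S) {
    double total = 0.0;
    for (int customer : S) total += customer;
    return total;
}

// Returns the cached value for S if present, otherwise computes and stores it.
double cached_f(std::unordered_map<SubsetMask, double>& cache,
                const std::vector<int>& S) {
    const SubsetMask key = to_mask(S);
    auto it = cache.find(key);
    if (it != cache.end()) return it->second;
    const double value = evaluate_f(S);
    cache.emplace(key, value);
    return value;
}
```

For N up to 64, a plain std::uint64_t would serve as the key in the same way. One caveat: building the mask from a vector is itself linear in |S|, so this layout helps most when f is much more expensive than a simple pass over S, or when the application can keep the subsets as masks from the start.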