std::map with std::vector as key -- complexity of lookup function
I have a set of N customers, indexed 0,...,N-1. Periodically, for some subset S of customers, I need to evaluate a function f(S). Computing f(S) is of linear complexity in |S|. The set S of customers is represented as an object of type std::vector<int>. The subsets that come up for evaluation can be of a different size each time. [Since the order of customers in S does not matter, the set can as well be represented as an object of type std::set<int> or std::unordered_set<int>.]
In the underlying application, the same subset S of customers may come up multiple times for evaluation of f(S). Instead of incurring the needless linear cost each time, I would like to know whether some computationally cheaper lookup would help.
I am considering a map of key-value pairs where the key is the vector of customers itself, std::vector<int> S, and the value mapped to this key is f(S). That way, I can first check whether a key already exists in the map and, if it does, look the value up without having to compute f(.) again.
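To make the idea concrete, here is a minimal sketch of such a cache. The function f below is only a hypothetical stand-in (it sums the indices), and the class name MemoF is made up for illustration. The key is sorted before the lookup on the assumption that {2, 0, 1} and {0, 1, 2} should hit the same entry, since the order of customers does not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real f(S): any function that is linear in |S|.
double f(const std::vector<int>& S) {
    double total = 0.0;
    for (int customer : S) total += customer;
    return total;
}

// Memoizing wrapper around f (illustrative name, not from the question).
class MemoF {
public:
    double operator()(std::vector<int> S) {
        std::sort(S.begin(), S.end());   // canonical order: O(|S| log |S|)
        auto it = cache_.find(S);        // O(log n) key comparisons, each up to O(|S|)
        if (it != cache_.end()) {
            ++hits_;
            return it->second;
        }
        const double value = f(S);
        const bool inserted = cache_.emplace(std::move(S), value).second;
        assert(inserted);                // we only get here on a cache miss
        (void)inserted;
        return value;
    }
    std::size_t hits() const { return hits_; }
    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::vector<int>, double> cache_;
    std::size_t hits_ = 0;
};
```

Note that the lookup has to read the key, so it cannot be cheaper than O(|S|) either. A cache like this can only pay off if evaluating f(S) has a much larger constant factor than sorting and comparing the key. If the caller always supplies sorted indices, the std::sort call can be dropped.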
Having an std::map with std::vector as keys is well-defined. See, for example, here.
CPPReference indicates that map lookup time is logarithmic. But I suppose this is logarithmic in the number of keys where each key is of a constant length, such as an int or a double. How is the complexity affected when the key itself need not be of constant length and can be of arbitrary length up to size N?
Since the keys can themselves be of different sizes (the subset of customers that comes up for evaluation could be different each time), does this introduce any additional complexity in computing a hash function or the compare operation for the std::map? Is there any benefit to maintaining the key as a binary array of fixed length N? This binary array is such that B_S[i]=1 if the i-th customer is in set S, and it is 0 otherwise. Does this make the lookup any easier?
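One way to try the fixed-length binary array is sketched below, assuming std::vector<bool> is an acceptable representation of B_S. The standard library provides std::hash for std::vector<bool>, so it can be used directly as a key of std::unordered_map. The helper names to_indicator and lookup and the alias Cache are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Build the fixed-length indicator B_S for a subset S of {0, ..., N-1}.
std::vector<bool> to_indicator(const std::vector<int>& S, std::size_t N) {
    std::vector<bool> B(N, false);
    for (int i : S) {
        assert(i >= 0 && static_cast<std::size_t>(i) < N);
        B[static_cast<std::size_t>(i)] = true;
    }
    return B;
}

// std::hash<std::vector<bool>> is provided by <vector>, so no custom hash is needed.
using Cache = std::unordered_map<std::vector<bool>, double>;

// Returns a pointer to the cached value, or nullptr on a miss.
const double* lookup(const Cache& cache, const std::vector<int>& S, std::size_t N) {
    auto it = cache.find(to_indicator(S, N));
    return it == cache.end() ? nullptr : &it->second;
}
```

With this representation the order of customers in S no longer matters, but building and hashing the key touches all N positions regardless of |S|. That is probably fine for dense subsets and wasteful for small subsets of a large N.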
I am aware that ultimately the design choice between reevaluating f(S) each time versus using std::map will have to be made based on actual profiling of my application. However, before implementing both ideas (the std::map route is more difficult to code in my underlying application), I would like to know if there are any known best practices or benchmarks.
Complexity of lookup in a map is O(log N). That is, roughly log N comparisons are needed when there are N elements in the map (here N is the number of entries in the map, not the number of customers). The cost of each comparison multiplies that. For example, when you look up M vectors with K elements each, every lookup needs roughly log N comparisons, each comparing up to K vector elements, i.e. in total O(M*K*log N).
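If you want to see the number of comparisons for yourself, a counting comparator is one way to do it. The following is only a sketch: the names CountingLess and comparisons_for_one_find are made up, and the keys are deliberately built to share a long common prefix so that each vector comparison has to scan all K elements.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Comparator that behaves like std::less but counts how often it is called.
struct CountingLess {
    std::size_t* calls;
    bool operator()(const std::vector<int>& a, const std::vector<int>& b) const {
        ++*calls;
        return a < b;  // lexicographic; stops at the first differing element
    }
};

// Insert n distinct keys of length k, then report how many key comparisons
// a single successful find() performed.
std::size_t comparisons_for_one_find(int n, std::size_t k) {
    assert(n > 0 && k > 0);
    std::size_t calls = 0;
    std::map<std::vector<int>, double, CountingLess> m(CountingLess{&calls});
    for (int i = 0; i < n; ++i) {
        std::vector<int> key(k, 0);
        key.back() = i;  // keys differ only in the last element
        m.emplace(std::move(key), 0.0);
    }
    std::vector<int> probe(k, 0);
    probe.back() = n / 2;
    calls = 0;           // ignore comparisons made during insertion
    auto it = m.find(probe);
    assert(it != m.end());
    (void)it;
    return calls;
}
```

Calling comparisons_for_one_find(1024, 16) should report a count that is a small multiple of log2(1024) = 10. The exact number depends on the tree implementation in your standard library, so treat it as an indication rather than a guarantee.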
However, asymptotic complexity is only that: asymptotic complexity. When there are only a small number of elements in the map, lower-order terms and constant factors might outweigh the log N factor, which dominates only for large N. Consequently, the actual runtime depends on your specific application and you need to measure to be sure.
Moreover, you shouldn't use vectors as keys in the first place. It's a waste of memory. Subsets of S can be enumerated with an n-bit integer when S has n elements (simply set the i-th bit when the i-th element of S is in the subset). Comparing a single integer (or bitset) is generally cheaper than comparing vectors of integers.
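As a rough sketch of the integer encoding, assuming there are at most 64 customers so that a subset fits into one std::uint64_t (for more customers you would need a bitset or several words instead). The helper name to_mask is made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode a subset of {0, ..., 63} as a 64-bit mask:
// bit i is set if and only if customer i is in S.
std::uint64_t to_mask(const std::vector<int>& S) {
    std::uint64_t mask = 0;
    for (int i : S) {
        assert(i >= 0 && i < 64);  // assumption of this sketch: at most 64 customers
        mask |= std::uint64_t{1} << i;
    }
    return mask;
}
```

The mask can then serve as the key of a std::map<std::uint64_t, double> or std::unordered_map<std::uint64_t, double>, and two subsets compare equal exactly when their masks are equal, whatever order the customers were listed in. Building the mask is still linear in |S|.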