STL::Map - Walk through list or use find?

Original post: 2008-11-13 23:38:02. Tags: c++, performance, stl, iterator, find

Say I have a method that needs to pull 8 values from a map with 100 elements in it. Which do you think would be preferable:

Walk through the map once in a for loop from begin() to end(), pulling the elements out by switching on the key?

Or use find 8 times to get those values?
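For concreteness, here is a minimal sketch of the two options in C++. It assumes integer keys 0 to 99 and an arbitrary set of 8 wanted keys; the names `Map`, `pull_by_walk` and `pull_by_find` are hypothetical, not from the question.

```cpp
#include <map>
#include <string>
#include <vector>

using Map = std::map<int, std::string>;

// Option 1: one pass over the whole map, switching on each key.
// This only compiles because the key type here is integral.
std::vector<std::string> pull_by_walk(const Map& m) {
    std::vector<std::string> out(8);
    for (const auto& kv : m) {
        switch (kv.first) {
            case 3:  out[0] = kv.second; break;
            case 14: out[1] = kv.second; break;
            case 15: out[2] = kv.second; break;
            case 27: out[3] = kv.second; break;
            case 42: out[4] = kv.second; break;
            case 65: out[5] = kv.second; break;
            case 89: out[6] = kv.second; break;
            case 97: out[7] = kv.second; break;
            default: break;  // not one of the keys we want
        }
    }
    return out;
}

// Option 2: eight independent lookups, each logarithmic in the map size.
std::vector<std::string> pull_by_find(const Map& m) {
    static const int wanted[8] = {3, 14, 15, 27, 42, 65, 89, 97};
    std::vector<std::string> out(8);
    for (int i = 0; i < 8; ++i) {
        Map::const_iterator it = m.find(wanted[i]);
        if (it != m.end()) out[i] = it->second;  // leave empty if the key is missing
    }
    return out;
}
```

Both functions return the 8 values in the same order, so their results can be compared directly.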

9 answers

Walking the list will take you O(n) time to find a random element.

std::map is a balanced binary search tree (typically a red-black tree), so a find is O(log n).

Thus 8 finds cost about 8*log2(n) comparisons, while walking the list costs n. The larger the map, the larger the gain; for randomly located keys and a map of this size (8*log2(100) is about 53, versus 100), doing finds will be faster than iterating.

In non-random cases, if there is reason to think the items you want are near each other in the tree, or near the "beginning" (left side), then walking/iterating would be faster. But that seems unlikely.

While I'd go with the find option, people put too much stress on asymptotic performance.

The fact is that asymptotic performance is a handy guide for algorithms that can receive reasonably large inputs, but even then it isn't foolproof. It's quite possible for an algorithm with worse asymptotic performance than another to be faster for any reasonable input.

And then there are times when your input size is known to be fairly small (or even fixed). In such cases asymptotic performance is almost meaningless.

I would use find 8 times. It will be less code, and more obvious code.

You should try not to make performance judgements based on small numbers, since (a) it isn't likely to be a performance bottleneck either way at this size, and (b) the numbers may change in the future. Choose the algorithm with the best asymptotic performance.

Note: you mention 'switching' on the key. That may apply in your case, but in general you can't switch on a map's key, because a C++ switch only works on integral and enumeration types (not, for example, std::string). Allowing for this would make the code for finding M items in a map by iteration even more complex.

8 finds is best, because the code is simpler and clearer.

Thinking about performance is more fun, though, so I'll do that too.

As Artelius said while I was writing this answer, ignore the complexity. It's not relevant because you know that n=100. For example, insertion sort has worse algorithmic complexity than quicksort (at least in the average case), but in almost all implementations insertion sort is faster than quicksort for small n, which is why quicksort implementations often switch to insertion sort for the small subarrays near the end. Your n is also small, so the limit as n -> infinity isn't what matters.

Since the code for both options is simple to write, I'd suggest profiling it. This will (a) tell you which is faster, and (b) prove that both are so fast that it doesn't matter which you choose (unless it's the only thing your program does, and it does it a lot). Especially since you talk about switching on the key: if the key is an integral type, then the limiting factor is more likely to be memory cache behaviour than any actual processing.
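A rough timing sketch along those lines, using `std::chrono::steady_clock`. The map contents, the 8 keys, and the names `Timings` and `time_both` are assumptions for illustration; the walk models the switch as an inner loop over the 8 keys. A real measurement should be built with optimizations and repeated several times, so treat the numbers it produces as indicative only.

```cpp
#include <chrono>
#include <map>

struct Timings {
    double find_ns;       // average time per round of 8 finds
    double walk_ns;       // average time per full walk of the map
    long long checksum;   // keeps the work observable so it is not optimized away
};

Timings time_both(int reps) {
    std::map<int, int> m;
    for (int k = 0; k < 100; ++k) m[k] = k * 2;
    const int wanted[8] = {3, 14, 15, 27, 42, 65, 89, 97};
    long long sum = 0;
    using Clock = std::chrono::steady_clock;

    Clock::time_point t0 = Clock::now();
    for (int r = 0; r < reps; ++r)
        for (int i = 0; i < 8; ++i) sum += m.find(wanted[i])->second;  // all keys are present
    Clock::time_point t1 = Clock::now();
    for (int r = 0; r < reps; ++r)
        for (const auto& kv : m)
            for (int i = 0; i < 8; ++i)
                if (kv.first == wanted[i]) { sum += kv.second; break; }
    Clock::time_point t2 = Clock::now();

    auto ns = [](Clock::duration d) {
        return std::chrono::duration<double, std::nano>(d).count();
    };
    return Timings{ns(t1 - t0) / reps, ns(t2 - t1) / reps, sum};
}
```

For example, call `time_both(100000)` and print `find_ns` and `walk_ns`; both halves add the same values, so `checksum` is only there to stop the compiler from discarding the lookups.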

Failing that, the usual way to compare searching algorithms is to count the comparisons, on the assumption that they're much slower than traversing the structures. If nothing else, each comparison accesses memory and is an unpredictable branch, which are two things CPUs are often worst at.

If, instead of the switch you propose, you sort your 8 keys before you start (which takes about 24 comparisons), then, because the map is also sorted, you only have to do one comparison at each node you traverse, plus one comparison per item you're searching for. Compare the current element from each side: if they match, advance both sides; if they don't, advance the side with the smaller element. So that's 8+100 in the worst case, or fewer if you find all 8 before you reach the end. But if the 8 keys are randomly located in the map, the expected position of the last of them is still about 8/9 of the way through. So call it 8+88+24 = 120 comparisons, with 8+100+24 = 132 as the worst case. The best case is 8 (if the things you're searching for are all at the start) + 24 (for the initial sort) = 32 comparisons, or even better if you get lucky on the sort as well, or if your 8 search items are already sorted for some reason.
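The sorted-merge walk can be sketched like this. `merge_lookup` is a hypothetical helper that assumes int keys; it counts one step per three-way comparison between the current map key and the current wanted key, which matches the counting in the text but leaves out the cost of the initial sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Walk the map and the sorted wanted keys in lockstep, always advancing the
// side with the smaller key. Stops as soon as every wanted key has been handled,
// so it takes at most m.size() + wanted.size() steps.
std::vector<std::pair<int, std::string>>
merge_lookup(const std::map<int, std::string>& m, std::vector<int> wanted,
             std::size_t& comparisons) {
    std::sort(wanted.begin(), wanted.end());
    std::vector<std::pair<int, std::string>> found;
    comparisons = 0;
    std::map<int, std::string>::const_iterator it = m.begin();
    std::size_t i = 0;
    while (it != m.end() && i < wanted.size()) {
        ++comparisons;                         // one three-way comparison per step
        if (it->first == wanted[i]) {          // match: advance both sides
            found.emplace_back(it->first, it->second);
            ++it;
            ++i;
        } else if (it->first < wanted[i]) {    // map side is smaller: advance it
            ++it;
        } else {                               // wanted key is absent: skip it
            ++i;
        }
    }
    return found;
}
```

With keys 0 to 99 and wanted keys that run up to 97, the walk stops after 98 steps; if the wanted keys are 0 to 7, it stops after 8.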

The average depth of a node in a red-black tree (which is how std::map is usually implemented) is slightly over log2(n). Call it 7 in this case, since 2^7 is 128. So finding 8 elements should take something like 56 comparisons. IIRC the balance property of a red-black tree is that the longest root-to-leaf path is no more than twice as long as the shortest. So the worst-case depth is roughly 2*log2(n), about 13 to 14 here; call it 15 to be safe, for a total of 120. The best is ceil(1/2 * log2(n)), which is 4, and that's 32 comparisons again.
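To check the depth estimate against a real implementation, one option is to count comparisons with an instrumented comparator. `CountingLess` and `comparisons_for_finds` are hypothetical names; the exact count depends on your standard library's tree layout, since the standard only guarantees that `find` is logarithmic.

```cpp
#include <cstddef>
#include <map>

// Comparator that behaves like std::less<int> but counts every call.
// The map stores a copy, so the counter lives outside and is shared by pointer.
struct CountingLess {
    std::size_t* counter;
    bool operator()(int a, int b) const { ++*counter; return a < b; }
};

// Builds a map with keys 0..n-1, then returns how many comparisons
// `count` calls to find() performed (construction is not counted).
std::size_t comparisons_for_finds(int n, const int* keys, int count) {
    std::size_t counter = 0;
    std::map<int, int, CountingLess> m(CountingLess{&counter});
    for (int k = 0; k < n; ++k) m[k] = k;
    counter = 0;  // ignore the comparisons spent building the map
    for (int i = 0; i < count; ++i) m.find(keys[i]);
    return counter;
}
```

For 100 elements and 8 keys, the total should land somewhere in the rough range estimated above.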

So, assuming comparisons are the only thing that affects speed, the 8 finds will be somewhere between 4 times as fast and 4 times as slow as the linear traversal, and about 2 times as fast on average.

The linear traversal will probably touch more memory, though, so it might be slower on that account. But ultimately, for n=100 you're talking about well under a millisecond either way, so just write whatever is the simplest code (probably the 8 finds). And did I mention that if you really want to know the speed, you can't hope to predict it; you just have to profile it?

As others have noted, I would probably just use find() on the map eight times and be done with it. But there's an alternative to consider, depending on your needs. If the items in the map aren't going to change much after the map is constructed, or you don't need to interleave insertions with lookups, you might try keeping the key/value pairs in a sorted vector instead. If you do this, you can use std::lower_bound() to do a binary search in logarithmic time. This has the benefit that if the keys you are looking for can be ordered (and you know they'll always be present), you can use the iterator returned by the previous lookup as the lower bound for the next, e.g.,

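typedef std::pair<std::string, int> Entry;  // the int value type is only illustrative
typedef std::vector<Entry> Table;           // my_map is a Table kept sorted by key
bool key_less(const Entry& e, const std::string& k) { return e.first < k; }

Table::iterator a = std::lower_bound(my_map.begin(), my_map.end(), std::string("a"), key_less);
Table::iterator b = std::lower_bound(a + 1, my_map.end(), std::string("b"), key_less);  // a + 1 is safe because "a" is present
Table::iterator c = std::lower_bound(b + 1, my_map.end(), std::string("c"), key_less);
// ...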

Both approaches have logarithmic lookup, but this can help reduce the constant somewhat.
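A self-contained sketch of the sorted-vector idea, generalized to a list of keys. It assumes `std::string` keys and `int` values (both illustrative) and that the search keys are given in ascending order; `chained_lookup` is a hypothetical name. Unlike the fragment above, it also tolerates a missing key, by restarting the next search from the position where that key would have been.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, int>;
using Table = std::vector<Entry>;  // must be kept sorted by key

// lower_bound comparator: element key versus search key.
bool key_less(const Entry& e, const std::string& k) { return e.first < k; }

// Looks up `keys` (sorted ascending) and returns the values of those present.
// Each binary search starts where the previous one ended.
std::vector<int> chained_lookup(const Table& table, const std::vector<std::string>& keys) {
    std::vector<int> out;
    Table::const_iterator from = table.begin();
    for (const std::string& k : keys) {
        Table::const_iterator it = std::lower_bound(from, table.end(), k, key_less);
        if (it != table.end() && it->first == k) {
            out.push_back(it->second);
            from = it + 1;  // later keys are larger, so skip past this one
        } else {
            from = it;      // key absent: it is still a valid lower bound for later keys
        }
    }
    return out;
}
```

Passing the previous result as the start of the next search shrinks each range, which is where the constant-factor saving mentioned above comes from.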

Here is the time-complexity analysis of the two (n is the number of items in the map; find is guaranteed to be logarithmic):

8 * log2(n) for 8 calls to find
n for iterating through all the items

The first is larger for small n (for example, 8 versus 2 for n=2), but the two cross at around n=43, and from n=44 onward the first one is smaller and stays that way. So you will want to use the first method, especially since it is also more convenient to code.
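The crossover point can be checked numerically. `crossover` is a hypothetical helper that returns the first n (starting from 2) at which finds * log2(n) drops below n, under the same simple cost model as the two lines above.

```cpp
#include <cmath>

// Smallest n >= 2 for which `finds` logarithmic lookups cost less than one
// linear walk, using cost(finds) = finds * log2(n) and cost(walk) = n.
int crossover(int finds) {
    for (int n = 2; n < 1000000; ++n)
        if (finds * std::log2(static_cast<double>(n)) < n) return n;
    return -1;  // not reached for small `finds`
}
```

For 8 finds this gives 44, since 8 * log2(43) is about 43.4 but 8 * log2(44) is about 43.7.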

You should use find 8 times.

Think of the switch approach as taking each node and comparing it against all 8 keys. That's 800 comparisons, and you lose all the benefit of the map being keyed at all; it might as well be a list.

With the find approach, you use the structure of the map to your advantage. I believe std::map is implemented as a binary tree, meaning that searching for a key only requires comparing your key against the nodes down to the depth of the tree, which will be roughly 8 for a 100-element binary tree. So you can find all 8 elements with only about 8*8 = 64 comparisons.

If it's that critical, I would implement both and benchmark the performance.

In theory it comes down to whether 8 * lg(100) is greater or less than 100 (8 * lg(100) is about 53).

Other considerations are whether either of those numbers will ever change: will there ever be more than 100 elements, and will you ever do more than 8 searches?

Let's assume find bails out as soon as it finds the key.

Let's further assume that you code the switch sensibly, so that it stops checking a node once it finds a match. We will also assume you don't bother to make it stop the whole walk once all 8 have been found (that would probably be a pain to code).

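Even if each find had to scan linearly (about 50 comparisons on average), the 8-find approach would perform about 50 * 8 = 400 comparisons on average; std::map::find actually searches the tree in logarithmic time, so the real figure is much lower.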

The switch approach can expect to perform (8 * 100) - 28 = 772 comparisons: each of the 100 nodes is checked against up to 8 cases, and the 8 matching nodes stop early, saving 0 + 1 + ... + 7 = 28 comparisons.
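The 772 figure can be reproduced with a small simulation that models the switch the way this answer does, as an if/else chain tested in order. That model is an assumption: a compiler may turn a real switch into a jump table or a binary search, which would change the count. `chain_comparisons` is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// For every map key, test the wanted keys in order and stop at the first match,
// counting each equality test. Models a switch as a sequential if/else chain.
std::size_t chain_comparisons(const std::vector<int>& map_keys,
                              const std::vector<int>& wanted) {
    std::size_t count = 0;
    for (int key : map_keys) {
        for (int w : wanted) {
            ++count;
            if (key == w) break;  // matched: move on to the next map element
        }
    }
    return count;
}
```

With map keys 0 to 99 and the 8 wanted keys each present once, it returns 92 * 8 + (1 + 2 + ... + 8) = 736 + 36 = 772, regardless of where the wanted keys sit.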

So in terms of comparisons, the 8-find approach is better. However, the number of comparisons is small enough that you'd be better off just going with the option that is easier to understand. That will probably be the 8-find approach too, though.