Performance of array vs. map

Question

I have to loop over a subset of elements in a large array where each element points to another one (the problem comes from detecting connected components in a large graph).

My algorithm goes as follows:
1. Consider the first element.
2. Consider, as the next element, the one pointed to by the previous element.
3. Loop until no new element is discovered.
4. Consider the next element not already considered in steps 1-3, and go back to step 1.

Note that the number of elements to consider is much smaller than the total number of elements.
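The steps above can be sketched in C++ roughly as follows. This is a minimal sketch, not the asker's code: it assumes the graph is stored in a hypothetical `next` vector where `next[i]` is the index of the element that element `i` points to, and that the elements to consider are listed in a hypothetical `starts` vector. The marker is a template parameter so that either of the candidate data structures can be plugged in.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Sketch of steps 1-4. Assumptions (not from the question): next[i] is the
// index of the element that element i points to, and `starts` lists the
// elements to consider, in order.
// `is_set` is any container whose operator[] yields a value that reads as 0
// for "not yet considered" and can be assigned 1: std::map<int, int>,
// std::vector<char>, or a zeroed int* buffer all qualify.
// Returns the number of distinct elements visited.
template <class Marker>
std::size_t walk_chains(const std::vector<int>& next,
                        const std::vector<int>& starts,
                        Marker& is_set) {
    std::size_t visited = 0;
    for (int start : starts) {          // step 4: try the next candidate
        int cur = start;                // step 1: start a new chain
        while (true) {
            assert(cur >= 0 && static_cast<std::size_t>(cur) < next.size());
            if (is_set[cur]) break;     // step 3: nothing new, chain is done
            is_set[cur] = 1;            // mark as considered
            ++visited;
            cur = next[cur];            // step 2: follow the pointer
        }
    }
    return visited;
}
```

Because `std::map::operator[]` inserts a zero-valued entry for a missing key, the map-based marker only ever holds the keys that were actually looked up, whereas a `std::vector<char>` marker must be sized to the total number of elements up front.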

As far as I can see, I can either:

// create a map of all elements, init all values to 0, set to 1 when considered
std::map<int, int> is_set; // is_set.size() will be equal to N

or

// create a (too) large array (total size), init to 0 the elements to consider
int* is_set = (int*)malloc(total_size * sizeof(int)); // is_set has total_size entries, and total_size is much larger than N

I know that accessing keys in a map is O(log N) while it is constant time for arrays, but I don't know whether malloc is more costly at creation, given that it also requires more memory.

Answer 1

When in doubt, measure the performance of both alternatives. That's the only way to know for sure which approach will be fastest for your application.

That said, a one-time large malloc is generally not terribly expensive. Also, although the map is O(log N), in my experience the big-O conceals a relatively large constant factor, at least for the std::map implementation. I would not be surprised to find that the array approach is faster in this case, but again, the only way to know for sure is to measure.
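A rough timing harness for such a measurement might look like the sketch below. It is an illustration under assumptions: `time_us` and `mark_all` are hypothetical helpers, the workload (marking a list of indices) only stands in for the real graph traversal, and a single run on a small input says little. Build with optimizations and time realistic data several times before drawing conclusions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <vector>

// Runs f once and returns the elapsed wall-clock time in microseconds.
template <class F>
long long time_us(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

// Marks each index in `ids`; returns how many were not already marked.
// Works with std::map<int, int> and std::vector<char> alike.
template <class Marker>
std::size_t mark_all(Marker& is_set, const std::vector<int>& ids) {
    std::size_t fresh = 0;
    for (int i : ids) {
        assert(i >= 0);  // indices must be non-negative for the vector marker
        if (!is_set[i]) {
            is_set[i] = 1;
            ++fresh;
        }
    }
    return fresh;
}

// Example use (ids and total_size come from your own data):
//   std::map<int, int> m;
//   std::vector<char> v(total_size, 0);
//   long long t_map = time_us([&] { mark_all(m, ids); });
//   long long t_vec = time_us([&] { mark_all(v, ids); });
```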

Keep in mind too that although the map does not have a large up-front memory allocation, it performs many small allocations over the lifetime of the object: every time you insert a new element you get another allocation, and every time you remove an element you get another deallocation. If you have very many of these, they can fragment your heap, which may hurt performance depending on what else your application is doing at the same time.
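One way to observe the per-element allocations is to give both containers a counting allocator. The sketch below is illustrative only: `CountingAllocator`, `g_allocs`, and `g_frees` are hypothetical names, the counters are global and not thread-safe, and exact counts depend on the standard library implementation. With common implementations, inserting N distinct keys into a `std::map` performs at least N allocations (one node per key), while constructing a `std::vector` with its final size performs a single allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <new>
#include <utility>
#include <vector>

// Global counters (illustration only, not thread-safe).
static std::size_t g_allocs = 0;
static std::size_t g_frees = 0;

// Minimal allocator that counts calls to allocate/deallocate.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocs;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ++g_frees;
        ::operator delete(p);
    }
};

// Stateless allocators always compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// The two candidate markers, instrumented.
using CountingMap = std::map<int, int, std::less<int>,
                             CountingAllocator<std::pair<const int, int>>>;
using CountingVec = std::vector<char, CountingAllocator<char>>;
```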

Answer 2

If indexed access suits your needs (as provided by regular C-style arrays), std::map is probably not the right class for you. Instead, consider using std::vector if you need dynamic run-time allocation, or std::array if your collection has a fixed size and you just need the fastest alternative to a C-style array, with bounds checking available through at().
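As a sketch of that advice applied to the question's `is_set` buffer (the names `make_is_set`, `kTotalSize`, and `is_marked` are hypothetical): `std::vector` covers the run-time-sized case and value-initializes its elements to zero, and `std::array` covers a size known at compile time. Only `at()` performs bounds checking; `operator[]` on either container is unchecked, just like a raw pointer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Run-time size: replaces the malloc'd is_set buffer. All elements are
// value-initialized to 0, and the memory is released automatically.
std::vector<char> make_is_set(std::size_t total_size) {
    return std::vector<char>(total_size);
}

// Compile-time size: std::array. kTotalSize is a hypothetical constant.
constexpr std::size_t kTotalSize = 1024;
using FixedIsSet = std::array<char, kTotalSize>;

// at() checks bounds and throws std::out_of_range; operator[] does not check.
bool is_marked(const FixedIsSet& a, std::size_t i) {
    return a.at(i) != 0;
}
```

Note that a `std::array` declared as `FixedIsSet a{};` is zero-initialized, while `FixedIsSet a;` at block scope leaves its elements uninitialized.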

You can find more information in this previous post.

Answer 3

I know that accessing keys in a map is O(log N) while it is constant time for arrays, but I don't know whether malloc is more costly at creation, given that it also requires more memory.

Each entry in the map is dynamically allocated, so if dynamic allocation is an issue, it will be a bigger issue with the map. As for the data structure, you can use a bitmap rather than a plain array of ints. On architectures with 32-bit ints, that reduces the size of the array by a factor of 32. In most cases, the extra cost of mapping an index to a bit in the array is much smaller than the cost of the extra memory, because the structure is more compact and fits in fewer cache lines.
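A minimal bitmap along those lines might look like this. `Bitmap` is a hypothetical class, not a standard type; it packs one bit per element into 32-bit words, so 100 elements occupy 4 words (16 bytes) instead of 100 `int`s (400 bytes with a 4-byte `int`). The standard `std::vector<bool>` is specified as a space-optimized specialization and is typically bit-packed, so it may be an alternative to writing your own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bitmap: one bit per element, packed into 32-bit words.
class Bitmap {
public:
    // All bits start cleared (0 = "not considered yet").
    explicit Bitmap(std::size_t n) : words_((n + 31) / 32, 0) {}

    // Sets bit i: word index i / 32, bit position i % 32 within that word.
    void set(std::size_t i) {
        assert(i / 32 < words_.size());
        words_[i / 32] |= std::uint32_t{1} << (i % 32);
    }

    // Returns true if bit i is set.
    bool test(std::size_t i) const {
        assert(i / 32 < words_.size());
        return ((words_[i / 32] >> (i % 32)) & 1u) != 0;
    }

    // Memory used by the bit storage itself, in bytes.
    std::size_t bytes() const { return words_.size() * sizeof(std::uint32_t); }

private:
    std::vector<std::uint32_t> words_;
};
```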

There are other things to consider, such as whether the density of elements in the set is low. If there are very few entries (i.e. the graph is sparse), either option could be fine. As a final option, you can implement the map manually by using a vector of pair<int,int>, sorting it, and then using binary search. That reduces the number of allocations, incurs some extra cost for sorting, and provides a more compact O(log N) solution than a map. Still, I would try to go for the bitmask.
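The sorted-vector alternative can be sketched as below, assuming all keys are inserted first, the keys are unique, and lookups happen only after sorting. `FlatMap`, `finalize`, and `find_value` are hypothetical names; the lookup uses `std::lower_bound` with a comparator on the key.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sorted vector of (key, value) pairs used as a read-mostly "flat map".
using FlatMap = std::vector<std::pair<int, int>>;

// Sort once after all insertions (pairs compare by key first).
void finalize(FlatMap& m) {
    std::sort(m.begin(), m.end());
    // Keys are assumed unique: no two neighbours may share a key.
    assert(std::adjacent_find(m.begin(), m.end(),
               [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                   return a.first == b.first;
               }) == m.end());
}

// O(log N) lookup via binary search on the key; returns nullptr if absent.
int* find_value(FlatMap& m, int key) {
    auto it = std::lower_bound(m.begin(), m.end(), key,
        [](const std::pair<int, int>& p, int k) { return p.first < k; });
    if (it != m.end() && it->first == key) return &it->second;
    return nullptr;
}
```

Appending to the vector is amortized O(1) and needs no per-element allocation; the trade-off is that adding keys after `finalize` requires either re-sorting or inserting at the `lower_bound` position, which shifts the following elements.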

Question

3 answers

Answer 1
8 2012-05-03 16:46:53

Answer 2
2 2012-05-03 17:07:19

Answer 3
1 2012-05-03 16:47:37