performance of array vs. map

Question

I have to loop over a subset of elements in a large array where each element points to another one (the problem comes from detecting connected components in a large graph).

My algorithm goes as follows:

1. Consider the first element.
2. Take as the next element the one pointed to by the previous element.
3. Loop until no new element is discovered.
4. Take the next element not already considered in steps 1-3 and go back to step 1.

Note that the number of elements to consider is much smaller than the total number of elements.
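For illustration, the loop described above might look roughly like the sketch below. The names `next`, `starts` and `count_reached` are hypothetical, and it assumes that `next[i]` holds the index of the element that element `i` points to and that all indices are in range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: next[i] is the index of the element that element i points to.
// Follows the chain from each start element and returns how many distinct
// elements were reached in total.
std::size_t count_reached(const std::vector<std::size_t>& next,
                          const std::vector<std::size_t>& starts) {
    std::vector<char> seen(next.size(), 0);  // one flag per element, all cleared
    std::size_t reached = 0;
    for (std::size_t start : starts) {       // steps 1 and 4: pick the next start element
        assert(start < next.size());
        std::size_t cur = start;
        while (!seen[cur]) {                 // step 3: stop once nothing new is discovered
            seen[cur] = 1;
            ++reached;
            cur = next[cur];                 // step 2: move to the element pointed to
            assert(cur < next.size());
        }
    }
    return reached;
}
```

Here the "already considered" check is the `seen` flag array, which is the piece the two options below would implement differently.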

From what I can see now, I can either:

// create a map of all elements to consider, init all values to 0, set to 1 once considered
map<int,int> is_set; // is_set.size() will be equal to N

or

// create a (too) large array covering all elements, init to 0 the entries of the elements to consider
int* is_set = (int*)malloc(total_size * sizeof(int)); // array length is total_size, which is much larger than N

I know that accessing keys in a map is O(log N) while it is only constant time for arrays, but I don't know whether malloc is more costly at creation, since it also requires more memory.

Answer 1

When in doubt, measure the performance of both alternatives. That's the only way to know for sure which approach will be faster for your application.

That said, a one-time large malloc is generally not terribly expensive. Also, although the map is O(log N), the big-O conceals a relatively large constant factor, at least for the std::map implementation, in my experience. I would not be surprised to find that the array approach is faster in this case, but again the only way to know for sure is to measure.
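As a rough idea of what such a measurement could look like, here is a minimal sketch. The helper names (`mark_with_map`, `mark_with_array`, `time_us`) are made up for illustration, the keys are assumed to be non-negative and smaller than `total_size`, and a real benchmark would repeat each run several times on realistic data.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <map>
#include <vector>

// Marks every key in a std::map and returns how many distinct keys were marked.
std::size_t mark_with_map(const std::vector<int>& keys) {
    std::map<int, int> is_set;
    for (int k : keys) is_set[k] = 1;
    return is_set.size();
}

// Marks every key in a flat array of total_size flags and returns the distinct count.
std::size_t mark_with_array(const std::vector<int>& keys, std::size_t total_size) {
    std::vector<char> is_set(total_size, 0);
    std::size_t distinct = 0;
    for (int k : keys) {
        assert(k >= 0 && static_cast<std::size_t>(k) < total_size);
        const std::size_t i = static_cast<std::size_t>(k);
        if (!is_set[i]) { is_set[i] = 1; ++distinct; }
    }
    return distinct;
}

// Runs f once and returns the elapsed wall-clock time in microseconds.
template <typename F>
long long time_us(F&& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

Calling something like `time_us([&] { n = mark_with_array(keys, total_size); })` for each variant, and using the returned count `n` afterwards, keeps the compiler from discarding the work being timed.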

Keep in mind too that although the map does not have a large up-front memory allocation, it has many small allocations over the lifetime of the object (every time you insert a new element, you get another allocation, and every time you remove an element, you get another free). If you have very many of these, that can fragment your heap, which may negatively impact performance depending on what else your application might be doing at the same time.

Answer 2

If indexed access suits your needs (as provided by regular C-style arrays), std::map is probably not the right class for you. Instead, consider using std::vector if the size is only known at run time, or std::array if the size is fixed at compile time. Both are about as fast as a C-style array when indexed with operator[], and both offer bounds-checked access through at().
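A minimal sketch of what replacing the malloc'd array with std::vector could look like is shown below. The helper names (`make_flags`, `try_mark`, `is_marked`) are hypothetical and only serve to contrast checked and unchecked access.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Builds a zero-initialized flag array; replaces malloc plus a manual clearing loop.
std::vector<int> make_flags(std::size_t total_size) {
    return std::vector<int>(total_size, 0);
}

// Sets the flag at index i using at(), which throws std::out_of_range for a bad index.
// Returns false if the index was rejected.
bool try_mark(std::vector<int>& flags, std::size_t i) {
    try {
        flags.at(i) = 1;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Reads a flag with unchecked operator[]; the caller must guarantee i is in range.
bool is_marked(const std::vector<int>& flags, std::size_t i) {
    assert(i < flags.size());
    return flags[i] != 0;
}
```

The vector also releases its memory automatically when it goes out of scope, so no matching free is needed.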

You can find more information in this previous post.

Answer 3

I know that accessing keys in a map is O(log N) while it is only constant time for arrays, but I don't know whether malloc is more costly at creation, since it also requires more memory.

Each entry in the map is dynamically allocated, so if dynamic allocation is an issue, it will be a bigger issue with the map. As for the data structure, you can use a bitmap rather than a plain array of ints. That reduces the size of the array by a factor of 32 on architectures with 32-bit ints. The extra cost of mapping an index to a bit will in most cases be much smaller than the cost of the extra memory, because the structure is more compact and fits in fewer cache lines.
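A bitmap along these lines could be sketched as follows. The `Bitmap` class is a hypothetical helper written for illustration; std::vector<bool> or, for a size known at compile time, std::bitset are standard alternatives that serve the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one bit per element, packed into 32-bit words.
class Bitmap {
public:
    // Allocates enough zeroed words to hold n_bits bits.
    explicit Bitmap(std::size_t n_bits)
        : bits_((n_bits + 31) / 32, 0u), size_(n_bits) {}

    // Sets bit i: word index is i / 32, bit position inside the word is i % 32.
    void set(std::size_t i) {
        assert(i < size_);
        bits_[i / 32] |= (std::uint32_t{1} << (i % 32));
    }

    // Returns true if bit i has been set.
    bool test(std::size_t i) const {
        assert(i < size_);
        return ((bits_[i / 32] >> (i % 32)) & 1u) != 0;
    }

    // Number of 32-bit words actually allocated.
    std::size_t word_count() const { return bits_.size(); }

private:
    std::vector<std::uint32_t> bits_;
    std::size_t size_;
};
```

The index mapping is just a division and a remainder by 32, which compilers typically turn into a shift and a mask.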

There are other things to consider, such as whether the density of elements in the set is low or not. If there are very few entries (i.e. the graph is sparse), then either option could be fine. As a final option, you can implement the map manually by using a vector of pair<int,int>, sorting it, and then using binary search. That reduces the number of allocations, incurs some extra cost for sorting, and provides a more compact O(log N) solution than a map. Still, I would try to go for the bitmap.
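The sorted-vector option could look roughly like this. The names `Entry`, `make_table` and `find_flag` are made up for the sketch, and it assumes the set of elements to consider is known up front so the table can be sorted once.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Entry = std::pair<int, int>;  // (element id, flag)

// Builds the table once: one entry per distinct id, sorted by id, all flags 0.
std::vector<Entry> make_table(std::vector<int> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    std::vector<Entry> table;
    table.reserve(ids.size());  // a single allocation instead of one per entry
    for (int id : ids) table.emplace_back(id, 0);
    assert(std::is_sorted(table.begin(), table.end()));
    return table;
}

// Binary search for id; returns a pointer to its flag, or nullptr if absent.
int* find_flag(std::vector<Entry>& table, int id) {
    auto it = std::lower_bound(
        table.begin(), table.end(), id,
        [](const Entry& e, int key) { return e.first < key; });
    if (it == table.end() || it->first != id) return nullptr;
    return &it->second;
}
```

Lookups stay O(log N) as with std::map, but all entries sit in one contiguous block, which tends to be friendlier to the cache than a node-based tree.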

3 answers

solution1
8 2012-05-03 16:46:53

solution2
2 2012-05-03 17:07:19

solution3
1 2012-05-03 16:47:37
