
The fastest way to populate std::vector of unknown size

I have a long array of data ( n entities). Every object in this array has some values (say, m values per object). And I have a loop like:

myType* A;

// reading the array of objects
std::vector<anotherType> targetArray;
int i, j, k = 0;
for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
    {
        if (check(A[i].fields[j]))
        {
            // creating and adding the object to targetArray
            targetArray[k] = someGenerator(A[i].fields[j]);
            k++;
        }
    }

In some cases I have n * m valid objects, in others ( n * m ) / 10 or fewer.
The question is how do I allocate a memory for targetArray ?

  1. targetArray.reserve(n*m);
    // Do work
    targetArray.shrink_to_fit();

  2. Count the elements without generating objects, and then allocate as much memory as I need and go with cycle one more time.

  3. Resize the array on every iteration where new objects are being created.

I see a huge tactical mistake in each of my methods. Is there another way to do it?

What you are doing here is called premature optimization. By default, std::vector grows its capacity geometrically when it runs out of room: each reallocation multiplies the capacity by an implementation-defined factor (commonly 1.5 or 2), which is what makes push_back amortized constant time. Just stick with push_back and get your code working.

You should start thinking about memory allocation optimization only when the above approach proves itself to be a bottleneck in your design. If that ever happens, I think the best bet would be to come up with a good approximation of the number of valid objects and just call reserve() on the vector — something like your first approach. Just make sure the shrink-to-fit step is done correctly: vectors never release capacity on their own. In C++11 and later, shrink_to_fit() is a (non-binding) request to release it; before C++11 you have to use the swap idiom.

Resizing the array on every step is no good, and std::vector won't really do that unless you try hard.

Doing an extra pass through the list of objects can help, but it may also hurt: you could easily waste CPU cycles, pollute the CPU cache, etc. If in doubt — profile it.

The typical way would be to use targetArray.push_back(). This reallocates the memory when needed and avoids two passes through your data. Its geometric growth strategy keeps reallocation efficient, with reallocations becoming rarer as the vector gets larger.

However, if your check() function is very fast, you might get better performance by going through the data twice, determining how much memory you need and making your vector the right size to begin with. I would only do this if profiling has determined it is really necessary though.
