
Vector vs Set Efficiency

I'm working with a large dataset and need to insert elements one by one into either a set or a vector. I'm unsure which of these two approaches is better (in C++):

  • Should I reserve enough space for the vector before insertion, and then add to the vector one by one?
  • Or, should I insert the elements into a set one by one (since insertion into a set is faster than into a vector) and then copy that set into a vector all at once?

Which one would be more time-efficient?

Should I reserve enough space for the vector before insertion, and then add to the vector one by one?

Yes. By reserving space for all required elements up front, you avoid the repeated reallocations (and the copying of existing elements into each new buffer) that happen whenever the vector outgrows its capacity. Use a vector only if you don't need the elements in any particular order and you will only add elements at the end. Inserting elements somewhere in the middle is very inefficient: a vector stores all its elements in contiguous memory, so every element after the insertion point has to be moved out of the way before the new element can be placed.
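As a minimal sketch of the reserve-then-append approach (the helper name build_with_reserve and the int element type are only for illustration), the assert below checks the standard's guarantee that no reallocation happens while the size stays within the reserved capacity:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a vector of n values by reserving capacity
// up front and then appending with push_back.
std::vector<int> build_with_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);                        // a single allocation for all n elements
    const int* storage = v.data();       // remember where the elements will live
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));  // append at the end: nothing is shifted
    }
    // After reserve(n), inserting up to n elements never reallocates,
    // so the existing elements were never copied to a new buffer.
    assert(v.data() == storage);
    return v;
}
```

For example, build_with_reserve(100000) performs one allocation instead of the series of growing allocations a plain push_back loop would trigger.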

Or, should I insert the elements into a set one by one (since insertion into a set is faster than into a vector) and then copy that set into a vector all at once?

If you need the elements in a specific (sorted) order, use a set. A set efficiently places each element in its "right" position, provided the element type can be compared with < (for example numeric types or std::string); otherwise you need to supply your own comparison function. Alternatively, if your data is more complex but has a sortable key, you may want to look at map instead of set. Afterwards, you can copy the whole set into a vector in one step with the vector's iterator-range constructor, e.g. std::vector<T> v(s.begin(), s.end());, so there is no need to loop through the set and append to the vector yourself.
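A sketch of both points, using a hypothetical Record type with an int key: a custom comparator (here called ById, also hypothetical) tells the set how to order the records, and the vector is then built from the set in a single range-constructor call. Note that the set treats records with equal ids as duplicates and keeps only the first one inserted.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical record type; std::set cannot order it without help.
struct Record {
    int id;             // the sortable key
    std::string name;
};

// Custom comparison: order records by their id only.
struct ById {
    bool operator()(const Record& a, const Record& b) const { return a.id < b.id; }
};

// Insert records one by one into a set (kept sorted by id, later records with
// an already-present id are rejected), then copy the whole set into a vector
// using the vector's iterator-range constructor.
std::vector<Record> sorted_unique(const std::vector<Record>& input) {
    std::set<Record, ById> s;
    for (const Record& r : input) {
        s.insert(r);    // O(log n); the element lands in its sorted position
    }
    return std::vector<Record>(s.begin(), s.end());  // one call, no manual loop
}
```

If each key should map to separate data rather than being part of the element, a std::map<int, std::string> would play the same role as this set.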

Which one would be more time-efficient?

Considering that you have a large amount of data as input and assuming that the data is in random order:

If you don't care about the order of the elements, calling push_back on a vector (with reserved capacity) is most likely faster.

If you need to insert elements somewhere in the middle (for example, to keep them sorted), the set is most likely faster, probably even if you have to transfer the data to a vector in a second step.
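To illustrate the difference (a sketch with int elements; both function names are hypothetical), here is the same sorted, duplicate-free result built once by inserting into the middle of a vector and once by inserting into a set and transferring the data afterwards:

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Keep a vector sorted while inserting: a binary search finds the position
// in O(log n), but vector::insert must shift every later element one slot
// to the right, so each insertion costs O(n) and n insertions cost O(n^2).
std::vector<int> sorted_via_vector(const std::vector<int>& input) {
    std::vector<int> v;
    v.reserve(input.size());
    for (int x : input) {
        auto pos = std::lower_bound(v.begin(), v.end(), x);
        if (pos == v.end() || *pos != x) {  // skip duplicates, like std::set
            v.insert(pos, x);               // shifts the tail of the vector
        }
    }
    return v;
}

// Same result via std::set: each insertion is O(log n) and moves no other
// elements; the sorted data is then copied into a vector in one step.
std::vector<int> sorted_via_set(const std::vector<int>& input) {
    std::set<int> s;
    for (int x : input) {
        s.insert(x);
    }
    return std::vector<int>(s.begin(), s.end());
}
```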

All of this depends to some extent on the type of data, any comparisons you need to perform, the standard library implementation, and the compiler.

Now that you know what to expect, I suggest you try both. Test and measure!
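A starting point for such a measurement might look like the following sketch. It assumes random int input and a single run; for meaningful numbers, use an optimized build, repeat each run, and substitute your own data type. The names Timings and measure are hypothetical. Because the set drops duplicates, the vector built from it may be slightly shorter than the input.

```cpp
#include <chrono>
#include <cstddef>
#include <random>
#include <set>
#include <vector>

struct Timings {
    double vector_ms;         // reserve + push_back
    double set_ms;            // set inserts + copy into a vector
    std::size_t vector_size;
    std::size_t set_size;     // may be smaller: the set drops duplicates
};

// Times both approaches on the same pseudo-random input.
Timings measure(std::size_t n, unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 1'000'000'000);
    std::vector<int> input(n);
    for (int& x : input) x = dist(rng);

    using Clock = std::chrono::steady_clock;

    auto t0 = Clock::now();
    std::vector<int> v;
    v.reserve(n);
    for (int x : input) v.push_back(x);
    auto t1 = Clock::now();

    std::set<int> s;
    for (int x : input) s.insert(x);
    std::vector<int> from_set(s.begin(), s.end());
    auto t2 = Clock::now();

    std::chrono::duration<double, std::milli> dv = t1 - t0;
    std::chrono::duration<double, std::milli> ds = t2 - t1;
    return {dv.count(), ds.count(), v.size(), from_set.size()};
}
```

Calling measure with sizes close to your real dataset and comparing vector_ms with set_ms gives a first indication; the sizes are returned so the work cannot be optimized away.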
