简体   繁体   中英

unordered_set vs vector -- prefer idiomatic or performant?

I'm working with data that is unique from other data of the same type. Very abstractly, a set fits the definition of the data I'm working with. I feel inclined to use std::unordered_set instead of std::vector for that reason.

Beyond that, both classes can fit my requirements. My question is about performance -- which might perform better? I cannot write out the code one way and benchmark it, then rewrite it the other way. That will take me hundreds of hours. If they'll perform similarly, do you think it would be worth-while to stick with the idiomatic unordered_set ?

Here is a simpler use case. A company is selling computers. Each is unique from another in at least one way, guaranteed.

struct computer_t
{
    std::string serial;
    std::uint32_t gb_of_ram;
};
std::unordered_set<computer_t> all_computers_in_existence;
std::unordered_set<computer_t> computers_for_sale; // subset of above
// alternatively
std::vector<computer_t> all_computers_in_existence;
std::vector<computer_t> computers_for_sale; // subset of above

The company wants to stop selling computers that aren't popular and replace them with other computers that might be.

std::unordered_set<computer_t> computers_not_for_sale;
std::set_difference(all_computers_in_existence.begin(), all_computers_in_existence.end(),
                    computers_for_sale.begin(), computers_for_sale.end(),
                    std::inserter(computers_not_for_sale, computers_not_for_sale.end()));

calculate_and_remove_least_sold(computers_for_sale);
calculate_and_add_most_likely_to_sell(computers_for_sale, computers_not_for_sale);

Based on the above sample code, what should I choose? Or is there another, new STL feature (in C++17) I should investigate? This really is as generic as it gets for my use-case without making this post incredibly long with details.

Idiomatic should be your first choice. If you implement it using unordered_set and the performance is not good enough, there are faster non-STL hash tables which are easy to switch to. 99% of the time it won't come to that.

Your example code using std::set_difference will not work, because that requires the inputs be sorted, which unordered_set is not. That's OK though, subtracting is done easily using unordered_set::erase(key) .

Hundreds of hours?

You create a new class “list of computers” with either an unordered set or a std::vector as the sole member. You replace all std::vector<computer_t> with this struct. Anything that doesn't compile because it call a vector function, add an inline function to this class doing the same operation. That should take you some hours at worst.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM