unordered_set vs. vector: prefer idiomatic or performant?

Question

I'm working with data in which each item is distinct from every other item of the same type. Very abstractly, a set fits the definition of the data I'm working with. For that reason I feel inclined to use std::unordered_set instead of std::vector.

Beyond that, both classes can fit my requirements. My question is about performance: which might perform better? I cannot write the code one way, benchmark it, and then rewrite it the other way. That would take me hundreds of hours. If they'll perform similarly, do you think it would be worthwhile to stick with the idiomatic unordered_set?

Here is a simpler use case. A company is selling computers. Each one is guaranteed to differ from every other in at least one way.

struct computer_t
{
    std::string serial;
    std::uint32_t gb_of_ram;
};
std::unordered_set<computer_t> all_computers_in_existence;
std::unordered_set<computer_t> computers_for_sale; // subset of above
// alternatively
std::vector<computer_t> all_computers_in_existence;
std::vector<computer_t> computers_for_sale; // subset of above

The company wants to stop selling computers that aren't popular and replace them with other computers that might be.

std::unordered_set<computer_t> computers_not_for_sale;
std::set_difference(all_computers_in_existence.begin(), all_computers_in_existence.end(),
                    computers_for_sale.begin(), computers_for_sale.end(),
                    std::inserter(computers_not_for_sale, computers_not_for_sale.end()));

calculate_and_remove_least_sold(computers_for_sale);
calculate_and_add_most_likely_to_sell(computers_for_sale, computers_not_for_sale);

Based on the above sample code, what should I choose? Or is there another, newer STL feature (in C++17) I should investigate? This really is as generic as I can make my use case without making this post incredibly long.

Answer 1

Idiomatic should be your first choice. If you implement it using unordered_set and the performance is not good enough, there are faster non-STL hash tables which are easy to switch to. 99% of the time it won't come to that.

Your example code using std::set_difference will not work, because that requires the inputs to be sorted, which an unordered_set is not. That's OK though: subtracting is done easily using unordered_set::erase(key).
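As a rough sketch of the erase-based subtraction, assuming the serial number alone identifies a computer (the question does not say which fields make a computer unique): computer_hash, computer_set and set_minus are names made up for this illustration. Note that std::unordered_set<computer_t> also needs a hash and an equality comparison before it will compile at all.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

struct computer_t
{
    std::string serial;
    std::uint32_t gb_of_ram;

    // Assumption: two computers are the same if their serials match.
    bool operator==(const computer_t& other) const { return serial == other.serial; }
};

// Hypothetical hash functor, kept consistent with operator== above.
struct computer_hash
{
    std::size_t operator()(const computer_t& c) const
    {
        return std::hash<std::string>{}(c.serial);
    }
};

using computer_set = std::unordered_set<computer_t, computer_hash>;

// Returns the elements of `all` that are not in `excluded`.
// No sorting is needed: each erase(key) is a hash lookup.
computer_set set_minus(const computer_set& all, const computer_set& excluded)
{
    computer_set result = all;      // copy, then subtract
    for (const computer_t& c : excluded)
        result.erase(c);            // does nothing if the key is absent
    return result;
}
```

With this in place, set_minus(all_computers_in_existence, computers_for_sale) would yield the computers_not_for_sale set that the question tried to build with std::set_difference.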

Answer 2

Hundreds of hours?

You create a new class “list of computers” with either an unordered set or a std::vector as the sole member. You replace every std::vector<computer_t> with this class. For anything that no longer compiles because it calls a vector function, add an inline function to this class that does the same operation. That should take you some hours at worst.
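A minimal sketch of such a wrapper might look like the following. The class name computer_list and its member functions are invented for illustration, and the hash and equality again assume the serial number identifies a computer, which the question does not confirm.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>

struct computer_t
{
    std::string serial;
    std::uint32_t gb_of_ram;

    // Assumption: the serial alone identifies a computer.
    bool operator==(const computer_t& other) const { return serial == other.serial; }
};

// Hypothetical hash functor, consistent with operator== above.
struct computer_hash
{
    std::size_t operator()(const computer_t& c) const
    {
        return std::hash<std::string>{}(c.serial);
    }
};

// Hypothetical wrapper: callers only see these functions, so the
// container behind them can be swapped without touching call sites.
class computer_list
{
public:
    // Returns true if the computer was added, false if it was already present.
    bool add(computer_t c) { return items_.insert(std::move(c)).second; }

    // Returns true if a computer was removed.
    bool remove(const computer_t& c) { return items_.erase(c) > 0; }

    bool contains(const computer_t& c) const { return items_.count(c) > 0; }
    std::size_t size() const { return items_.size(); }

    // Read-only iteration, so range-based for loops keep working.
    auto begin() const { return items_.begin(); }
    auto end() const { return items_.end(); }

private:
    std::unordered_set<computer_t, computer_hash> items_;  // the sole member
};
```

To try std::vector instead, only items_ and the bodies of these few functions would change (for example, contains would use std::find), which is what makes benchmarking both variants cheap.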

2 answers

Answer 1 (score 1, 2022-02-05 07:54:03)

Answer 2 (score -1, 2022-02-05 08:08:33)
