unordered_set vs. vector: prefer the idiomatic or the performant?

Question

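I am working with data in which every item is distinct from the others of its kind. In very abstract terms, a set matches the definition of the data I am using. For that reason I am inclined to use std::unordered_set rather than std::vector.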

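Beyond that, both classes can meet my requirements. My question is about performance: which performs better? I cannot write the code one way, benchmark it, and then rewrite it the other way; that would cost me hundreds of hours. If they perform similarly, do you think it is worth sticking with the idiomatic unordered_set?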

Here is a simpler use case. A company sells computers. Each one is guaranteed to be unique in at least one respect.

struct computer_t
{
    std::string serial;
    std::uint32_t gb_of_ram;
};
std::unordered_set<computer_t> all_computers_in_existence;
std::unordered_set<computer_t> computers_for_sale; // subset of above
// alternatively
std::vector<computer_t> all_computers_in_existence;
std::vector<computer_t> computers_for_sale; // subset of above

The company wants to stop selling unpopular computers and replace them with others that are likely to be popular.

std::unordered_set<computer_t> computers_not_for_sale;
std::set_difference(all_computers_in_existence.begin(), all_computers_in_existence.end(),
                    computers_for_sale.begin(), computers_for_sale.end(),
                    std::inserter(computers_not_for_sale, computers_not_for_sale.end()));

calculate_and_remove_least_sold(computers_for_sale);
calculate_and_add_most_likely_to_sell(computers_for_sale, computers_not_for_sale);

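Based on the sample code above, which should I choose? Or should I look into another, newer STL feature (in C++17)? This is as generic as I can make my use case without making this post very long.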

Answer 1

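Idiomatic should be your first choice. If you implement it with unordered_set and the performance is not good enough, there are faster non-STL hash tables that are easy to switch to. 99% of the time it will not come to that.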

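Your sample code using std::set_difference will not work, because it requires its inputs to be sorted, and an unordered_set is not. That is fine: the subtraction is easy to do with unordered_set::erase(key).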
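As a rough illustration of the erase-based subtraction, here is a minimal sketch. It assumes that the serial number alone identifies a computer; the names computer_hash, computer_set and set_minus are hypothetical and not from the question. Note that std::unordered_set<computer_t> only compiles once an equality comparison and a hash are supplied, which the question's snippet leaves out.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

struct computer_t
{
    std::string serial;
    std::uint32_t gb_of_ram;
};

// Assumption: two computers are the same if their serials match.
inline bool operator==(const computer_t& a, const computer_t& b)
{
    return a.serial == b.serial;
}

// Hypothetical hasher, consistent with operator== above (hashes the serial only).
struct computer_hash
{
    std::size_t operator()(const computer_t& c) const
    {
        return std::hash<std::string>{}(c.serial);
    }
};

using computer_set = std::unordered_set<computer_t, computer_hash>;

// Returns the elements of `all` that are not in `for_sale`:
// copy the full set, then erase every key that is for sale.
computer_set set_minus(const computer_set& all, const computer_set& for_sale)
{
    computer_set result = all;
    for (const computer_t& c : for_sale)
        result.erase(c);  // erasing a key that is absent is a no-op
    return result;
}
```

With this in place, computers_not_for_sale could be built as set_minus(all_computers_in_existence, computers_for_sale), with no sorting involved.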

Answer 2

A few hundred hours?

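Create a new class, a "computer list", with an unordered set or a std::vector as its only member. Replace every std::vector<computer_t> with this class. For anything that no longer compiles because it calls a vector function, add an inline function to the class that does the same thing. In the worst case this should cost you a few hours.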
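One possible shape for such a wrapper is sketched below. The class name computer_list and its member functions are hypothetical, and the sketch assumes computers are identified by their serial. It starts with a std::vector member; because callers only use the wrapper's interface, the member could later be replaced by an unordered_set and only these few functions would need rewriting.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct computer_t
{
    std::string serial;
    std::uint32_t gb_of_ram;
};

// Hypothetical wrapper around the container choice.
class computer_list
{
public:
    // Adds a computer unless one with the same serial is already stored.
    bool insert(const computer_t& c)
    {
        if (contains(c.serial))
            return false;
        items_.push_back(c);
        return true;
    }

    // Linear search by serial (a hash set would make this O(1) on average).
    bool contains(const std::string& serial) const
    {
        return std::any_of(items_.begin(), items_.end(),
                           [&](const computer_t& c) { return c.serial == serial; });
    }

    // Removes the computer with the given serial; returns false if absent.
    bool erase(const std::string& serial)
    {
        auto it = std::find_if(items_.begin(), items_.end(),
                               [&](const computer_t& c) { return c.serial == serial; });
        if (it == items_.end())
            return false;
        items_.erase(it);
        return true;
    }

    std::size_t size() const { return items_.size(); }

    // Read-only iteration, so range-based for loops keep working.
    std::vector<computer_t>::const_iterator begin() const { return items_.begin(); }
    std::vector<computer_t>::const_iterator end() const { return items_.end(); }

private:
    std::vector<computer_t> items_;  // the only member; swap the type here
};
```

Benchmarking the two containers then means changing the private member and the handful of functions above, rather than rewriting every call site.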

Answer 1
1 2022-02-05 07:54:03

Answer 2
-1 2022-02-05 08:08:33
