What is the difference between set and unordered_set in C++?

I came across this good question, which is similar but not the same, since it is about Java, whose hash tables are implemented differently by virtue of having synchronized accessors/mutators: What are the differences between a HashMap and a Hashtable in Java?

So what is the difference between the C++ implementations of set and unordered_set? The question can of course be extended to map vs. unordered_map, and so on for other C++ containers.

Here is my initial assessment:

set: While the standard doesn't explicitly require it to be implemented as a tree, the time-complexity constraints on its find/insert operations mean that in practice it is always implemented as one, usually a red-black tree (as seen in GCC 4.8), which is height-balanced. Because the tree is height-balanced, find() has predictable time complexity.

Pros: Compact (compared to the other data structure in this comparison)

Cons: Access time complexity is O(log n)

unordered_set: While the standard doesn't explicitly require it to be implemented as a hash table, the time-complexity constraints on its find/insert operations mean that in practice it is always implemented as one.

Pros:

  1. Faster (promises O(1) search on average)
  2. Easy to make the basic primitives thread-safe, compared to a tree-based data structure

Cons:

  1. Lookup is not guaranteed to be O(1). The theoretical worst case is O(n).
  2. Not as compact as a tree (for practical purposes the load factor is never 1).
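The load factor mentioned in the list above can be inspected directly on a std::unordered_set. The following is a minimal sketch; the names TableStats and fill_and_measure are made up for illustration, while size(), bucket_count(), load_factor() and max_load_factor() are standard member functions.

```cpp
#include <cstddef>
#include <unordered_set>

// Snapshot of a hash table's occupancy (hypothetical helper type).
struct TableStats {
    std::size_t elements;
    std::size_t buckets;
    float load_factor;      // elements / buckets
    float max_load_factor;  // exceeding this threshold triggers a rehash
};

// Inserts 0..n-1 into an unordered_set and reports how full the table is.
TableStats fill_and_measure(int n) {
    std::unordered_set<int> s;
    for (int i = 0; i < n; ++i) s.insert(i);
    return {s.size(), s.bucket_count(), s.load_factor(), s.max_load_factor()};
}
```

Calling fill_and_measure(1000) and printing the fields shows how many buckets the implementation allocated for 1000 elements; the exact bucket count differs between standard library implementations.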

Note: The O(1) for a hash table comes from the assumption that there are no collisions. Even with a load factor of 0.5, every second insertion leads to a collision. It can be observed that the load factor of a hash table is inversely proportional to the number of operations required to access an element in it: the more we reduce the number of operations, the sparser the hash table. When the stored elements are comparable in size to a pointer, the overhead is quite significant.

Did I miss any difference between map/set that one should know for performance analysis?

I think you've generally answered your own question. However, this:

Not as compact as a tree (for practical purposes the load factor is never 1).

is not necessarily true. Each node of a tree (we'll assume it's a red-black tree) for a type T uses at least 2 * pointer_size + sizeof(T) + sizeof(bool) bytes. This may be 3 * pointer_size instead of 2, depending on whether each tree node stores a parent pointer.

Compare this to a hash map: each hash map wastes some array space because the load factor is < 1, as you've said. However, assuming the hash map uses singly linked lists for chaining (and really, there's no real reason not to), each inserted element takes only sizeof(T) + pointer_size.

Note that this analysis ignores any overhead from extra space used for alignment.

For any element type T that is small (so, any basic type), the size of the pointers and other overhead dominates. At a load factor of > 0.5 (for example), the std::unordered_set may indeed use less memory than the equivalent std::set.
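The per-element estimate above can be checked with sizeof. This is only a sketch: RbNode, ChainNode and hash_bytes_per_element are hypothetical names, the tree node is assumed to store a parent pointer, and real implementations may add allocator overhead or cache the hash code in each node.

```cpp
#include <cstddef>

// Hypothetical red-black tree node: two child pointers, a parent pointer,
// a colour flag, and the value.
template <typename T>
struct RbNode {
    RbNode* left;
    RbNode* right;
    RbNode* parent;
    bool is_red;
    T value;
};

// Hypothetical hash-table node for separate chaining with a singly linked list.
template <typename T>
struct ChainNode {
    ChainNode* next;
    T value;
};

// Estimated bytes per element for a chained hash table: one node plus this
// element's share of the bucket array (assumed to be one pointer per bucket).
template <typename T>
double hash_bytes_per_element(double load_factor) {
    return static_cast<double>(sizeof(ChainNode<T>)) +
           static_cast<double>(sizeof(void*)) / load_factor;
}
```

On a typical 64-bit platform with 8-byte pointers and a 4-byte int, sizeof(RbNode<int>) is 32 and sizeof(ChainNode<int>) is 16 once alignment padding is included, so hash_bytes_per_element<int>(0.5) comes to 32 bytes, the break-even point with the tree node.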

The other big missing point is that iterating through a std::set is guaranteed to produce the elements ordered from smallest to largest, according to the given comparison function, while iterating through a std::unordered_set returns the values in an unspecified, effectively "random" order.
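A small sketch of the ordering difference follows. The function names ordered_traversal and unordered_traversal are illustrative only; both copy the container's iteration order into a vector so it can be compared or printed.

```cpp
#include <set>
#include <unordered_set>
#include <vector>

// Returns the elements of a std::set in iteration order, which is always
// sorted by the comparison function (std::less<int> here).
std::vector<int> ordered_traversal(const std::vector<int>& input) {
    std::set<int> s(input.begin(), input.end());
    return std::vector<int>(s.begin(), s.end());
}

// Returns the elements of a std::unordered_set in iteration order. The order
// is unspecified: it depends on the hash function and the bucket count.
std::vector<int> unordered_traversal(const std::vector<int>& input) {
    std::unordered_set<int> s(input.begin(), input.end());
    return std::vector<int>(s.begin(), s.end());
}
```

For the input {42, 7, 19, 7, 3}, ordered_traversal returns 3, 7, 19, 42. unordered_traversal returns the same four values, but the order may differ between standard libraries and even between runs with different bucket counts.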

Another difference (though not performance-related) is that set insertion doesn't invalidate iterators, while unordered_set insertion can if it triggers a rehash. In practice it's a pretty minor concern, since references to the actual elements remain valid.
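The reference-stability point can be demonstrated with a short sketch. RehashReport and grow_and_check are made-up names; the check relies only on the standard guarantee that a rehash leaves pointers and references to elements valid, even though iterators may be invalidated.

```cpp
#include <cstddef>
#include <unordered_set>

struct RehashReport {
    bool rehashed;          // did the bucket count change?
    bool pointer_still_ok;  // does the old pointer still refer to the element?
};

// Takes the address of one element, inserts enough values to force at least
// one rehash, and checks that the address still refers to the same element.
RehashReport grow_and_check() {
    std::unordered_set<int> s = {1};
    const int* p = &*s.find(1);                  // pointer to the stored element
    const std::size_t before = s.bucket_count();

    // Any iterator obtained before this loop may be invalidated by a rehash.
    for (int i = 2; i <= 10000; ++i) s.insert(i);

    return {s.bucket_count() != before, p == &*s.find(1) && *p == 1};
}
```

Holding a pointer or reference across insertions is therefore safe, whereas an iterator saved before the loop should be looked up again with find() before it is used.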

Yuushi addresses spatial efficiency and other points well already; just a few other parts of the question I'll comment on...

The O(1) for a hash table comes from the assumption that there are no collisions.

That's not true. What O(1) means is not that the first lookup attempt will always succeed; it's that, on average, a constant number of attempts is needed, rather than a number that grows with the number of values. For example, with an unordered_set or unordered_map, the max_load_factor defaults to 1.0 on construction, and if the load factor approaches that with a good hash function, the average number of elements in the bucket that any given element hashes to (itself included) will be around 2, regardless of how many values are in the table.

Even with a load factor of 0.5, every second insertion leads to a collision.

True, but it doesn't get as dire as you might intuitively expect: that average chain length of 2 at a load factor of 1.0 is not bad.

It can be observed that the load factor of a hash table is inversely proportional to the number of operations required to access an element in it: the more we reduce the number of operations, the sparser the hash table.

There's definitely a correlation (it's not inverse).
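The relationship between load factor and lookup work can be measured rather than guessed. The sketch below assumes random int keys and the library's default std::hash<int>; mean_bucket_size_seen is a hypothetical helper that averages, over all stored elements, the size of the bucket each element lives in.

```cpp
#include <cstddef>
#include <random>
#include <unordered_set>

// Average size of the bucket that a stored element lives in, i.e. roughly how
// many elements a successful lookup has to consider. Computed as
// sum(bucket_size^2) / element_count.
double mean_bucket_size_seen(float max_load, int n) {
    std::unordered_set<int> s;
    s.max_load_factor(max_load);

    std::mt19937 rng(12345);  // fixed seed so runs are repeatable
    std::uniform_int_distribution<int> dist;
    for (int i = 0; i < n; ++i) s.insert(dist(rng));

    double sum_sq = 0.0;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        const double len = static_cast<double>(s.bucket_size(b));
        sum_sq += len * len;
    }
    return sum_sq / static_cast<double>(s.size());
}
```

Comparing mean_bucket_size_seen(1.0f, 100000) with mean_bucket_size_seen(0.25f, 100000) should show the denser table doing more work per lookup, with both values staying small constants that do not grow with n. The exact figures depend on how the implementation chooses its bucket counts.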

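In some cases set is more convenient.

For example, using vector as the key:

// Assumes <iostream>, <set>, <vector>, `using namespace std;`, and a
// user-defined operator<< for vector<int> (the standard library has none).
set<vector<int>> s;
s.insert({1, 2});
s.insert({1, 3});
s.insert({1, 2});   // duplicate; the set is unchanged

for (const auto& vec : s)
    cout << vec << endl;
// 1 2
// 1 3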

A vector<int> can be stored in a set because std::vector provides operator<, which compares lexicographically.

But if you use unordered_set<vector<int>>, you have to supply a hash function for vector<int>, because the standard library does not provide one. You have to define one, for example:

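// Hash for std::vector<int>: folds each element's hash into a running seed,
// in the style of Boost's hash_combine. Requires <cstddef>, <functional>, <vector>.
struct VectorHash {
    std::size_t operator()(const std::vector<int>& v) const {
        std::hash<int> hasher;
        std::size_t seed = 0;
        for (int i : v) {
            seed ^= hasher(i) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        }
        return seed;
    }
};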

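// Assumes <unordered_set>, `using namespace std;`, and a user-defined
// operator<< for vector<int>.
void two() {
    // unordered_set<vector<int>> s;  // error: std::hash is not specialized for vector<int>
    unordered_set<vector<int>, VectorHash> s;
    s.insert({1, 2});
    s.insert({1, 3});
    s.insert({1, 2});   // duplicate; the set is unchanged

    for (const auto& vec : s)
        cout << vec << endl;
    // prints "1 2" and "1 3", in an unspecified order
}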

You can see that in some cases unordered_set is more complicated to use.

Mainly cited from: https://stackoverflow.com/a/29855973/6329006

For more differences between unordered_set and set, see: https://stackoverflow.com/a/52203931/6329006
