
What is the use of the bucket interface in std::unordered_map?

I've been watching this video from CppCon 2014 and discovered that there is an interface to access the buckets underneath std::unordered_map. Now I have a couple of questions:

  • Are there any reasonable examples of the usage of this interface?
  • Why did the committee decide to define this interface? Why wasn't the typical STL container interface enough?

It is often enlightening to search for the proposal that introduced an item, as there is often an accompanying rationale. In this case N1443 says this:

G. Bucket Interface

Like all standard containers, each of the hashed containers has member function begin() and end(). The range [c.begin(), c.end()) contains all of the elements in the container, presented as a flat range. Elements within a bucket are adjacent, but the iterator interface presents no information about where one bucket ends and the next begins.

It's also useful to expose the bucket structure, for two reasons. First, it lets users investigate how well their hash function performs: it lets them test how evenly elements are distributed within buckets, and to look at the elements within a bucket to see if they have any common properties. Second, if the iterators have an underlying segmented structure (as they do in existing singly linked list implementations), algorithms that exploit that structure, with an explicit nested loop, can be more efficient than algorithms that view the elements as a flat range.

The most important part of the bucket interface is an overloading of begin() and end(). If n is an integer, [begin(n), end(n)) is a range of iterators pointing to the elements in the nth bucket. These member functions return iterators, of course, but not of type X::iterator or X::const_iterator. Instead they return iterators of type X::local_iterator or X::const_local_iterator. A local iterator is able to iterate within a bucket, but not necessarily between buckets; in some implementations it's possible for X::local_iterator to be a simpler data structure than X::iterator. X::iterator and X::local_iterator are permitted to be the same type; implementations that use doubly linked lists will probably take advantage of that freedom.

This bucket interface is not provided by the SGI, Dinkumware, or Metrowerks implementations. It is inspired partly by the Metrowerks collision-detection interface, and partly by earlier work (see [Austern 1998]) on algorithms for segmented containers.
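To make the quoted description concrete, here is a minimal sketch of the "explicit nested loop" over buckets using begin(n) and end(n). The function name sum_by_bucket and the choice of a string-to-int map are illustrative assumptions, not something taken from the proposal.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: sums all mapped values by walking the container
// bucket by bucket instead of using the flat [begin(), end()) range.
int sum_by_bucket(const std::unordered_map<std::string, int>& m) {
    int total = 0;
    // Outer loop: one iteration per bucket index.
    for (std::size_t n = 0; n < m.bucket_count(); ++n) {
        // Inner loop: begin(n)/end(n) yield const_local_iterator here
        // because the map is const; they only walk bucket n.
        for (auto it = m.begin(n); it != m.end(n); ++it) {
            total += it->second;
        }
    }
    return total;
}
```

The result is the same as iterating the flat range; whether the nested form is actually faster depends on the implementation's iterator layout.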

I imagine you can benefit greatly from this if you're in a high-performance situation and collisions end up killing you. Iterating over the buckets and looking at the bucket sizes periodically could tell you whether your hashing policy is good enough.

Unordered maps are greatly dependent on their hashing policy when it comes to performance.
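As a rough illustration of that kind of periodic check, the sketch below collects a few bucket statistics with bucket_count(), bucket_size(n) and load_factor(). The names BucketStats, bucket_stats and ConstantHash are made up for this example; ConstantHash is a deliberately bad hash included only to show what a poor distribution looks like.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_map>

// Hypothetical summary of how elements are spread over the buckets.
struct BucketStats {
    std::size_t buckets;   // total number of buckets
    std::size_t empty;     // buckets holding no element
    std::size_t longest;   // size of the fullest bucket
    float load_factor;     // average elements per bucket
};

// Works for any unordered associative container.
template <class Map>
BucketStats bucket_stats(const Map& m) {
    BucketStats s{m.bucket_count(), 0, 0, m.load_factor()};
    for (std::size_t n = 0; n < m.bucket_count(); ++n) {
        const std::size_t len = m.bucket_size(n);
        if (len == 0) ++s.empty;
        s.longest = std::max(s.longest, len);
    }
    return s;
}

// Deliberately poor hash (assumption for demonstration only):
// every key collides, so all elements land in a single bucket.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};
```

With a reasonable hash, `longest` should stay close to the load factor; with something like ConstantHash it grows to the size of the whole container, which is the symptom to watch for.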

There are a number of algorithms that require the objects to be hashed into some number of buckets, after which each bucket is processed.

Say, you want to find duplicates in a collection. You hash all items in the collection, then in each bucket you compare items pairwise.
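A minimal sketch of that duplicate search, assuming the items are strings and using std::unordered_multiset as the hash table (both are choices made for this example; find_duplicates is a hypothetical name):

```cpp
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the distinct values that occur more
// than once in `items`.
std::set<std::string> find_duplicates(const std::vector<std::string>& items) {
    // Step 1: hash every item into the table (duplicates are kept).
    std::unordered_multiset<std::string> table(items.begin(), items.end());

    // Step 2: equal items always share a bucket, so pairwise
    // comparison only has to happen inside each bucket.
    std::set<std::string> dups;
    for (std::size_t n = 0; n < table.bucket_count(); ++n) {
        for (auto a = table.begin(n); a != table.end(n); ++a) {
            for (auto b = std::next(a); b != table.end(n); ++b) {
                if (*a == *b) dups.insert(*a);
            }
        }
    }
    return dups;
}
```

In real code equal_range() or count() would likely be simpler for this particular task; the bucket loop is shown only because it mirrors the "hash, then process each bucket" pattern described above.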

A slightly less trivial example is the Apriori algorithm for finding frequent itemsets.

The only reason I have ever needed the interface is to traverse all the objects in a map without having to hold a lock on the map or copy the map. This can be used for imprecise expiration or other types of periodic checks on objects in the map.

The traversal works as follows:

  1. Lock the map.

  2. Begin traversing the map in bucket order, operating on each object you encounter.

  3. When you decide you've held the lock for too long, stash the key of the object you last operated on.

  4. Wait until you wish to resume operating.

  5. Lock the map, and go to step 2, starting at or near (in bucket order) the key you stopped on. If you reach the end, start back at the beginning.
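The steps above could be sketched roughly as follows. This is only one possible reading: Cursor, traverse_slice, the `budget` parameter and the external std::mutex are assumptions made for the example, and the sketch pauses only at bucket boundaries (resuming "near" the stashed key, at the next bucket) rather than in the middle of a bucket.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>

using Map = std::unordered_map<std::string, int>;

// Hypothetical resume position: the key of the last object visited.
struct Cursor {
    bool valid = false;   // false means "start from the first bucket"
    std::string key;
};

// Visits whole buckets under the lock until at least `budget` objects
// have been processed, then stashes the last key in `cur`.
// Returns true if the traversal should be resumed later, false if the
// end of the map was reached (the next call starts at the beginning).
template <class Visit>
bool traverse_slice(Map& map, std::mutex& mu, Cursor& cur,
                    std::size_t budget, Visit visit) {
    std::lock_guard<std::mutex> lock(mu);            // steps 1 and 5
    // bucket(key) is well defined even if the key was erased meanwhile.
    std::size_t n = cur.valid ? map.bucket(cur.key) + 1 : 0;
    cur.valid = false;
    std::size_t visited = 0;
    for (; n < map.bucket_count(); ++n) {            // step 2
        for (auto it = map.begin(n); it != map.end(n); ++it) {
            visit(it->first, it->second);
            cur.key = it->first;                     // step 3: stash key
            cur.valid = true;
            ++visited;
        }
        if (cur.valid && visited >= budget) return true;
    }
    cur.valid = false;                               // reached the end
    return false;
}
```

If the map is rehashed or modified between slices, bucket order changes and some objects may be skipped or visited twice in a pass, which is why this only suits imprecise, periodic checks of the kind described above.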
