
Creating unordered_set of unordered_set

I want to create a container that will store unique sets of integers inside.

I want to create something similar to

std::unordered_set<std::unordered_set<unsigned int>>

But g++ does not let me do that and says:

invalid use of incomplete type 'struct std::hash<std::unordered_set<unsigned int> >'

What I want to achieve is to have unique sets of unsigned ints.

How can I do that?

I'm adding yet another answer to this question as currently no one has touched upon a key point.

Everyone is telling you that you need to create a hash function for unordered_set<unsigned>, and this is correct. You can do so by specializing std::hash<unordered_set<unsigned>>, or you can create your own functor and use it like this:

unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor> s;

Either way is fine. However there is a big problem you need to watch out for:

Any two unordered_set<unsigned> that compare equal (x == y) must hash to the same value: hash(x) == hash(y). If you fail to follow this rule, you will get run-time errors. Also note that the following two unordered_sets compare equal (using pseudocode here for clarity):

{1, 2, 3} == {3, 2, 1}

Therefore hash({1, 2, 3}) must equal hash({3, 2, 1}). Said differently, the unordered containers have an equality operator where order does not matter. So however you construct your hash function, its result must be independent of the order of the elements in the container.
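As a rough sketch of what such an order-independent functor could look like: the name SetHash below is hypothetical (it stands in for my_unordered_set_hash_functor), and adding up the element hashes is just one commutative way to combine them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

using IntSet = std::unordered_set<unsigned>;

// Hypothetical functor name; plays the role of my_unordered_set_hash_functor.
struct SetHash {
    std::size_t operator()(const IntSet& s) const {
        std::size_t h = 0;
        for (unsigned e : s)
            h += std::hash<unsigned>()(e);  // addition is commutative, so order is irrelevant
        return h;
    }
};

// Builds the same set in two different orders and returns how many
// distinct inner sets the outer container ends up holding.
std::size_t count_unique_sets() {
    IntSet a{1, 2, 3};
    IntSet b{3, 2, 1};
    assert(a == b);                        // equality ignores order
    assert(SetHash()(a) == SetHash()(b));  // so the hashes have to agree as well

    std::unordered_set<IntSet, SetHash> outer;
    outer.insert(a);
    outer.insert(b);             // duplicate of a, not stored again
    outer.insert(IntSet{1, 2});  // a genuinely different set
    return outer.size();
}
```

Here count_unique_sets() returns 2: the two spellings of {1, 2, 3} collapse into a single element.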

Alternatively you can replace the equality predicate used in the unordered_set such that it does respect order:

unordered_set<unordered_set<unsigned>, my_unordered_set_hash_functor,
                                       my_unordered_equal> s;

The burden of getting all of this right makes:

unordered_set<set<unsigned>, my_set_hash_functor>

look fairly attractive. You still have to create a hash functor for set<unsigned>, but now you don't have to worry about getting the same hash code for {1, 2, 3} and {3, 2, 1}: a set always iterates its elements in sorted order, so equal sets produce the same hash code even when the hash function depends on element order.
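A minimal sketch of that variant, assuming a hypothetical functor SortedSetHash in the role of my_set_hash_functor. The combining step is modeled on boost::hash_combine, which is my choice for illustration; any order-sensitive mixing step would do.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>

using SortedSet = std::set<unsigned>;

// Hypothetical functor; plays the role of my_set_hash_functor.
// std::set iterates in sorted order, so an order-sensitive combine is safe.
struct SortedSetHash {
    std::size_t operator()(const SortedSet& s) const {
        std::size_t h = s.size();
        for (unsigned e : s) {
            // Combining step modeled on boost::hash_combine (an assumption).
            h ^= std::hash<unsigned>()(e) + 0x9e3779b9 + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// Returns the number of distinct inner sets stored.
std::size_t count_unique_sorted_sets() {
    assert((SortedSet{1, 2, 3}) == (SortedSet{3, 2, 1}));  // same set either way

    std::unordered_set<SortedSet, SortedSetHash> outer;
    outer.insert(SortedSet{1, 2, 3});
    outer.insert(SortedSet{3, 2, 1});  // same contents, not stored again
    outer.insert(SortedSet{1, 2});
    return outer.size();
}
```

The price is that the inner container is now a tree, so lookups inside each inner set are O(log n) rather than O(1) on average.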

I note that Walter's answer gives a hash functor that has the right behavior: it ignores order in computing the hash code. But then his answer (currently) tells you that this is not a good solution. :-) It actually is a good solution for unordered containers. An even better solution would be to return the sum of the individual hashes instead of hashing the sum of the elements.
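One caveat worth sketching: some standard library implementations hash an integer to itself, in which case the sum of the individual std::hash values is the same number as the hash of the sum. The version below therefore mixes each element before adding. The mixer is modeled on the SplitMix64 finalizer, and both it and the name SumOfMixedHashes are assumptions made for illustration, not part of any answer here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

using IntSet = std::unordered_set<unsigned>;

// Per-element mixer modeled on the SplitMix64 finalizer (illustrative choice).
inline std::uint64_t mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Hypothetical functor: sum of the mixed element values.
// The sum is commutative, so the result still ignores element order.
struct SumOfMixedHashes {
    std::size_t operator()(const IntSet& s) const {
        std::uint64_t h = 0;
        for (unsigned e : s)
            h += mix(e);
        return static_cast<std::size_t>(h);
    }
};

// Returns true if two equal sets built in different orders hash identically.
bool hash_ignores_order() {
    IntSet a{1, 2, 3};
    IntSet b{3, 2, 1};
    assert(a == b);
    return SumOfMixedHashes()(a) == SumOfMixedHashes()(b);
}
```

With the elements mixed first, sets such as {1, 4} and {2, 3} no longer collide merely because their elements add up to the same number.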

You can do this, but like every unordered_set/map key type, the inner unordered_set now needs a hash function to be defined. It does not have one by default, but you can write one yourself.

What you have to do is to define an appropriate hash for keys of type std::unordered_set<unsigned int> (since operator== is already defined for this key, you will not need to also provide the EqualKey template parameter for std::unordered_set<std::unordered_set<unsigned int>, Hash, EqualKey>).

One simple (albeit inefficient) option is to hash on the total sum of all elements of the set. This would look similar to this:

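#include <functional>    // std::hash
#include <numeric>       // std::accumulate
#include <unordered_set>

// Hashes a container by hashing the sum of its elements.
template<typename T>
struct hash_on_sum
: private std::hash<typename T::value_type>
{
  typedef typename T::value_type count_type;
  typedef std::hash<count_type> base;
  std::size_t operator()(T const&obj) const
  {
    return base::operator()(std::accumulate(obj.begin(),obj.end(),count_type()));
  }
};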

typedef std::unordered_set<unsigned int> inner_type;
typedef std::unordered_set<inner_type, hash_on_sum<inner_type>> set_of_unique_sets;

However, while simple, this is not good, since it does not guarantee the following requirement. For two different parameters k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0/std::numeric_limits<size_t>::max() .
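To make the weakness concrete, here is a small sketch using a non-template version of the same idea (HashOnSum is a name I made up for it). Two different sets whose elements add up to the same total always land on the same hash value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <unordered_set>

using IntSet = std::unordered_set<unsigned int>;

// Same idea as hash_on_sum above, written out for one concrete type.
struct HashOnSum {
    std::size_t operator()(const IntSet& s) const {
        unsigned int sum = std::accumulate(s.begin(), s.end(), 0u);
        return std::hash<unsigned int>()(sum);
    }
};

// {1, 4} and {2, 3} are different sets with the same sum, so they collide.
bool sums_collide() {
    IntSet a{1, 4};
    IntSet b{2, 3};
    assert(a != b);
    return HashOnSum()(a) == HashOnSum()(b);
}

// A collision costs lookup time, not correctness: both sets are still stored.
std::size_t stored_after_collision() {
    std::unordered_set<IntSet, HashOnSum> outer;
    outer.insert(IntSet{1, 4});
    outer.insert(IntSet{2, 3});
    return outer.size();
}
```

The container stays correct because equality is still checked within a bucket; what degrades is performance, since many colliding sets end up in the same bucket.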

std::unordered_set<unsigned int> does not meet the requirement to be an element of a std::unordered_set since there is no default hash function (i.e. std::hash<> is not specialized for std::unordered_set<unsigned int>).

You can provide one (it should be fast and avoid collisions as much as possible):

class MyHash
{
public:
    std::size_t operator()(const std::unordered_set<unsigned int>& s) const 
    {
        return ... // return some meaningful hash of the set elements
    }
};

int main() {

    std::unordered_set<std::unordered_set<unsigned int>, MyHash> u;

}

You can see very good examples of hash functions in this answer .

You should really provide both a Hash and an Equality function meeting the standard requirement of an Unordered Associative Container.

Hash(), the default function used to hash your set's elements, does not know how to deal with an entire set as an element. Create a hash function that returns the same value for equal sets and, as far as possible, different values for different sets, and you're good to go.

This is the constructor for an unordered_set:

explicit unordered_set( size_type bucket_count = /*implementation-defined*/,
                        const Hash& hash = Hash(),
                        const KeyEqual& equal = KeyEqual(),
                        const Allocator& alloc = Allocator() );

http://en.cppreference.com/w/cpp/container/unordered_set/unordered_set

Perhaps the simplest thing for you to do is create a hash function for your unordered_set<unsigned int>

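std::size_t my_hash(const std::unordered_set<unsigned int>& element)
{
  std::size_t h = 0;
  for (unsigned int e : element)
  {
     // some sort of math that gives equal sets the same hash;
     // adding up the element hashes is one order-independent placeholder
     h += std::hash<unsigned int>()(e);
  }
  return h;
}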

edit: as seen in another answer, which I forgot completely, the hashing function must be within a Hash object. At least according to the constructor I pasted in my answer.
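For what it's worth, a function object is the usual choice, but a plain function can be made to work too: name its pointer type as the Hash template argument and hand the pointer to the constructor. A sketch, where set_hash, HashFn and OuterSet are names I introduced for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

using IntSet = std::unordered_set<unsigned int>;

// A free function with an order-independent placeholder hash.
std::size_t set_hash(const IntSet& s) {
    std::size_t h = 0;
    for (unsigned int e : s)
        h += std::hash<unsigned int>()(e);
    return h;
}

// The Hash template argument is a function pointer type here.
using HashFn = std::size_t (*)(const IntSet&);
using OuterSet = std::unordered_set<IntSet, HashFn>;

// Returns the number of distinct inner sets stored.
std::size_t count_with_function_pointer() {
    OuterSet outer(8, &set_hash);  // bucket_count = 8, hash = &set_hash
    outer.insert(IntSet{1, 2});
    outer.insert(IntSet{2, 1});    // equal to the previous set
    outer.insert(IntSet{5});
    assert(outer.count(IntSet{5}) == 1);
    return outer.size();
}
```

The pointer has to be passed explicitly: a default-constructed OuterSet would hold a null function pointer and calling it is undefined behavior, which is one reason a functor type is more convenient.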

There's a reason there is no hash for unordered_set. An unordered_set is a mutable sequence by default. A hash must hold the same value for as long as the object is in the unordered_set. Thus your elements must be immutable. This is not guaranteed by using the modifier const&, as it only guarantees that the main unordered_set and its methods will not modify the sub-unordered_set. Not using a reference could be a safe solution (you'd still have to write the hash function), but do you really want the overhead of moving/copying unordered_sets?

You could instead use some kind of pointer. This is fine; a pointer is only a memory address and your unordered_set itself does not relocate (it might reallocate its element pool, but who cares?). Therefore your pointer is constant and it can hold the same hash for its lifetime in the unordered_set. (EDIT: as Howard pointed out, you must ensure that, whatever order the elements are stored in, two sets with the same elements are considered equal. By enforcing an order in how you store your integers, you get for free that two equal sets correspond to two equal vectors.)

As a bonus, you now can use a smart pointer within the main set itself to manage the memory of sub- unordered_set if you allocated them on the heap.
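A sketch of the smart-pointer variant, under the assumption that you still want uniqueness by content. Hashing the address alone would treat two equal sets at different addresses as distinct, so the hypothetical DerefHash and DerefEqual below look through the pointer instead.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <memory>
#include <unordered_set>

using IntSet = std::unordered_set<unsigned int>;
using IntSetPtr = std::shared_ptr<const IntSet>;  // const: the pointee cannot be modified through it

// Hypothetical helper that allocates an inner set on the heap.
IntSetPtr make_set(std::initializer_list<unsigned int> xs) {
    return std::make_shared<IntSet>(xs);
}

// Hash the pointed-to set (order-independent sum), not the address.
struct DerefHash {
    std::size_t operator()(const IntSetPtr& p) const {
        std::size_t h = 0;
        for (unsigned int e : *p)
            h += std::hash<unsigned int>()(e);
        return h;
    }
};

// Compare the pointed-to sets, not the addresses.
struct DerefEqual {
    bool operator()(const IntSetPtr& a, const IntSetPtr& b) const {
        return *a == *b;
    }
};

// Returns the number of distinct inner sets stored.
std::size_t count_unique_pointees() {
    IntSetPtr a = make_set({1, 2, 3});
    IntSetPtr b = make_set({3, 2, 1});
    assert(a != b);  // two different heap objects

    std::unordered_set<IntSetPtr, DerefHash, DerefEqual> outer;
    outer.insert(a);
    outer.insert(b);              // equal contents, not stored again
    outer.insert(make_set({7}));
    return outer.size();
}
```

Copying a shared_ptr is cheap compared with copying a whole inner set, which is the overhead this approach is meant to avoid.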

Note that this is still not the most efficient way to get a collection of sets of ints. To make your sub-sets, you could write a quick wrapper around std::vector that stores the ints ordered by value. ints are small and cheap to compare, and a dichotomic (binary) search is only O(log n) in complexity. A std::unordered_set is a heavy structure, and what you lose by going from O(1) to O(log n) you gain back by having compact memory for each set. This shouldn't be too hard to implement and is almost guaranteed to perform better.

A harder-to-implement solution would involve a trie.
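A possible shape for such a wrapper, as a sketch only: FlatIntSet and FlatIntSetHash are hypothetical names, and the hash combine step is modeled on boost::hash_combine. Because the vector is kept sorted and free of duplicates, two sets with the same elements are two equal vectors, so an order-sensitive hash is safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical wrapper: a set of ints kept as a sorted vector without duplicates.
class FlatIntSet {
public:
    // Returns true if value was inserted, false if it was already present.
    bool insert(unsigned int value) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), value);
        if (pos != data_.end() && *pos == value) return false;
        data_.insert(pos, value);  // keeps the vector sorted
        return true;
    }
    // Binary search, O(log n).
    bool contains(unsigned int value) const {
        return std::binary_search(data_.begin(), data_.end(), value);
    }
    const std::vector<unsigned int>& values() const { return data_; }
    bool operator==(const FlatIntSet& other) const { return data_ == other.data_; }
private:
    std::vector<unsigned int> data_;
};

// Order-sensitive hash; safe because the storage order is canonical.
struct FlatIntSetHash {
    std::size_t operator()(const FlatIntSet& s) const {
        std::size_t h = s.values().size();
        for (unsigned int e : s.values())
            h ^= std::hash<unsigned int>()(e) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// Returns the number of distinct flat sets stored.
std::size_t count_unique_flat_sets() {
    FlatIntSet a, b;
    a.insert(1); a.insert(2); a.insert(3);
    b.insert(3); b.insert(2); b.insert(1);
    assert(a == b && a.contains(2) && !a.contains(4));

    std::unordered_set<FlatIntSet, FlatIntSetHash> outer;
    outer.insert(a);
    outer.insert(b);  // same elements, same vector, not stored again
    return outer.size();
}
```

Note that insertion into the vector is O(n) because of the element shifting, so this layout suits sets that are built once and then mostly queried.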
