Which data structure should I use?

Question

I am trying to figure out the best data structure for this problem. I am implementing a key-value store whose keys are strings. Values are added frequently and will generally be looked up only once or twice. Initially I used a std::map, but I found the performance suboptimal: the overhead of adding keys and rebalancing the red-black tree outweighed the time saved on lookups. Currently I am using a modified singly linked list. Each node is a struct that contains a C string (const char *), its length in bytes, and the stored value. To find a value by key, I iterate through the list and compare key lengths; if they match, I use memcmp to check whether the strings are identical, and if they are, I return the value. With this method I was able to achieve about 10x better performance than with std::map. However, I need to make it about 2x more efficient still. Can anyone recommend a better data structure for this problem?

Answer 1

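A std::vector should be faster to iterate than a linked list, and push_back() should be faster too, since most of the time it does not need to allocate memory.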
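As a rough illustration of this suggestion, here is a minimal sketch of the question's length-then-memcmp search over a std::vector instead of a linked list. The names `Entry` and `VectorStore`, the `int` value type, and the assumption that the key bytes are owned elsewhere are all hypothetical, not taken from the original code.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical entry layout mirroring the struct described in the question:
// a non-owning key pointer, its length in bytes, and the value.
struct Entry {
    const char* key;
    std::size_t len;
    int value;  // `int` is a placeholder for the real value type
};

class VectorStore {
public:
    // Reserving up front means most push_back calls do not allocate.
    explicit VectorStore(std::size_t expected = 0) { entries_.reserve(expected); }

    // Appends without copying the key bytes; the caller must keep them alive.
    void add(const char* key, std::size_t len, int value) {
        entries_.push_back(Entry{key, len, value});
    }

    // Linear scan: cheap length check first, memcmp only on a length match.
    // Returns nullptr when the key is absent.
    const int* find(const char* key, std::size_t len) const {
        for (const Entry& e : entries_) {
            if (e.len == len && std::memcmp(e.key, key, len) == 0) {
                return &e.value;
            }
        }
        return nullptr;
    }

private:
    std::vector<Entry> entries_;
};
```

The entries sit contiguously in memory, so the scan touches far fewer cache lines than a node-per-element list would. Note that a pointer returned by `find` is only valid until the next `add` that triggers a reallocation.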

Answer 2

It is hard to come up with a fast solution without knowing more about the actual problem. In particular, how big is your dataset, and where is the real data stored (in the container or somewhere else)? What other operations do you need to perform on the container? Do you need to delete elements from it?

In a comment on one of the other answers you state that the keys need to be copied in std::unordered_map. If the keys are already stored somewhere else, I would advise you to use a map but avoid copying the strings. Use pointers as the keys, and a custom comparator that dereferences them and operates on the result:

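// Assuming that the data is stored in std::string somewhere else
#include <map>
#include <string>

struct custom_compare {
   // Order by length first and by contents only when the lengths match.
   // Comparing contents when lhs is longer would let both a<b and b<a hold,
   // which std::map does not allow (it needs a strict weak ordering).
   bool operator()( const std::string* lhs, const std::string* rhs ) const {
      if ( lhs->size() != rhs->size() ) return lhs->size() < rhs->size();
      return lhs->compare( *rhs ) < 0;
   }
};
std::map< std::string*, data, custom_compare > mymap;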

By storing pointers instead of the actual strings, this gets rid of the copying. The custom comparator is basically as fast as the one you have implemented in the list, and the tree will balance the contents, allowing for O(log n) lookups. If the set has many elements, this will be an improvement over linear search; if the set is small, linear search will be better.

Also, depending on the diversity of the data, you might want to follow the linear search but divide the search space depending on some criteria that is fast to calculate and at the same time divides the set as evenly as possible. For example, you could use linear search, but instead of keeping a single list, keep different lists based on key length.
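The idea of splitting the search space by key length could be sketched roughly as follows. The names `LenEntry` and `LengthBucketStore`, the `int` value type, and the cap on the number of buckets are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

struct LenEntry {
    const char* key;  // non-owning; the caller keeps the bytes alive
    std::size_t len;
    int value;        // placeholder for the real value type
};

class LengthBucketStore {
public:
    // One bucket per key length; keys of length >= max_len share the last one.
    explicit LengthBucketStore(std::size_t max_len = 64) : buckets_(max_len + 1) {}

    void add(const char* key, std::size_t len, int value) {
        buckets_[index_for(len)].push_back(LenEntry{key, len, value});
    }

    // Only the bucket for this length is scanned. The length is still checked
    // because the last bucket can hold keys of several different lengths.
    const int* find(const char* key, std::size_t len) const {
        const std::vector<LenEntry>& bucket = buckets_[index_for(len)];
        for (const LenEntry& e : bucket) {
            if (e.len == len && std::memcmp(e.key, key, len) == 0) {
                return &e.value;
            }
        }
        return nullptr;
    }

private:
    std::size_t index_for(std::size_t len) const {
        return std::min(len, buckets_.size() - 1);
    }

    std::vector<std::vector<LenEntry>> buckets_;
};
```

How much this helps depends on how evenly the key lengths are distributed: if most keys have the same length, nearly everything lands in one bucket and the scan is as long as before.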

If the criterion is actually based on the contents of the string (letters, rather than size) then you are approximating the definition of a trie. If you get a library that already implements one, or you are willing to spend the time required to do so, a trie will probably be one of the fastest containers for this type of lookup, as it transforms the "size" variable from number of elements to length of the strings.
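For readers unfamiliar with the structure, a deliberately simple trie might look like the sketch below. It is not tuned for speed: a std::map per node is used for brevity, and a production trie would likely use a more compact child representation. The names `TrieNode` and `Trie` and the `int` value type are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>

struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool has_value = false;  // true only at a node where a stored key ends
    int value = 0;           // `int` is a placeholder for the real value type
};

class Trie {
public:
    void insert(const std::string& key, int value) {
        TrieNode* node = &root_;
        for (char c : key) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) child.reset(new TrieNode());
            node = child.get();
        }
        node->has_value = true;
        node->value = value;
    }

    // Walks one node per character, so the cost depends on the key length
    // rather than on the number of stored elements.
    const int* find(const std::string& key) const {
        const TrieNode* node = &root_;
        for (char c : key) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        return node->has_value ? &node->value : nullptr;
    }

private:
    TrieNode root_;
};
```

Keys that share a prefix share the nodes for that prefix, and a key that is only a prefix of a stored key (such as "ca" when only "car" was inserted) is reported as absent.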

Answer 3

You have it as one of your tags... why not use a trie? Insertions should be quick, memory usage can go down because keys with common prefixes share nodes, and lookups are fast.

Answer 4

Perhaps some sort of hash table? Using a good hashing algorithm for your keys would dramatically speed up your search time. Your insertion time would be slowed a bit, but hopefully not a great deal if your hash function is good.
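One possible shape for this, assuming the keys already live in std::string objects elsewhere and should not be copied, is a std::unordered_map keyed on pointers with a hash and equality that look through the pointer. The names `PtrHash`, `PtrEqual`, and `PointerKeyTable`, and the `int` value type, are illustrative assumptions.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hash the pointed-to string contents, not the pointer value itself.
struct PtrHash {
    std::size_t operator()(const std::string* s) const {
        return std::hash<std::string>()(*s);
    }
};

// Two keys are equal when the strings they point to are equal.
struct PtrEqual {
    bool operator()(const std::string* a, const std::string* b) const {
        return *a == *b;
    }
};

// `int` is a placeholder for the real value type.
using PointerKeyTable =
    std::unordered_map<const std::string*, int, PtrHash, PtrEqual>;
```

A table like this is filled with `table[&key] = value` and queried with `table.find(&probe)`, where `probe` may be a different std::string object with the same contents. Calling `reserve` with the expected element count before inserting avoids rehashing during the frequent inserts; whether it actually beats the list for a given workload needs to be measured.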

4 answers

solution1
3 2011-02-10 18:35:20

solution2
3 ACCEPTED 2011-02-10 19:24:26

solution3
2 2011-02-10 18:39:12

solution4
0 2011-02-10 18:30:56
