
Simple space efficient implementations of an associative collection in C?

I am looking for an associative collection that supports both retrieval and insertion of values by key (deletion is not important) in O(log N) time or better, and that has very low overhead in terms of both code size and run-time memory consumption.

I am doing this for a small embedded application written in C, so I am trying to minimize the amount of code required, and the amount of memory consumed.

The Google sparse hash data structure would be a possibility if it wasn't written in C++, and was simpler.

Most hash table implementations that I am aware of use a fair amount of extra space: they either need at least twice the space taken by the key/value pairs themselves, or they need extra pointers per entry (e.g. hash tables with bucket chaining). In my structure, a key/value pair is just two pointers.

Currently I am using an array of key/value pairs which is sorted, but the insertion is O(N). I can't help but think there must be a clever way to improve the amortized running time of insertion, for example by doing the insertions in groups, but I am not having any success.

I think that this must be a relatively well-known problem in certain circles, so to make this not too subjective, I'm wondering what the most common solution to the problem stated above is?

[EDIT:]

Some additional information that could be relevant:

  • Keys are integers
  • The number of values could be anywhere from 1 to 2^32.
  • Usage patterns are unpredictable.
  • I am hoping to keep memory consumption as low as possible (e.g. doubling the memory required would not be ideal).

Look at binary search trees, and to overcome the worst case (where both search and insertion are O(n)), use a balanced tree.

You could use a hash table that doesn't use chaining, such as a linear probing or cuckoo hashing scheme. The backing implementation is just an array, and with a load factor of around 0.5, the overhead won't be too bad, and the implementation complexity (at least for linear or quadratic probing) isn't too much.
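As a rough illustration of the linear probing variant, here is a minimal sketch. It rests on several assumptions that are not in the question: a fixed, power-of-two capacity (`LP_CAPACITY`), values that are never NULL (so NULL can mark an empty slot), no deletion, and a simple multiplicative hash. All `lp_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LP_CAPACITY 16u                 /* hypothetical; must be a power of two */
#define LP_MASK     (LP_CAPACITY - 1u)

typedef struct {
    uint32_t key;
    void    *value;                     /* assumption: NULL marks an empty slot */
} lp_slot;

typedef struct {
    lp_slot slots[LP_CAPACITY];         /* the whole table is one flat array */
    size_t  count;
} lp_table;

/* Multiplicative hash; any reasonable integer mixer would do here. */
static size_t lp_hash(uint32_t key)
{
    return ((uint32_t)(key * 2654435761u) >> 16) & LP_MASK;
}

/* Insert or overwrite. Returns 0 on success, -1 if the table is full. */
static int lp_put(lp_table *t, uint32_t key, void *value)
{
    assert(value != NULL);
    size_t i = lp_hash(key);
    for (size_t n = 0; n < LP_CAPACITY; n++) {
        lp_slot *s = &t->slots[i];
        if (s->value == NULL) {         /* free slot: claim it */
            s->key = key;
            s->value = value;
            t->count++;
            return 0;
        }
        if (s->key == key) {            /* key already present: overwrite */
            s->value = value;
            return 0;
        }
        i = (i + 1u) & LP_MASK;         /* linear probe: try the next slot */
    }
    return -1;
}

/* Returns the stored value, or NULL if the key is absent. */
static void *lp_get(const lp_table *t, uint32_t key)
{
    size_t i = lp_hash(key);
    for (size_t n = 0; n < LP_CAPACITY; n++) {
        const lp_slot *s = &t->slots[i];
        if (s->value == NULL) return NULL;   /* hit a gap: key is not here */
        if (s->key == key) return s->value;
        i = (i + 1u) & LP_MASK;
    }
    return NULL;
}
```

A zero-initialized `lp_table` is an empty table. A real version would also need to grow and rehash once the load factor passes whatever threshold you settle on.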

If you want a good implementation of a binary search tree that has excellent guarantees on performance and isn't too hard to code up, consider looking into splay trees. They guarantee amortized O(lg n) lookups, and require just two pointers per node. The balance step is also substantially easier than most balanced BSTs.
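For a sense of the code size involved, here is a sketch of a top-down splay in the style usually attributed to Sleator, with insert and lookup built on it. The node layout (integer key, `void *` value, two child pointers) and all function names are assumptions for illustration. Nodes are supplied by the caller, so the sketch does no allocation of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct node {
    uint32_t     key;
    void        *value;
    struct node *left, *right;          /* the only per-node overhead */
} node;

/* Top-down splay: moves the node with `key` (or the last node on the
 * search path) to the root and returns the new root. */
static node *splay(node *t, uint32_t key)
{
    node N, *l, *r, *y;
    if (t == NULL) return NULL;
    N.left = N.right = NULL;
    l = r = &N;
    for (;;) {
        if (key < t->key) {
            if (t->left == NULL) break;
            if (key < t->left->key) {   /* zig-zig: rotate right */
                y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == NULL) break;
            }
            r->left = t;                /* link into the right tree */
            r = t;
            t = t->left;
        } else if (key > t->key) {
            if (t->right == NULL) break;
            if (key > t->right->key) {  /* zig-zig: rotate left */
                y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == NULL) break;
            }
            l->right = t;               /* link into the left tree */
            l = t;
            t = t->right;
        } else {
            break;
        }
    }
    l->right = t->left;                 /* reassemble the three pieces */
    r->left = t->right;
    t->left = N.right;
    t->right = N.left;
    return t;
}

/* Inserts caller-owned node n; returns the new root. If the key already
 * exists, the existing node's value is overwritten and n is not linked. */
static node *splay_insert(node *root, node *n)
{
    assert(n != NULL);
    n->left = n->right = NULL;
    if (root == NULL) return n;
    root = splay(root, n->key);
    if (n->key < root->key) {
        n->left = root->left;
        n->right = root;
        root->left = NULL;
    } else if (n->key > root->key) {
        n->right = root->right;
        n->left = root;
        root->right = NULL;
    } else {
        root->value = n->value;
        return root;
    }
    return n;
}

/* Lookup restructures the tree, so the root pointer is passed by address. */
static void *splay_get(node **root, uint32_t key)
{
    *root = splay(*root, key);
    return (*root != NULL && (*root)->key == key) ? (*root)->value : NULL;
}
```

Note that even lookups modify the tree, and that the O(lg n) bound is amortized, so a single operation can still take O(n).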

I'd probably use a hash table with double hashing to resolve collisions. The general idea is to hash your original value, and if that collides do a second hash that gives a step value you'll use in walking through the array to find a place to put the value. This makes quite good use of memory as it has no overhead for pointers, and retains reasonable efficiency at much higher load factors than linear probing.
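A minimal sketch of that idea follows, under assumptions of my own choosing: a fixed power-of-two capacity, NULL as the empty-slot marker (so stored values must be non-NULL), no deletion, and two arbitrary multiplicative hashes. The step is forced to be odd so that it is coprime with the capacity, which makes the probe sequence visit every slot. All `dh_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DH_CAPACITY 16u                 /* hypothetical; must be a power of two */
#define DH_MASK     (DH_CAPACITY - 1u)

typedef struct { uint32_t key; void *value; } dh_slot;  /* NULL value = empty */
typedef struct { dh_slot slots[DH_CAPACITY]; size_t count; } dh_table;

/* First hash: where probing starts. */
static size_t dh_start(uint32_t key)
{
    return ((uint32_t)(key * 2654435761u) >> 16) & DH_MASK;
}

/* Second hash: the stride between probes. Forcing it odd keeps it
 * coprime with the power-of-two capacity, so all slots get visited. */
static size_t dh_step(uint32_t key)
{
    return (((uint32_t)(key * 2246822519u) >> 16) & DH_MASK) | 1u;
}

/* Returns the slot holding `key`, or the first empty slot on its probe
 * path, or NULL if the table is full and the key is absent. */
static dh_slot *dh_probe(dh_table *t, uint32_t key)
{
    size_t i = dh_start(key), step = dh_step(key);
    for (size_t n = 0; n < DH_CAPACITY; n++) {
        dh_slot *s = &t->slots[i];
        if (s->value == NULL || s->key == key) return s;
        i = (i + step) & DH_MASK;
    }
    return NULL;
}

/* Insert or overwrite. Returns 0 on success, -1 if the table is full. */
static int dh_put(dh_table *t, uint32_t key, void *value)
{
    assert(value != NULL);
    dh_slot *s = dh_probe(t, key);
    if (s == NULL) return -1;
    if (s->value == NULL) {             /* new key: claim the empty slot */
        s->key = key;
        t->count++;
    }
    s->value = value;
    return 0;
}

/* Returns the stored value, or NULL if the key is absent. */
static void *dh_get(dh_table *t, uint32_t key)
{
    dh_slot *s = dh_probe(t, key);
    return s ? s->value : NULL;
}
```

The only per-entry storage is the key and the value; the cost compared with linear probing is a second hash computation and less cache-friendly probing.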

Edit: If you want a variation of what you're doing right now, one possibility is to handle insertions in clusters: keep a sorted array, and a separate collection of new insertions. When the collection of new insertions gets too large, merge those items into the main collection.

For the secondary collection you have a couple of choices. You can just use an unsorted array and do a linear search, limiting its size to (say) log(M), where M is the size of the main array. In this case, an overall search remains O(log N), there is no memory overhead, and most insertions stay quite fast. When you do merge the collections together, you (normally) want to sort the secondary collection, then merge it with the primary. This lets you amortize the linear merge over the number of items in the secondary collection.
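The unsorted-array variant might look roughly like the sketch below. The capacities are hypothetical, and for simplicity the secondary array has a fixed cap (`PENDING_MAX`) rather than one that tracks log(M). Both arrays live in one struct, and the merge runs backwards in place, so no scratch buffer is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>                     /* qsort */

#define MAIN_MAX    64u                 /* hypothetical capacity of the sorted array */
#define PENDING_MAX 4u                  /* hypothetical cap, standing in for log(M) */

typedef struct { uint32_t key; void *value; } kv;

typedef struct {
    kv     main[MAIN_MAX];              /* sorted by key */
    size_t main_count;
    kv     pending[PENDING_MAX];        /* unsorted recent insertions */
    size_t pending_count;
} batch_map;

static int kv_cmp(const void *a, const void *b)
{
    uint32_t x = ((const kv *)a)->key, y = ((const kv *)b)->key;
    return (x > y) - (x < y);
}

/* Binary search in the sorted array, then linear scan of the small buffer. */
static kv *batch_find(batch_map *m, uint32_t key)
{
    size_t lo = 0, hi = m->main_count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (m->main[mid].key < key) lo = mid + 1;
        else hi = mid;
    }
    if (lo < m->main_count && m->main[lo].key == key) return &m->main[lo];
    for (size_t i = 0; i < m->pending_count; i++)
        if (m->pending[i].key == key) return &m->pending[i];
    return NULL;
}

/* Sort the buffer, then merge it into the main array from the back. */
static void batch_flush(batch_map *m)
{
    assert(m->main_count + m->pending_count <= MAIN_MAX);
    qsort(m->pending, m->pending_count, sizeof(kv), kv_cmp);
    size_t i = m->main_count, j = m->pending_count, out = i + j;
    m->main_count = out;
    while (j > 0) {                     /* once j hits 0 the rest is in place */
        if (i > 0 && m->main[i - 1].key > m->pending[j - 1].key)
            m->main[--out] = m->main[--i];
        else
            m->main[--out] = m->pending[--j];
    }
    m->pending_count = 0;
}

/* Insert or overwrite. Returns 0 on success, -1 if the map is full. */
static int batch_put(batch_map *m, uint32_t key, void *value)
{
    kv *hit = batch_find(m, key);
    if (hit != NULL) { hit->value = value; return 0; }
    if (m->main_count + m->pending_count >= MAIN_MAX) return -1;
    m->pending[m->pending_count].key = key;
    m->pending[m->pending_count].value = value;
    m->pending_count++;
    if (m->pending_count == PENDING_MAX) batch_flush(m);
    return 0;
}

/* Returns the stored value, or NULL if the key is absent. */
static void *batch_get(batch_map *m, uint32_t key)
{
    kv *hit = batch_find(m, key);
    return hit ? hit->value : NULL;
}
```

Each flush still costs O(N), but it is paid once per `PENDING_MAX` new keys instead of once per insertion.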

Alternatively, you can use a tree for your secondary collection. This means newly inserted items use extra storage for pointers, but (again) keeping that size small limits the overhead.

