C# - Binary search a (sorted) Dictionary

I have a file of records, sorted alphabetically:

  • Andrew d432
  • Ben x127
  • ...
  • ...
  • Zac b332

The first field is a person name, the second field is some id. Once I read the file, I do not need to make any changes to the data.

I want to treat each record as a key-value pair, where the person name is the key. I don't know which class to use in order to access a record as fast as possible. Dictionary does not have a binary search. On the other hand, as I understand, SortedList and SortedDictionary should be used only when I need to insert/remove data.

Edit: To clarify, I'm talking about simply accessing a record, like:

x = MyDic["Zac"];

What no one has stated is why dictionaries are O(1) and why they are faster than a binary search. One side point is that dictionaries are not sorted by key. The whole point of a dictionary is to go to the exact* (for all practical purposes) location of the item that is referenced by the key: it hashes the key to compute where the item is stored. It does not "search" for the item; it knows the exact location of the item you want.

So a binary search would be pointless on a dictionary because there is no need to "search" for an item when the collection already knows exactly where it is.

*This isn't completely true in the case of hash collisions, but the principle of the dictionary is to get the item directly, and any additional lookups are an implementation detail and should be rare.

"On the other hand, as I understand, SortedList and SortedDictionary should be used only when I need to insert/remove data."

They should be used when you want the data kept sorted automatically as you add or remove items. Note that SortedDictionary loses the performance gain of a "normal" dictionary because it has to search for the location using the key value (an O(log n) lookup rather than O(1)). Its primary use is to allow you to iterate over the keys in order.
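To illustrate the ordered-iteration point, here is a minimal sketch. The class name OrderedKeysDemo and the method KeysInOrder are made up for this example, and the ordinal comparer is an assumption about how the names should be ordered.

```csharp
using System;
using System.Collections.Generic;

public static class OrderedKeysDemo
{
    // Returns the keys in the order a SortedDictionary enumerates them.
    // StringComparer.Ordinal is an assumption; pick the comparer that matches your data.
    public static List<string> KeysInOrder(IEnumerable<KeyValuePair<string, string>> pairs)
    {
        var sorted = new SortedDictionary<string, string>(StringComparer.Ordinal);
        foreach (var pair in pairs)
            sorted[pair.Key] = pair.Value;
        return new List<string>(sorted.Keys);
    }

    public static void Main()
    {
        // Inserted out of order on purpose.
        var pairs = new[]
        {
            new KeyValuePair<string, string>("Zac", "b332"),
            new KeyValuePair<string, string>("Andrew", "d432"),
            new KeyValuePair<string, string>("Ben", "x127"),
        };
        Console.WriteLine(string.Join(", ", KeysInOrder(pairs)));  // Andrew, Ben, Zac
    }
}
```

A plain Dictionary makes no such promise about enumeration order, which is the trade-off being described.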

If you have a unique key value per item, don't need to iterate the items in any particular order, and want the fastest "get" performance, then Dictionary is the way to go.
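For the file in the question, that could look roughly like the sketch below. It assumes each line is a name and an id separated by a space, that names are unique and contain no spaces, and that exact (ordinal) matching is wanted. RecordLookup and Parse are names invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class RecordLookup
{
    // Parses lines of the form "Name id" into a name -> id dictionary.
    // Assumes unique names without embedded spaces (not stated in the question).
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var records = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var line in lines)
        {
            // Split into at most two parts: the name and the rest of the line.
            var parts = line.Split(new[] { ' ' }, 2, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 2)
                records[parts[0]] = parts[1].Trim();
        }
        return records;
    }

    public static void Main()
    {
        // In practice the lines would come from something like File.ReadLines(path).
        var lines = new[] { "Andrew d432", "Ben x127", "Zac b332" };
        var records = Parse(lines);

        // TryGetValue avoids an exception when the name is not present.
        if (records.TryGetValue("Zac", out var id))
            Console.WriteLine(id);  // b332
    }
}
```

The indexer form records["Zac"] works too, but it throws KeyNotFoundException for a missing name, so TryGetValue is usually the safer choice for lookups driven by user input.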

In general, dictionary lookup will be faster than a binary search of a collection. There are two specific cases where that's not true:

  1. If the list is small (fewer than about 15 items, and possibly as few as 10, in my tests), the overhead of computing a hash code and going through the dictionary lookup makes it slower than a binary search on an array. Beyond 15 items, dictionary lookup beats binary search hands down.
  2. If there are many hash collisions (due either to a bad hash function or a dictionary with a high load factor), then dictionary lookup slows down. If it's really bad, then binary search could potentially beat dictionary lookup.
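For the small-list case, a binary search over a sorted array can be sketched with Array.BinarySearch. This is only an illustration: SmallSortedLookup and Find are hypothetical names, the two parallel arrays are one possible layout, and the names array must already be sorted with the same comparer used for the search (ordinal here, which is an assumption about the file's sort order).

```csharp
using System;

public static class SmallSortedLookup
{
    // Looks up a name in a sorted array and returns the matching id, or null if absent.
    // sortedNames must be sorted with StringComparer.Ordinal for the result to be valid.
    public static string Find(string[] sortedNames, string[] ids, string name)
    {
        int index = Array.BinarySearch(sortedNames, name, StringComparer.Ordinal);
        // A negative result means the name was not found.
        return index >= 0 ? ids[index] : null;
    }

    public static void Main()
    {
        string[] names = { "Andrew", "Ben", "Zac" };
        string[] ids = { "d432", "x127", "b332" };
        Console.WriteLine(Find(names, ids, "Ben"));  // x127
    }
}
```

Whether this actually beats a Dictionary at a given size is something to measure on your own data.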

In 15 years of working with .NET dictionaries holding all kinds of data, I've never seen #2 be a problem when using the standard String.GetHashCode() method with real-world data. The only time I've run into trouble is when I created a bad GetHashCode() method.
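As a rough illustration of what a bad GetHashCode() does, the sketch below uses a made-up key type, NameKey, that can be told to return the same hash code for every instance, and counts how many Equals calls a single lookup needs. The type, the counter and CountComparisons are all hypothetical. The exact counts depend on the runtime's Dictionary implementation, so none are shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: wraps a name and can deliberately use a constant hash code.
public sealed class NameKey : IEquatable<NameKey>
{
    public static int EqualsCalls;  // counts key comparisons made by the dictionary

    private readonly string _name;
    private readonly bool _constantHash;

    public NameKey(string name, bool constantHash)
    {
        _name = name;
        _constantHash = constantHash;
    }

    public bool Equals(NameKey other)
    {
        EqualsCalls++;
        return other != null && _name == other._name;
    }

    public override bool Equals(object obj) => Equals(obj as NameKey);

    // The "bad" variant puts every key in the same bucket.
    public override int GetHashCode() => _constantHash ? 1 : _name.GetHashCode();
}

public static class HashCollisionDemo
{
    // Fills a dictionary with `count` keys, then counts the Equals calls for one lookup.
    public static int CountComparisons(bool constantHash, int count)
    {
        var map = new Dictionary<NameKey, int>();
        for (int i = 0; i < count; i++)
            map[new NameKey("name" + i, constantHash)] = i;

        NameKey.EqualsCalls = 0;
        map.ContainsKey(new NameKey("name0", constantHash));
        return NameKey.EqualsCalls;
    }

    public static void Main()
    {
        Console.WriteLine("string hash:   " + CountComparisons(false, 1000));
        Console.WriteLine("constant hash: " + CountComparisons(true, 1000));
    }
}
```

With the constant hash, every key collides, so the lookup has to compare against many stored keys instead of going almost directly to the right one.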


 