
SortedList vs. SortedDictionary vs. Sort()

Original post: 2010-01-10 11:52:14. Tags: .net, performance, sorting, sortedlist, sorteddictionary

This is a continuation of questions like this one.

Are there any guidelines for tweaking the performance? I don't mean gains in big-O, just saving some linear time.

For example, how much does pre-sorting save on either SortedList or SortedDictionary?

Say I have a Person class with 3 properties to sort on, one of which is age in years. Should I bucket the objects by age first?

Should I first sort on one property, then use the resulting list/dictionary to sort on two properties, and so on?

Any other optimizations that spring to mind?

1 Answer

Well, it's an easy win on SortedList. Inserting an item requires a binary search (O(log(n))) to find the insertion point, then a List.Insert (O(n)) to insert the item. The Insert() dominates, so populating the list requires O(n^2). If the input items are already sorted, each Insert appends at the end and collapses to O(1), but the search is unaffected. Populating is now O(n log(n)). You don't have to worry about how big the Oh (the constant factor that big-O hides) is: sorting first is always more efficient, assuming you can afford the doubled storage requirement.
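To make the difference concrete, here is a minimal sketch of the two ways of populating a SortedList described above. The class and method names (SortedListFill, FillDirect, FillPresorted) are made up for illustration, and the keys are assumed to be unique, since SortedList.Add throws on a duplicate key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the framework.
public static class SortedListFill
{
    // Adds items in arrival order. Each Add does a binary search and then
    // shifts the tail of the backing array, so random input costs O(n^2) overall.
    public static SortedList<int, string> FillDirect(IEnumerable<KeyValuePair<int, string>> items)
    {
        var list = new SortedList<int, string>();
        foreach (var kv in items)
            list.Add(kv.Key, kv.Value);
        return list;
    }

    // Sorts a temporary copy first, so every Add lands at the end of the list.
    // The copy is the doubled storage requirement mentioned above.
    public static SortedList<int, string> FillPresorted(IEnumerable<KeyValuePair<int, string>> items)
    {
        var copy = items.ToList();
        copy.Sort((a, b) => a.Key.CompareTo(b.Key));

        // Pre-sizing avoids repeated growth of the backing arrays.
        var list = new SortedList<int, string>(copy.Count);
        foreach (var kv in copy)
            list.Add(kv.Key, kv.Value);
        return list;
    }
}
```

Both methods produce the same collection; only the insertion order differs. Timing them with a Stopwatch on realistic data is the way to see how much the pre-sort actually saves.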

SortedDictionary is different: it uses a red-black tree. Finding the insertion point requires O(log(n)). Rebalancing the tree might be required afterwards, which also takes O(log(n)). Populating the dictionary thus takes O(n log(n)). Using sorted input does not change the effort of finding the insertion point or rebalancing; it is still O(n log(n)). Now the Oh matters, though: inserting sorted input requires the tree to constantly rebalance itself. It works better if the input is random, so you don't want sorted input.
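If the input tends to arrive sorted, one way to act on this is to shuffle a copy before inserting. The sketch below does that with a Fisher-Yates shuffle; the names SortedDictionaryFill and FillShuffled and the seed parameter are hypothetical, and whether the shuffle pays for itself is something to measure rather than assume.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the framework.
public static class SortedDictionaryFill
{
    // Copies the input, shuffles the copy, then inserts in the shuffled order
    // so the tree does not see a long ascending run of keys.
    public static SortedDictionary<int, string> FillShuffled(
        IList<KeyValuePair<int, string>> items, int seed)
    {
        var copy = new List<KeyValuePair<int, string>>(items);
        var rng = new Random(seed);

        // Fisher-Yates shuffle: swap each position with a random earlier one.
        for (int i = copy.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            var tmp = copy[i];
            copy[i] = copy[j];
            copy[j] = tmp;
        }

        var dict = new SortedDictionary<int, string>();
        foreach (var kv in copy)
            dict.Add(kv.Key, kv.Value); // throws on duplicate keys
        return dict;
    }
}
```

The resulting dictionary enumerates in key order regardless of the insertion order, so the shuffle changes only the work done while populating.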

So populating SortedList with sorted input and populating SortedDictionary with unsorted input are both O(n log(n)). Ignoring the cost of providing sorted input, the Oh of SortedList is smaller than the Oh of SortedDictionary. That's an implementation detail due to the way List allocates memory. It only has to do so O(log(n)) times, whereas a red-black tree has to allocate O(n) times. Very small Oh, by the way.

Notably, neither one compares favorably with simply populating a List, then calling Sort(). That's also O(n log(n)). In fact, if the input happens to be sorted already, you can bypass the Sort() call and this collapses to O(n). The cost analysis now needs to move to the effort it takes to get the input sorted. It is hard to bypass the fundamental complexity of Sort(), O(n log(n)). The cost might not be readily visible; you might get the input sorted by, say, a SQL query. The query will just take longer to complete.
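A minimal sketch of the populate-then-sort approach, including the O(n) check that lets you skip Sort() when the input is already in order. ListThenSort, IsSorted and ToSortedList are illustrative names, not framework APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the framework.
public static class ListThenSort
{
    // One linear pass: true when no element is greater than its successor.
    public static bool IsSorted<T>(IReadOnlyList<T> items, IComparer<T> comparer)
    {
        for (int i = 1; i < items.Count; i++)
        {
            if (comparer.Compare(items[i - 1], items[i]) > 0)
                return false;
        }
        return true;
    }

    // Copies the source into a List and sorts it only when needed.
    public static List<T> ToSortedList<T>(IEnumerable<T> source, IComparer<T> comparer)
    {
        var list = new List<T>(source);
        if (!IsSorted(list, comparer))
            list.Sort(comparer); // O(n log(n)) on average
        return list;
    }
}
```

For example, ListThenSort.ToSortedList(new[] { 3, 1, 2 }, Comparer<int>.Default) returns a list containing 1, 2, 3. Note that List.Sort is not a stable sort, so sorting on one property and then re-sorting on another does not reliably preserve the first ordering; a single comparer that compares all the properties avoids that problem.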

The point of using either SortedList or SortedDictionary is to keep the collection sorted after inserts. If you only worry about populating but not mutating, then you shouldn't use those collections.