Suitable Collection class for Sorted Data requiring fast retrieval

I am working in a scenario where I need to store a collection of KeyValuePair items, with a DateTimeOffset as the key. I receive a list of this data (via an HTTP request), which I simply need to read and generate the collection from. The collection must be kept sorted, and it must be enumerable. Also, I may need to do a lot of lookups on this data by key.

Also note that the data I receive is already sorted. I may periodically repeat the operation of receiving data and generating the collection. However, the existing collection is never modified; a new one is created each time I refresh the data.

Now, I have these methods in mind:

  1. Use a SortedDictionary<,> (my current method).
  2. Use a Dictionary<,> which is manually sorted after populating all items from the received data. (While this makes look-ups very fast (O(1)), I then need to sort the data, since a Dictionary<,> does not guarantee that items are kept in the order in which they were added.)
  3. Use a simple array (or List) which is directly populated from the data. The order of elements is maintained implicitly. Look-ups are then done using binary search on the keys.

Which method is appropriate for this scenario? Are there any other options, or variations on the above methods, that would give me better overall performance?

Edit

I'm sorry, I forgot to mention that I am developing for the WinRT (specifically Windows Phone) platform. Hence I cannot use SortedList<,> (nor OrderedDictionary), which would have been the best choice, as pointed out by @lc.

Also, my collection will only have a few hundred items. Perhaps at this scale there is no significant difference, but I'd like to know the answer all the same.

Of the three options, I would certainly exclude 1 (SortedDictionary), because 3 (array or List) outperforms it given your requirements (fast lookups, sorted, items provided in order, not modified).

Binary search on a sorted array runs in O(log n) time. According to the documentation, look-ups in SortedDictionary also run in O(log n) time, so there is no advantage in using it.

Since the data you get is already sorted, the array is populated in O(n). Insertion into a SortedDictionary runs in O(log n), so populating it takes O(n * log n), which is worse.
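As a rough illustration of option 3, here is a minimal sketch that assumes the received items are already in ascending key order, as the question states. The class name `SortedSnapshot` and its members are hypothetical, made up for this example. `Array.BinarySearch` is the standard .NET method, but it is worth confirming that it is available on your particular WinRT profile.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical read-only snapshot: keys and values live in parallel arrays.
public sealed class SortedSnapshot<TValue>
{
    private readonly DateTimeOffset[] keys;
    private readonly TValue[] values;

    // Assumes 'items' is already sorted by key; nothing here re-sorts it.
    public SortedSnapshot(IList<KeyValuePair<DateTimeOffset, TValue>> items)
    {
        keys = new DateTimeOffset[items.Count];
        values = new TValue[items.Count];
        for (int i = 0; i < items.Count; i++) // O(n) population
        {
            keys[i] = items[i].Key;
            values[i] = items[i].Value;
        }
    }

    // O(log n) lookup: binary search over the sorted key array.
    public bool TryGetValue(DateTimeOffset key, out TValue value)
    {
        int index = Array.BinarySearch(keys, key);
        if (index >= 0)
        {
            value = values[index];
            return true;
        }
        value = default(TValue);
        return false;
    }

    // Enumerates in key order; no extra sorting is needed.
    public IEnumerable<KeyValuePair<DateTimeOffset, TValue>> Items()
    {
        for (int i = 0; i < keys.Length; i++)
        {
            yield return new KeyValuePair<DateTimeOffset, TValue>(keys[i], values[i]);
        }
    }
}
```

Keeping the keys in their own array lets the search use the default DateTimeOffset comparison, so no custom comparer for KeyValuePair is needed.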

Enumeration runs in O(n) time for both.

To answer your question, I think 2 and 3 are both viable options. Which one is best depends on the ratio of insertions, lookups, and enumerations you will get.

For instance, if you do one billion lookups per enumeration, then using a Dictionary will probably pay off. If instead enumeration happens more often, a sorted array might be better in the end, because the data in the Dictionary first has to be sorted before it can be enumerated in order, and algorithms like QuickSort do that in O(n * log n) time on average.
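For comparison, option 2 might look roughly like the following sketch. The class and method names (`DictionaryOption`, `Build`, `InKeyOrder`) are hypothetical. Sorting on enumeration is done here with LINQ's `OrderBy`, which is one possible choice and not necessarily what you would use in production.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers illustrating option 2.
public static class DictionaryOption
{
    // O(n) population; each later lookup by key is O(1) on average.
    public static Dictionary<DateTimeOffset, TValue> Build<TValue>(
        IEnumerable<KeyValuePair<DateTimeOffset, TValue>> items)
    {
        var map = new Dictionary<DateTimeOffset, TValue>();
        foreach (var item in items)
        {
            map[item.Key] = item.Value;
        }
        return map;
    }

    // Dictionary makes no ordering promise, so every ordered
    // enumeration pays for a sort of all entries.
    public static IEnumerable<KeyValuePair<DateTimeOffset, TValue>> InKeyOrder<TValue>(
        Dictionary<DateTimeOffset, TValue> map)
    {
        return map.OrderBy(pair => pair.Key);
    }
}
```

If ordered enumeration is frequent, the repeated sort in `InKeyOrder` is the cost this answer is warning about.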

I suggest you try both in a typical usage scenario for your application and see which one performs best.

Or, if memory is not a concern, why not use a Dictionary and a sorted array together? Done properly, this gives you the best of both worlds.
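One way to combine the two is sketched below, assuming the collection is built once from already-sorted data and never modified afterwards, as described in the question. The class name `IndexedSnapshot` is hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable snapshot: an array for ordered enumeration
// plus a Dictionary for O(1) average-time lookups.
public sealed class IndexedSnapshot<TValue> : IEnumerable<KeyValuePair<DateTimeOffset, TValue>>
{
    private readonly KeyValuePair<DateTimeOffset, TValue>[] ordered;
    private readonly Dictionary<DateTimeOffset, TValue> byKey;

    // Assumes 'sortedItems' arrives in ascending key order.
    public IndexedSnapshot(IEnumerable<KeyValuePair<DateTimeOffset, TValue>> sortedItems)
    {
        ordered = sortedItems.ToArray();
        byKey = new Dictionary<DateTimeOffset, TValue>(ordered.Length);
        foreach (var item in ordered)
        {
            byKey[item.Key] = item.Value;
        }
    }

    public int Count
    {
        get { return ordered.Length; }
    }

    // Lookup goes through the hash table.
    public bool TryGetValue(DateTimeOffset key, out TValue value)
    {
        return byKey.TryGetValue(key, out value);
    }

    // Enumeration walks the array, so it follows the received order.
    public IEnumerator<KeyValuePair<DateTimeOffset, TValue>> GetEnumerator()
    {
        foreach (var item in ordered)
        {
            yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

The price is roughly double the memory for the keys and values, which is small for a few hundred items. Because a fresh snapshot is created on every refresh, the two structures never have to be kept in sync.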
