
What is the fastest way of changing Dictionary<K,V>?

This is an algorithmic question.

I have a Dictionary<object,Queue<object>>. Each queue contains one or more elements. I want to remove all queues with only one element from the dictionary. What is the fastest way to do it?

Pseudo-code: foreach(item in dict) if(item.Length==1) dict.Remove(item);

It is easy to do in a loop (not foreach, of course), but I'd like to know which approach is the fastest here.

Why I want it: I use that dictionary to find duplicate elements in a large set of objects. The Key in the dictionary is a kind of hash of the object; the Value is a queue of all objects found with the same hash. Since I want only duplicates, I need to remove all items with just a single object in the associated queue.

Update:

It may be important to know that in the regular case there are just a few duplicates in a large set of objects. Let's assume 1% or less. So it could possibly be faster to leave the Dictionary as is, create a new one from scratch with just the selected elements from the first one, and then delete the first Dictionary completely. I think it depends on the computational complexity of the Dictionary class's methods used in the particular algorithm.

I really want to look at this problem on a theoretical level because, as a teacher, I want to discuss it with students. I didn't provide any concrete solution myself because I think it is really easy to do. The question is which approach is the best, the fastest.

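// Collect the keys first: modifying the dictionary while enumerating it can invalidate the enumerator.
var itemsWithOneEntry = dict.Where(x => x.Value.Count == 1)
                            .Select(x => x.Key)
                            .ToList();

foreach (var item in itemsWithOneEntry) {
    dict.Remove(item);
}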
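
For comparison, the alternative raised in the question's update (building a new dictionary from only the multi-element queues and discarding the old one) could be sketched roughly as follows. The class and method names here (DuplicateFilter, KeepDuplicates) are made up for illustration and do not come from the original posts.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateFilter
{
    // Hypothetical helper (not from the original post): returns a new dictionary
    // containing only the entries whose queue holds more than one element.
    public static Dictionary<object, Queue<object>> KeepDuplicates(
        Dictionary<object, Queue<object>> source)
    {
        return source
            .Where(pair => pair.Value.Count > 1)                   // keep real duplicates only
            .ToDictionary(pair => pair.Key, pair => pair.Value);   // queues are reused, not copied
    }
}
```

Both variants enumerate the whole dictionary once. The difference is what happens afterwards: with roughly 1% duplicates, the rebuild performs about n/100 insertions, while in-place removal performs about 99n/100 removals. Which one wins in practice would still need to be measured.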

Instead of trying to optimize the traversal of the collection, how about optimizing the content of the collection so that it only includes the duplicates? This would require changing your collection algorithm to something like this:

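// Here each item serves as its own key; substitute the item's hash if the keys differ from the items.
var duplicates = new Dictionary<object,Queue<object>>();
var possibleDuplicates = new Dictionary<object,object>();
foreach(var item in original){
    if(possibleDuplicates.ContainsKey(item)){
       // Second occurrence: move the key to duplicates together with both items.
       duplicates.Add(item, new Queue<object>(new object[]{possibleDuplicates[item],item}));
       possibleDuplicates.Remove(item);
    } else if(duplicates.ContainsKey(item)){
       // Third or later occurrence: append to the existing queue.
       duplicates[item].Enqueue(item);
    } else {
       // First occurrence: remember it as a candidate only.
       possibleDuplicates.Add(item, item);
    }
}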

Note that you should probably measure the impact of this on performance in a realistic scenario before you bother to make your code any more complex than it really needs to be. Most imagined performance problems are not in fact the real cause of slow code.

But supposing you do find that you could gain speed by avoiding a linear search for queues of length 1, you could solve this problem with a technique called indexing.

As well as your dictionary containing all the queues, you maintain an index container (probably another dictionary) that contains only the queues of length 1, so when you need them they are already available separately.

To do this, you need to enhance all the operations that modify the length of a queue, so that they have the side effect of updating the index container.

One way to do it is to define a class ObservableQueue. This would be a thin wrapper around Queue, except that it also has a ContentsChanged event that fires when the number of items in the queue changes. Use ObservableQueue everywhere instead of the plain Queue.

Then, when you create a new queue, register a handler on its ContentsChanged event that checks whether the queue has only one item. Based on this, you can either insert it into or remove it from the index container.
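
A minimal sketch of this idea might look like the following. Only the names ObservableQueue and ContentsChanged come from the description above; SingleItemIndex, CreateQueue, and Singles are hypothetical, and a HashSet stands in for the index container for brevity (a dictionary keyed like the main one would work similarly and would make it easier to find the entries to remove).

```csharp
using System;
using System.Collections.Generic;

// Thin wrapper around Queue<T> that raises ContentsChanged whenever the item count changes.
public class ObservableQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();

    public event Action<ObservableQueue<T>> ContentsChanged;

    public int Count => _items.Count;

    public void Enqueue(T item)
    {
        _items.Enqueue(item);
        ContentsChanged?.Invoke(this);
    }

    public T Dequeue()
    {
        T item = _items.Dequeue();
        ContentsChanged?.Invoke(this);
        return item;
    }
}

// Hypothetical index container: tracks the queues that currently hold exactly one item.
public class SingleItemIndex<T>
{
    private readonly HashSet<ObservableQueue<T>> _singles = new HashSet<ObservableQueue<T>>();

    public IReadOnlyCollection<ObservableQueue<T>> Singles => _singles;

    // Creates a queue whose count changes keep the index up to date.
    public ObservableQueue<T> CreateQueue()
    {
        var queue = new ObservableQueue<T>();
        queue.ContentsChanged += Update;
        return queue;
    }

    // Handler: a queue belongs in the index only while it holds exactly one item.
    private void Update(ObservableQueue<T> queue)
    {
        if (queue.Count == 1) _singles.Add(queue);
        else _singles.Remove(queue);
    }
}
```

With this in place, the single-element queues are available from Singles at any time without scanning the main dictionary, at the cost of one extra set operation per Enqueue or Dequeue.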

