What is the fastest way to compare these two collections?

Question

I am seeing a huge performance issue when trying to get the list of keys in a ConcurrentDictionary whose value objects match entries in an IEnumerable collection, as follows:

The Customer object has: string CustomerNumber; string Location;

var CustomerDict = new ConcurrentDictionary<string, Customer>();
IEnumerable<string> customers = new List<string>(); // any IEnumerable<string>; a List is used here only for illustration

I am trying to get a list of the keys in the dictionary whose CustomerNumber appears in customers. What I have is below; building removeItems takes a very long time:

var removeItems = CustomerDict
    .Where(w => customers.Any(c => c == w.Value.CustomerNumber))
    .Select(s => s.Key)
    .ToList();

foreach(var item in removeItems)
{
   CustomerDict.TryRemove(item, out _);
}

Any help on how best to handle this would be much appreciated.

Answer 1

Make customers a HashSet<string>, whose Contains method is O(1):

var customers = new HashSet<string>();

var removeItems = CustomerDict
    .Where(w => customers.Contains(w.Value.CustomerNumber))
    .Select(s => s.Key);

Currently, Any iterates over customers for every dictionary entry, and each of those scans is O(n).

Also, your call to ToList is superfluous: it adds an additional, unnecessary iteration over the matching keys, not to mention increased memory usage.
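To make the two points above concrete, here is a minimal sketch of the whole operation. The Customer class, the RemovalSketch class, the RemoveMatching helper, and the sample data are made up for illustration and are not from the question. It relies on the documented behaviour that a ConcurrentDictionary enumerator is safe to use while the dictionary is being modified, which is why no ToList snapshot is needed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Customer class in the question.
public class Customer
{
    public string CustomerNumber { get; set; } = "";
    public string Location { get; set; } = "";
}

public static class RemovalSketch
{
    // Removes every entry whose CustomerNumber is in customers.
    // Returns how many entries were actually removed.
    public static int RemoveMatching(
        ConcurrentDictionary<string, Customer> dict,
        IEnumerable<string> customers)
    {
        // Build the lookup once; Contains is O(1) on average.
        var lookup = new HashSet<string>(customers);

        // Lazy query: nothing is materialised into a list.
        var matchingKeys = dict
            .Where(kv => lookup.Contains(kv.Value.CustomerNumber))
            .Select(kv => kv.Key);

        var removed = 0;
        foreach (var key in matchingKeys)
        {
            // Removing while enumerating is allowed for ConcurrentDictionary.
            if (dict.TryRemove(key, out _)) removed++;
        }
        return removed;
    }

    public static void Main()
    {
        var dict = new ConcurrentDictionary<string, Customer>();
        dict["k1"] = new Customer { CustomerNumber = "C1", Location = "Leeds" };
        dict["k2"] = new Customer { CustomerNumber = "C2", Location = "York" };

        var removed = RemoveMatching(dict, new[] { "C1", "C9" });
        Console.WriteLine($"removed {removed}, remaining {dict.Count}");
    }
}
```

If other threads may be writing to the dictionary at the same time, the set of keys that gets removed reflects whatever the enumerator happens to observe, so treat the result as best effort rather than a consistent snapshot.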

Answer 2

I think it's better to create a HashSet from customers so that lookups are faster:

HashSet<string> customersHashSet = new HashSet<string>(customers);

var removeItems = CustomerDict
                    .Where(c => customersHashSet.Contains(c.Value.CustomerNumber))
                    .Select(s => s.Key);

foreach (var item in removeItems)
{
    CustomerDict.TryRemove(item, out _);
}

When removing, consider the relative sizes: if the HashSet has many items relative to the dictionary, it may be better to iterate over the dictionary and search in the HashSet, like this:

foreach (var item in CustomerDict.ToArray())
{
    if (customersHashSet.Contains(item.Value.CustomerNumber))
        CustomerDict.TryRemove(item.Key, out _);
}

Answer 3

The problem is that .Any does a linear scan of the underlying collection, which in your case is customers, and it repeats that scan for every entry in your concurrent dictionary. Each check therefore takes linear effort. It would be better to dump the customer numbers into a local HashSet and then check inclusion via .Contains(w.Value.CustomerNumber). Each check then takes nearly constant effort.
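A small side-by-side sketch may help show the difference described here. Everything in it (the Customer class, the LookupComparison class, the method names, and the sample data) is hypothetical and only meant to show that both queries select the same keys while doing very different amounts of work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Customer class in the question.
public class Customer
{
    public string CustomerNumber { get; set; } = "";
    public string Location { get; set; } = "";
}

public static class LookupComparison
{
    // Original approach: Any rescans customers for every dictionary entry,
    // so the total work grows with (dictionary size * customers size).
    public static List<string> KeysViaAny(
        ConcurrentDictionary<string, Customer> dict,
        IEnumerable<string> customers)
    {
        return dict
            .Where(kv => customers.Any(c => c == kv.Value.CustomerNumber))
            .Select(kv => kv.Key)
            .ToList();
    }

    // HashSet approach: one pass to build the set, then an average
    // constant-time Contains check per dictionary entry.
    public static List<string> KeysViaHashSet(
        ConcurrentDictionary<string, Customer> dict,
        IEnumerable<string> customers)
    {
        var lookup = new HashSet<string>(customers);
        return dict
            .Where(kv => lookup.Contains(kv.Value.CustomerNumber))
            .Select(kv => kv.Key)
            .ToList();
    }

    public static void Main()
    {
        var dict = new ConcurrentDictionary<string, Customer>();
        dict["k1"] = new Customer { CustomerNumber = "C1", Location = "Leeds" };
        dict["k2"] = new Customer { CustomerNumber = "C2", Location = "York" };
        dict["k3"] = new Customer { CustomerNumber = "C3", Location = "Hull" };
        var customers = new List<string> { "C1", "C3", "C9" };

        // Sort before comparing: dictionary enumeration order is not guaranteed.
        var slow = KeysViaAny(dict, customers).OrderBy(k => k).ToList();
        var fast = KeysViaHashSet(dict, customers).OrderBy(k => k).ToList();

        Console.WriteLine(string.Join(",", fast));   // k1,k3
        Console.WriteLine(slow.SequenceEqual(fast)); // True
    }
}
```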

Answer 4

Why not simply do this:

foreach (var customer in customers) // enumerate customers
   CustomerDict.TryRemove(customer, out _); // try to remove the customer; does nothing if the key isn't found
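Note that this shortcut only does what the question asks if the dictionary is keyed by the customer number itself, which the question does not state. A rough sketch under that assumption follows; the Customer class, the DirectRemoval class, and the RemoveByKey helper are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in for the Customer class in the question.
public class Customer
{
    public string CustomerNumber { get; set; } = "";
    public string Location { get; set; } = "";
}

public static class DirectRemoval
{
    // ASSUMPTION: each dictionary key equals its value's CustomerNumber.
    // Under that assumption no scan of the dictionary is needed at all:
    // the cost is one hash lookup per entry in customers.
    public static int RemoveByKey(
        ConcurrentDictionary<string, Customer> dict,
        IEnumerable<string> customers)
    {
        var removed = 0;
        foreach (var customer in customers)
        {
            // TryRemove returns false (and changes nothing) for unknown keys.
            if (dict.TryRemove(customer, out _)) removed++;
        }
        return removed;
    }

    public static void Main()
    {
        var dict = new ConcurrentDictionary<string, Customer>();
        dict["C1"] = new Customer { CustomerNumber = "C1", Location = "Leeds" };
        dict["C2"] = new Customer { CustomerNumber = "C2", Location = "York" };

        var removed = RemoveByKey(dict, new[] { "C1", "C9" });
        Console.WriteLine($"removed {removed}, remaining {dict.Count}");
    }
}
```

If the keys are something other than the customer number, this loop silently removes nothing (or the wrong entries), and the HashSet-based filtering from the other answers is the safer choice.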

Answer 1: 3, 2020-08-18 12:28:52
Answer 2: 2, 2020-08-18 12:28:45
Answer 3: 1, 2020-08-18 12:23:23
Answer 4: 0, 2020-08-18 13:04:26
