
Fastest Approach to Finding Number of Unique Members in a List

I've been trying to find a good way to count the unique values in a list. There was a very good question here, which I tried to adapt into a solution that looks like this:

gridStats[0] = gridList.SelectMany(x => x.Position.Easting).Distinct().ToList().Count();
gridStats[1] = gridList.SelectMany(x => x.Position.Northing).Distinct().ToList().Count();

However, that seems to produce an error about implicitly declared type arguments, which didn't make sense to me. Further research seemed to suggest that 'Distinct', good as it is, would not provide what I am looking for in any case without some additional code.

Therefore, I gave up on that approach and tried a loop instead, and I have arrived at this:

List<double> eastings = new List<double>();
List<double> northings = new List<double>();

for (int i = 0; i < gridList.Count; i++)
{
    if (!eastings.Contains(gridList[i].Position.Easting))
    {
        eastings.Add(gridList[i].Position.Easting);
    }

    if (!northings.Contains(gridList[i].Position.Northing))
    {
        northings.Add(gridList[i].Position.Northing);
    }
}

gridStats[0] = eastings.Count;
gridStats[1] = northings.Count;

Note that 'gridList' can have hundreds of millions of entries.

Quite predictably, this loop is not particularly fast. Therefore, I was hoping for help either in making the loop more efficient or in sorting out the Linq approach.

What do you suggest as the best approach when the only concern is the speed at which this task is performed?

You were so close.

Distinct is indeed the best choice for this scenario. It is similar to a HashSet<T> based implementation, but internally uses a special lightweight hash set implementation. In practice I don't think there will be a noticeable difference in performance, but Distinct is still more readable and at the same time a bit faster.

What you've missed, though, is that the question in the link is about a list of objects that each have a list property, so it needed SelectMany. In your case each object holds a single value per property, so a simple Select will do the job, like this:

gridStats[0] = gridList.Select(x => x.Position.Easting).Distinct().Count();
gridStats[1] = gridList.Select(x => x.Position.Northing).Distinct().Count();

Also note that the ToList call was not needed in order to use the Count extension method. Every operation has a cost, so don't include unnecessary methods: they will not make your code more readable, but they will certainly make it slower and more memory hungry.
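For reference, here is a minimal, self-contained sketch of the Select approach. The Position and GridPoint classes are hypothetical stand-ins, since the question does not show the real types, and gridStats is assumed to be an int array. Keep in mind that Distinct compares doubles for exact equality, so two coordinates that differ only by rounding error count as two values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the question's types, which are not shown.
public sealed class Position
{
    public double Easting { get; set; }
    public double Northing { get; set; }
}

public sealed class GridPoint
{
    public Position Position { get; set; } = new Position();
}

public static class UniqueCounter
{
    // Returns { unique eastings, unique northings }.
    public static int[] CountWithLinq(List<GridPoint> gridList)
    {
        var gridStats = new int[2]; // assumed element type; the question does not say

        // Select projects one double per element; SelectMany would require the
        // lambda to return a sequence, which a double is not.
        gridStats[0] = gridList.Select(x => x.Position.Easting).Distinct().Count();
        gridStats[1] = gridList.Select(x => x.Position.Northing).Distinct().Count();
        return gridStats;
    }
}

public static class Program
{
    public static void Main()
    {
        var gridList = new List<GridPoint>
        {
            new GridPoint { Position = new Position { Easting = 1.0, Northing = 10.0 } },
            new GridPoint { Position = new Position { Easting = 1.0, Northing = 20.0 } },
            new GridPoint { Position = new Position { Easting = 2.0, Northing = 20.0 } },
            new GridPoint { Position = new Position { Easting = 3.0, Northing = 20.0 } },
        };

        int[] gridStats = UniqueCounter.CountWithLinq(gridList);
        Console.WriteLine($"{gridStats[0]} unique eastings, {gridStats[1]} unique northings");
        // prints "3 unique eastings, 2 unique northings"
    }
}
```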

You can speed this up by using HashSet instead of List for eastings and northings:

HashSet<double> eastings = new HashSet<double>();
HashSet<double> northings = new HashSet<double>();

This is faster because a HashSet uses hashing to give O(1) lookups on average, whereas List.Contains is O(n): it has to scan the list to see whether the item exists. The List-based loop therefore does work proportional to the number of entries times the number of unique values found so far.
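Put together with the loop from the question, that change might look like the sketch below. Position and GridPoint are again hypothetical stand-ins for the real types, and the int array return type is an assumption. Since HashSet<T>.Add already reports whether the value was new, the separate Contains check can be dropped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the question's types, which are not shown.
public sealed class Position
{
    public double Easting { get; set; }
    public double Northing { get; set; }
}

public sealed class GridPoint
{
    public Position Position { get; set; } = new Position();
}

public static class UniqueCounter
{
    // Single pass over the list; each set insert is O(1) on average.
    public static int[] CountWithHashSets(List<GridPoint> gridList)
    {
        var eastings = new HashSet<double>();
        var northings = new HashSet<double>();

        for (int i = 0; i < gridList.Count; i++)
        {
            // Add returns false and leaves the set unchanged when the value
            // is already present, so no Contains check is needed first.
            eastings.Add(gridList[i].Position.Easting);
            northings.Add(gridList[i].Position.Northing);
        }

        return new[] { eastings.Count, northings.Count };
    }
}

public static class Program
{
    public static void Main()
    {
        var gridList = new List<GridPoint>
        {
            new GridPoint { Position = new Position { Easting = 1.0, Northing = 10.0 } },
            new GridPoint { Position = new Position { Easting = 1.0, Northing = 20.0 } },
            new GridPoint { Position = new Position { Easting = 2.0, Northing = 20.0 } },
            new GridPoint { Position = new Position { Easting = 3.0, Northing = 20.0 } },
        };

        int[] gridStats = UniqueCounter.CountWithHashSets(gridList);
        Console.WriteLine($"{gridStats[0]} unique eastings, {gridStats[1]} unique northings");
        // prints "3 unique eastings, 2 unique northings"
    }
}
```

Unlike the two Distinct calls, this walks gridList only once, which may matter when the list holds hundreds of millions of entries. Measure both on your own data before settling on one.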
