
Checking a list with null values for duplicates in C#

In C#, I can use something like:

List<string> myList = new List<string>();

if (myList.Count != myList.Distinct().Count())
{
    // there are duplicates
}

to check for duplicate elements in a list. However, when the list contains more than one null item, this produces a false positive, because Distinct() treats the nulls as equal to each other. I can do this using some sluggish code, but is there a concise way to check for duplicates in a list while disregarding null values?
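To make the false positive concrete, here is a minimal sketch. The class name, method name, and sample data are made up for illustration. Distinct() uses the default equality comparer, which considers two nulls equal, so a list whose only repeated entry is null still fails the Count comparison.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original question.
public static class NullDuplicateDemo
{
    // The naive check from the question: compares the list size
    // with the number of distinct elements.
    public static bool NaiveHasDuplicates(List<string> list)
    {
        return list.Count != list.Distinct().Count();
    }
}
```

For example, NullDuplicateDemo.NaiveHasDuplicates(new List<string> { "a", null, "b", null }) returns true even though no non-null string repeats, while a list with a single null, such as { "a", null, "b" }, returns false.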

If you're worried about performance, the following code will stop as soon as it finds the first duplicate item - all the other solutions so far require the whole input to be iterated at least once.

var hashset = new HashSet<string>();
if (myList.Where(s => s != null).Any(s => !hashset.Add(s)))
{
    // there are duplicates
}

hashset.Add returns false if the item already exists in the set, and Any returns true as soon as the first true value occurs, so this will only search the input as far as the first duplicate.
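If this check is needed in more than one place, the same idea can be wrapped in an extension method. The following is only a sketch under stated assumptions: the names DuplicateExtensions and HasDuplicatesIgnoringNulls are hypothetical, and the optional comparer parameter is an addition that the answer above does not have.

```csharp
using System.Collections.Generic;

// Hypothetical helper class, shown for illustration only.
public static class DuplicateExtensions
{
    // Returns true as soon as a non-null item is seen for the second time.
    // Null items are skipped, so any number of nulls is never a duplicate.
    public static bool HasDuplicatesIgnoringNulls<T>(
        this IEnumerable<T> source,
        IEqualityComparer<T> comparer = null) where T : class
    {
        // Passing a null comparer makes HashSet<T> use the default comparer.
        var seen = new HashSet<T>(comparer);
        foreach (var item in source)
        {
            if (item == null)
            {
                continue; // disregard nulls entirely
            }
            if (!seen.Add(item))
            {
                return true; // Add returns false when the item is already present
            }
        }
        return false;
    }
}
```

With this in scope, the check becomes myList.HasDuplicatesIgnoringNulls(). Passing System.StringComparer.OrdinalIgnoreCase as the comparer would treat "a" and "A" as duplicates.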

I'd do this differently:

Given that LINQ statements are evaluated lazily, the .Any will short-circuit - meaning you don't have to count every group if there are duplicates - and as such it should be more efficient. GroupBy itself still has to read the whole list once before it yields its first group.

var dupes = myList
    .Where(item => item != null)
    .GroupBy(item => item)
    .Any(g => g.Count() > 1);

if(dupes)
{
    //there are duplicates
}

EDIT: http://pastebin.com/b9reVaJu Some LINQPad benchmarking that seems to conclude GroupBy with Count() is faster

EDIT 2: Rawling's answer below seems at least 5x faster than this approach!

var nonNulls = myList.Where(x => x != null);
if (nonNulls.Count() != nonNulls.Distinct().Count())
{
    // there are duplicates
}

Well, two nulls are duplicates, aren't they?

Anyway, compare the list without nulls:

var denullified = myList.Where(l => l != null);
if(denullified.Count() != denullified.Distinct().Count()) ...

EDIT: my first attempt sucks because it is not deferred.

Instead:

var duplicates = myList
    .Where(item => item != null)
    .GroupBy(item => item)
    .Any(g => g.Skip(1).Any());

Poorer implementation deleted.
