Check array for duplicates, return only items which appear more than once

I have a text document of emails such as:

Google12@gmail.com,
MyUSERNAME@me.com,
ME@you.com,
ratonabat@co.co,
iamcool@asd.com,
ratonabat@co.co,

I need to check said document for duplicates and create a unique array from that (so if "ratonabat@co.co" appears 500 times in the document, it will appear only once in the new array).

Edit: For example:

username1@hotmail.com
username2@hotmail.com
username1@hotmail.com
username1@hotmail.com
username1@hotmail.com
username1@hotmail.com

This is my "data" (either in an array or a text document, I can handle that).

I want to be able to see if there's a duplicate in that, and move the duplicate ONCE to another array. So the output would be:

username1@hotmail.com

You can simply use LINQ's Distinct extension method:

var input = new string[] { ... };
var output = input.Distinct().ToArray();

You may also want to consider refactoring your code to use a HashSet<string> instead of a simple array, as it handles duplicates gracefully: adding an item that is already present simply leaves the set unchanged.
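
As a rough illustration of the HashSet<string> suggestion, here is a minimal sketch. The class and method names (HashSetSketch, UniqueAddresses) are made up for this example and are not part of the original answer.

```csharp
using System.Collections.Generic;

public static class HashSetSketch
{
    // Hypothetical helper: copies the addresses into a set, dropping repeats.
    public static HashSet<string> UniqueAddresses(IEnumerable<string> addresses)
    {
        var unique = new HashSet<string>();
        foreach (var address in addresses)
        {
            // Add returns false and leaves the set unchanged
            // when the address is already present.
            unique.Add(address);
        }
        return unique;
    }
}
```

Note that a HashSet<T> does not promise any particular enumeration order, so if the original order of the addresses matters, Distinct on the array is probably the safer choice.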


To get an array containing only those records which are duplicates, it's a little more complex, but you can still do it with a little LINQ:

var output = input.GroupBy(x => x)
                  .Where(g => g.Skip(1).Any())
                  .Select(g => g.Key)
                  .ToArray();

Explanation:

  • .GroupBy groups identical strings together.
  • .Where filters the groups by the following criterion:
    • .Skip(1).Any() returns true if there are 2 or more items in the group. This is equivalent to .Count() > 1, but it's slightly more efficient because it stops enumerating after it finds a second item.
  • .Select returns a sequence consisting only of the key string of each group (rather than the group itself).
  • .ToArray converts the result to an array.
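
To make the steps above more concrete, here is a small sketch that reports, for each group, how many items it holds and whether the .Where filter would keep it. The class and method names (GroupByWalkthrough, Describe) are invented for illustration; the sample data is the one from the question.

```csharp
using System.Linq;

public static class GroupByWalkthrough
{
    // Hypothetical helper: one descriptive line per group.
    public static string[] Describe(string[] input)
    {
        return input.GroupBy(x => x)
                    .Select(g => $"{g.Key}: {g.Count()} item(s), kept = {g.Skip(1).Any()}")
                    .ToArray();
    }
}
```

Called with the six addresses from the question's edit, Describe should return:

username1@hotmail.com: 5 item(s), kept = True
username2@hotmail.com: 1 item(s), kept = False

Only the first group passes the filter, so the final array contains just username1@hotmail.com.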

Here's another solution using a custom extension method:

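using System.Collections.Generic;

public static class MyExtensions
{
    public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
    {
        var seen = new HashSet<T>();      // every item encountered so far
        var reported = new HashSet<T>();  // duplicates already returned
        foreach (var x in input)
        {
            // Add returns false if the item is already in the set, so this
            // yields x only the first time it is seen again.
            if (!seen.Add(x) && reported.Add(x))
                yield return x;
        }
    }
}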

And then you can call this method like this:

var output = input.Duplicates().ToArray();

I haven't benchmarked this, but it should be more efficient than the previous method, since it makes a single pass over the input and never builds the groups.
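
If you want to check the efficiency claim yourself, a crude timing harness along these lines could be a starting point. This is only a sketch: the names (DuplicateTiming, ViaGroupBy, ViaHashSets, Measure) are hypothetical, the second method is a non-lazy restatement of the two-HashSet idea, and a single Stopwatch run is a rough indicator rather than a proper benchmark.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class DuplicateTiming
{
    // Same approach as the GroupBy query.
    public static string[] ViaGroupBy(IEnumerable<string> input) =>
        input.GroupBy(x => x).Where(g => g.Skip(1).Any()).Select(g => g.Key).ToArray();

    // Same idea as the two-HashSet extension method, collected into a list.
    public static string[] ViaHashSets(IEnumerable<string> input)
    {
        var seen = new HashSet<string>();
        var reported = new HashSet<string>();
        var result = new List<string>();
        foreach (var x in input)
        {
            if (!seen.Add(x) && reported.Add(x))
                result.Add(x);
        }
        return result.ToArray();
    }

    // Hypothetical helper: elapsed milliseconds for one run of each approach.
    public static (long groupByMs, long hashSetMs) Measure(string[] input)
    {
        var sw = Stopwatch.StartNew();
        ViaGroupBy(input);
        long groupByMs = sw.ElapsedMilliseconds;

        sw.Restart();
        ViaHashSets(input);
        long hashSetMs = sw.ElapsedMilliseconds;

        return (groupByMs, hashSetMs);
    }
}
```

For numbers you can trust, you would want a large input, several warm-up runs, and ideally a dedicated benchmarking library instead of a bare Stopwatch.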

You can use the built-in .Distinct() method. By default the comparisons are case sensitive; if you want to make it case insensitive, use the overload that takes a comparer and pass a case-insensitive string comparer.

List<string> emailAddresses = GetListOfEmailAddresses();
string[] uniqueEmailAddresses = emailAddresses.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();

EDIT: After your clarification, I see that you only want to list the duplicates.

string[] duplicateAddresses = emailAddresses.GroupBy(address => address,
                                                    (key, rows) => new {Key = key, Count = rows.Count()}, 
                                                    StringComparer.OrdinalIgnoreCase)
                                            .Where(row => row.Count > 1)
                                            .Select(row => row.Key)
                                            .ToArray();

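To select emails which occur more than once:

// Trim each entry: splitting on ',' alone leaves the line breaks attached.
var dupEmails = from emails in File.ReadAllText(path).Split(',')
                                   .Select(x => x.Trim()).Where(x => x != "").GroupBy(x => x)
                where emails.Count() > 1
                select emails.Key;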
