
Remove duplicates from a list based on a field using LINQ

I am new to C# and LINQ and I am having trouble removing duplicate records. I need to remove the duplicate records that have no Dept. Below is a quick example using an employee list:

    // Requires: using System; using System.Collections.Generic; using System.Linq;
    private static void Main()
    {
        List<Employee> empList = new List<Employee>();

        empList.Add(new Employee() { ID = 1, Name = "John",  Age = 23, Dept = "computer" });
        empList.Add(new Employee() { ID = 2, Name = "Mary",  Age = 25, Dept = "computer" });
        empList.Add(new Employee() { ID = 3, Name = "Amber", Age = 23, Dept = "" });
        empList.Add(new Employee() { ID = 4, Name = "Kathy", Age = 25, Dept = "" });
        empList.Add(new Employee() { ID = 5, Name = "Lena",  Age = 27, Dept = "computer" });
        empList.Add(new Employee() { ID = 6, Name = "John",  Age = 28, Dept = "" });
        empList.Add(new Employee() { ID = 7, Name = "Kathy", Age = 27, Dept = "Tester" });
        empList.Add(new Employee() { ID = 8, Name = "John",  Age = 23, Dept = "computer" });

        // Count how many times each name occurs, most frequent first
        var dup = empList
            .GroupBy(x => x.Name)
            .Select(group => new { Name = group.Key, Count = group.Count() })
            .OrderByDescending(x => x.Count);

        foreach (var x in dup)
        {
            Console.WriteLine(x.Count + " " + x.Name);
        }
    }

    class Employee
    {
        public int ID { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
        public string Dept { get; set; }
    }

The final output should look like this:

    empList.Add(new Employee() { ID = 1, Name = "John",  Age = 23, Dept = "computer" });
    empList.Add(new Employee() { ID = 2, Name = "Mary",  Age = 25, Dept = "computer" });
    empList.Add(new Employee() { ID = 3, Name = "Amber", Age = 23, Dept = "" });
    empList.Add(new Employee() { ID = 5, Name = "Lena",  Age = 27, Dept = "computer" });
    empList.Add(new Employee() { ID = 7, Name = "Kathy", Age = 27, Dept = "Tester" });
    empList.Add(new Employee() { ID = 8, Name = "John",  Age = 23, Dept = "computer" });

I need to remove the duplicate records that have no Dept.

Condition: a name can occur multiple times. When it does, the record for that name that has no Dept should be deleted, and the remaining records should be displayed in the output.

Since the ID is unique, you could use this approach (Dept seems to be a string):

var empDupNoDepartment = empList
    .GroupBy(x => String.IsNullOrEmpty(x.Dept) ? int.MinValue : x.ID)
    .Select(group => group.First())
    .ToList();

This keeps every employee that has a Dept, plus only the first employee with an empty Dept.
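
To see what this does with the sample data, here is a minimal, self-contained sketch. It assumes Employee has a string Dept and a Name property (the question's code is inconsistent on both), and the Program class name is only for illustration. Note that the grouping key never looks at Name, so all employees without a Dept collapse into a single group no matter who they are.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's class: Name and Dept are both strings.
class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public string Dept { get; set; }
}

static class Program
{
    static void Main()
    {
        var empList = new List<Employee>
        {
            new Employee { ID = 1, Name = "John",  Age = 23, Dept = "computer" },
            new Employee { ID = 2, Name = "Mary",  Age = 25, Dept = "computer" },
            new Employee { ID = 3, Name = "Amber", Age = 23, Dept = "" },
            new Employee { ID = 4, Name = "Kathy", Age = 25, Dept = "" },
            new Employee { ID = 5, Name = "Lena",  Age = 27, Dept = "computer" },
            new Employee { ID = 6, Name = "John",  Age = 28, Dept = "" },
            new Employee { ID = 7, Name = "Kathy", Age = 27, Dept = "Tester" },
            new Employee { ID = 8, Name = "John",  Age = 23, Dept = "computer" },
        };

        // Employees with a Dept each get their own group (keyed by unique ID);
        // all employees without a Dept share the single key int.MinValue.
        var result = empList
            .GroupBy(x => string.IsNullOrEmpty(x.Dept) ? int.MinValue : x.ID)
            .Select(g => g.First())
            .ToList();

        Console.WriteLine(string.Join(", ", result.Select(e => e.ID)));
        // prints "1, 2, 3, 5, 7, 8"
    }
}
```

On this particular list the result happens to match the expected output in the question, but only because Amber is the first record without a Dept. With a different ordering, a different Dept-less employee would be kept.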

from e in empList
group e by e.Name into g
select g.FirstOrDefault(e => !String.IsNullOrEmpty(e.Dept)) ?? g.First();

Just group by Name and select either the first employee with a Dept or, if the group has none, simply the first employee.

Output:

[
  { "ID": 1, "Name": "John", "Age": 23, "Dept": "computer" },
  { "ID": 2, "Name": "Mary", "Age": 25, "Dept": "computer" },
  { "ID": 3, "Name": "Amber", "Age": 23,"Dept": "" },
  { "ID": 7, "Name": "Kathy", "Age": 27, "Dept": "Tester" },
  { "ID": 5, "Name": "Lena", "Age": 27, "Dept": "computer" }
]

If you want to keep all entries with a non-empty Dept, then:

Func<Employee, bool> hasDept = e => !String.IsNullOrEmpty(e.Dept);
var result = empList
       .GroupBy(e => e.Name)
       .SelectMany(g => g.Any(hasDept) ? g.Where(hasDept) : g.Take(1));

Query syntax:

from e in empList
group e by e.Name into g
from e in g.Any(hasDept) ? g.Where(hasDept) : g.Take(1)
select e;

Output:

[
  { "ID": 1, "Name": "John", "Age": 23, "Dept": "computer" },
  { "ID": 8, "Name": "John", "Age": 23, "Dept": "computer" },  <== difference
  { "ID": 2, "Name": "Mary", "Age": 25, "Dept": "computer" },
  { "ID": 3, "Name": "Amber", "Age": 23,"Dept": "" },
  { "ID": 7, "Name": "Kathy", "Age": 27, "Dept": "Tester" },
  { "ID": 5, "Name": "Lena", "Age": 27, "Dept": "computer" }
]

Create this class:

class Dept
{
    public int Count { get; set; }
    public string Name { get; set; }
    public List<Employee> Employees { get; set; }
}

And here is the query:

var dup = empList
    .GroupBy(x => new { x.Name })

    // Employees with duplicate name
    .Select(group => new { Emps = group.Select(x => x)})

    // From duplicates select only those that have a department 
    .SelectMany(x =>
    {
        var emps = x.Emps.Where(y => !string.IsNullOrWhiteSpace(y.Dept));
        var employeesWithDept = emps.GroupBy(g => g.Name);

        IEnumerable<Dept> a = employeesWithDept.Select(g => new Dept
        {
            Employees = g.ToList(),
            Name = g.Key.ToString(),
            Count = g.Count()
        });
        return a;
    })
    .OrderByDescending(x => x.Count);

What you have here is an example of why someone's name is a terrible primary key for anything ever.

All Values:

var hasDept = empList.Where(x=>x.Dept != null && x.Dept.Trim() != string.Empty).ToList();

Distinct Values:

var hasDept =  empList.Where(x=>x.Dept != null && x.Dept.Trim() != string.Empty).Distinct().ToList();

Those will get you the ones that have a department. If you also want to get the ones that don't have one, but don't have a duplicate entry which does have a department, the easiest way is probably:

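var noDept = empList.Where(x => x.Dept == null || x.Dept.Trim() == string.Empty).Distinct().ToList();  // gets all the ones with no dept

// Start from the employees that have a dept, then add each dept-less
// employee whose name does not appear among them.
var all = hasDept.ToList();
foreach (var e in noDept)
{
    if (!hasDept.Any(x => x.Name == e.Name))
        all.Add(e);
}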
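
The same merge can also be written without the loop. The following is only a sketch under the same assumptions as the rest of this thread (Employee has string Name and Dept properties); the namesWithDept helper and the Program class are names introduced here for illustration. It keeps everyone with a Dept and appends the Dept-less employees whose name has no counterpart with a Dept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the question's class: Name and Dept are both strings.
class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public string Dept { get; set; }
}

static class Program
{
    static void Main()
    {
        var empList = new List<Employee>
        {
            new Employee { ID = 1, Name = "John",  Age = 23, Dept = "computer" },
            new Employee { ID = 2, Name = "Mary",  Age = 25, Dept = "computer" },
            new Employee { ID = 3, Name = "Amber", Age = 23, Dept = "" },
            new Employee { ID = 4, Name = "Kathy", Age = 25, Dept = "" },
            new Employee { ID = 5, Name = "Lena",  Age = 27, Dept = "computer" },
            new Employee { ID = 6, Name = "John",  Age = 28, Dept = "" },
            new Employee { ID = 7, Name = "Kathy", Age = 27, Dept = "Tester" },
            new Employee { ID = 8, Name = "John",  Age = 23, Dept = "computer" },
        };

        // Everyone who has a department.
        var hasDept = empList.Where(e => !string.IsNullOrWhiteSpace(e.Dept)).ToList();

        // Hypothetical helper: names that occur at least once with a department.
        var namesWithDept = new HashSet<string>(hasDept.Select(e => e.Name));

        // Append dept-less employees whose name never occurs with a department.
        var all = hasDept
            .Concat(empList.Where(e => string.IsNullOrWhiteSpace(e.Dept)
                                       && !namesWithDept.Contains(e.Name)))
            .ToList();

        Console.WriteLine(string.Join(", ", all.Select(e => e.ID)));
        // prints "1, 2, 5, 7, 8, 3"
    }
}
```

The result contains the same records as the expected output in the question, only in a different order; add an OrderBy on ID if the original order matters.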

I'm not sure about it, but if you just want to remove the employees with an empty Dept using LINQ, you should be able to do:

empList = empList.Where(e => !string.IsNullOrWhiteSpace(e.Dept)).Distinct().ToList();

Best regards !


 