Remove duplicates from a list based on a field using LINQ

Question

I am new to C# and LINQ, and I am facing a problem removing duplicate records: I need to remove the duplicate records that don't have a Dept. Below is a quick example using an employee list:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        private static void Main()
        {
            List<Employee> empList = new List<Employee>();

            empList.Add(new Employee() { ID = 1, Name = "John",  Age = 23, Dept = "computer" });
            empList.Add(new Employee() { ID = 2, Name = "Mary",  Age = 25, Dept = "computer" });
            empList.Add(new Employee() { ID = 3, Name = "Amber", Age = 23, Dept = "" });
            empList.Add(new Employee() { ID = 4, Name = "Kathy", Age = 25, Dept = "" });
            empList.Add(new Employee() { ID = 5, Name = "Lena",  Age = 27, Dept = "computer" });
            empList.Add(new Employee() { ID = 6, Name = "John",  Age = 28, Dept = "" });
            empList.Add(new Employee() { ID = 7, Name = "Kathy", Age = 27, Dept = "Tester" });
            empList.Add(new Employee() { ID = 8, Name = "John",  Age = 23, Dept = "computer" });

            // Count how many times each name occurs, most frequent first.
            var dup = empList
                .GroupBy(x => x.Name)
                .Select(group => new { Name = group.Key, Count = group.Count() })
                .OrderByDescending(x => x.Count);

            foreach (var x in dup)
            {
                Console.WriteLine(x.Count + " " + x.Name);
            }
        }
    }

    class Employee
    {
        public int ID { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
        public string Dept { get; set; }   // a string, not a char
    }

The final output should look like this:

    empList.Add(new Employee() { ID = 1, Name = "John",  Age = 23, Dept = "computer" });
    empList.Add(new Employee() { ID = 2, Name = "Mary",  Age = 25, Dept = "computer" });
    empList.Add(new Employee() { ID = 3, Name = "Amber", Age = 23, Dept = "" });
    empList.Add(new Employee() { ID = 5, Name = "Lena",  Age = 27, Dept = "computer" });
    empList.Add(new Employee() { ID = 7, Name = "Kathy", Age = 27, Dept = "Tester" });
    empList.Add(new Employee() { ID = 8, Name = "John",  Age = 23, Dept = "computer" });

I need to remove the duplicate records that don't have a Dept.

Condition: a name may appear multiple times. Among those duplicates, any record that doesn't have a Dept should be deleted, and the remaining records should be displayed in the output.

Answer 1

Since the ID is unique, you could use this approach (Dept seems to be a string):

var empDupNoDepartment = empList
    .GroupBy(x => String.IsNullOrEmpty(x.Dept) ? int.MinValue : x.ID)
    .Select(group => group.First())
    .ToList();

This keeps every employee that has a Dept and only the first employee with an empty Dept.
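To see what this grouping does on the question's data, here is a minimal self-contained sketch. It assumes that Dept is a string and that the property is called Name; the class name Answer1Demo and its helper methods are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public string Dept { get; set; }   // assumption: Dept is a string, not a char
}

static class Answer1Demo   // hypothetical name, for illustration only
{
    // The sample records from the question.
    public static List<Employee> SampleData() => new List<Employee>
    {
        new Employee { ID = 1, Name = "John",  Age = 23, Dept = "computer" },
        new Employee { ID = 2, Name = "Mary",  Age = 25, Dept = "computer" },
        new Employee { ID = 3, Name = "Amber", Age = 23, Dept = "" },
        new Employee { ID = 4, Name = "Kathy", Age = 25, Dept = "" },
        new Employee { ID = 5, Name = "Lena",  Age = 27, Dept = "computer" },
        new Employee { ID = 6, Name = "John",  Age = 28, Dept = "" },
        new Employee { ID = 7, Name = "Kathy", Age = 27, Dept = "Tester" },
        new Employee { ID = 8, Name = "John",  Age = 23, Dept = "computer" },
    };

    // Every employee with a Dept gets its own group (keyed by the unique ID).
    // All employees without a Dept share the single key int.MinValue,
    // so only the first of them survives group.First().
    public static List<Employee> KeepFirstWithoutDept(IEnumerable<Employee> source) =>
        source
            .GroupBy(x => string.IsNullOrEmpty(x.Dept) ? int.MinValue : x.ID)
            .Select(group => group.First())
            .ToList();

    static void Main()
    {
        var ids = KeepFirstWithoutDept(SampleData()).Select(e => e.ID);
        Console.WriteLine(string.Join(", ", ids));   // prints: 1, 2, 3, 5, 7, 8
    }
}
```

On this sample the result matches the requested output, but only because Amber happens to be the first employee without a Dept. A second employee with an empty Dept and a unique name would be dropped as well, since the grouping key does not look at Name at all.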

Answer 2

from e in empList
group e by e.Name into g
select g.FirstOrDefault(e => !String.IsNullOrEmpty(e.Dept)) ?? g.First();

Just group by Name and, for each group, select the first employee with a Dept or, if there is none, simply the first employee.

Output:

[
  { "ID": 1, "Name": "John", "Age": 23, "Dept": "computer" },
  { "ID": 2, "Name": "Mary", "Age": 25, "Dept": "computer" },
  { "ID": 3, "Name": "Amber", "Age": 23,"Dept": "" },
  { "ID": 7, "Name": "Kathy", "Age": 27, "Dept": "Tester" },
  { "ID": 5, "Name": "Lena", "Age": 27, "Dept": "computer" }
]

If you want to keep all entries with a non-empty Dept, then:

Func<Employee, bool> hasDept = e => !String.IsNullOrEmpty(e.Dept);
var result = empList
       .GroupBy(e => e.Name)
       .SelectMany(g => g.Any(hasDept) ? g.Where(hasDept) : g.Take(1));

Query syntax:

from e in empList
group e by e.Name into g
from e in g.Any(hasDept) ? g.Where(hasDept) : g.Take(1)
select e;

Output:

[
  { "ID": 1, "Name": "John", "Age": 23, "Dept": "computer" },
  { "ID": 8, "Name": "John", "Age": 23, "Dept": "computer" },  <== difference
  { "ID": 2, "Name": "Mary", "Age": 25, "Dept": "computer" },
  { "ID": 3, "Name": "Amber", "Age": 23,"Dept": "" },
  { "ID": 7, "Name": "Kathy", "Age": 27, "Dept": "Tester" },
  { "ID": 5, "Name": "Lena", "Age": 27, "Dept": "computer" }
]
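The second query can be tried end to end with the sketch below, which runs it on the question's sample data. It assumes that Dept is a string; the names Answer2Demo, SampleData and RemoveDeptlessDuplicates are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Employee
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }
    public string Dept { get; set; }   // assumption: Dept is a string
}

static class Answer2Demo   // hypothetical name, for illustration only
{
    // The sample records from the question.
    public static List<Employee> SampleData() => new List<Employee>
    {
        new Employee { ID = 1, Name = "John",  Age = 23, Dept = "computer" },
        new Employee { ID = 2, Name = "Mary",  Age = 25, Dept = "computer" },
        new Employee { ID = 3, Name = "Amber", Age = 23, Dept = "" },
        new Employee { ID = 4, Name = "Kathy", Age = 25, Dept = "" },
        new Employee { ID = 5, Name = "Lena",  Age = 27, Dept = "computer" },
        new Employee { ID = 6, Name = "John",  Age = 28, Dept = "" },
        new Employee { ID = 7, Name = "Kathy", Age = 27, Dept = "Tester" },
        new Employee { ID = 8, Name = "John",  Age = 23, Dept = "computer" },
    };

    // Per name: keep every entry that has a Dept;
    // if no entry for that name has one, keep a single entry.
    public static List<Employee> RemoveDeptlessDuplicates(IEnumerable<Employee> source)
    {
        Func<Employee, bool> hasDept = e => !string.IsNullOrEmpty(e.Dept);
        return source
            .GroupBy(e => e.Name)
            .SelectMany(g => g.Any(hasDept) ? g.Where(hasDept) : g.Take(1))
            .ToList();
    }

    static void Main()
    {
        var ids = RemoveDeptlessDuplicates(SampleData()).Select(e => e.ID);
        Console.WriteLine(string.Join(", ", ids));   // prints: 1, 8, 2, 3, 7, 5
    }
}
```

The result contains the same six records as the output requested in the question, but ordered by name group rather than by original position. If the original order matters, appending .OrderBy(e => e.ID) before .ToList() should restore it for this data, since the IDs here follow insertion order.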

Answer 3

Create this class:

class Dept
{
    public int Count { get; set; }
    public string Name { get; set; }
    public List<Employee> Employees { get; set; }
}

And here is the query:

var dup = empList
    .GroupBy(x => new { x.Name })

    // Employees with duplicate name
    .Select(group => new { Emps = group.Select(x => x) })

    // From duplicates select only those that have a department
    .SelectMany(x =>
    {
        var emps = x.Emps.Where(y => !string.IsNullOrWhiteSpace(y.Dept));
        var employeesWithDept = emps.GroupBy(g => g.Name);

        return employeesWithDept.Select(g => new Dept
        {
            Employees = g.ToList(),
            Name = g.Key,
            Count = g.Count()
        });
    })
    .OrderByDescending(x => x.Count);

Answer 4

What you have here is an example of why someone's name is a terrible primary key for anything ever.

All Values:

var hasDept = empList.Where(x=>x.Dept != null && x.Dept.Trim() != string.Empty).ToList();

Distinct values (note that Distinct() compares by reference unless Employee overrides Equals and GetHashCode):

var hasDept =  empList.Where(x=>x.Dept != null && x.Dept.Trim() != string.Empty).Distinct().ToList();

Those will get you the ones that have a department. If you also want to get the ones that don't have one, but don't have a duplicate entry which does have a department, the easiest way is probably:

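var noDept = empList.Where(x => x.Dept == null || x.Dept.Trim() == string.Empty).Distinct().ToList();  // gets all the ones with no dept

var all = hasDept.ToList();  // start from a copy of the ones that have a dept
foreach (var e in noDept)
{
    // keep a no-dept entry only if no entry with the same name has a dept
    if (!hasDept.Any(x => x.Name == e.Name))
        all.Add(e);
}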

Answer 5

I'm not sure about it, but if you just want to remove the employees with an empty Dept using LINQ, you should be able to do:

empList = empList.Where(e => !string.IsNullOrWhiteSpace(e.Dept)).Distinct().ToList();

Best regards !

5 answers

solution1
1 ACCEPTED 2017-01-06 14:00:50

solution2
0 2017-01-06 14:13:30

solution3
0 2017-01-06 14:58:34

solution4
-1 2017-01-06 14:06:31

solution5
-2 2017-01-06 14:01:27
