Using Linq to get the last N number of rows that have duplicated values in a field

Question

Given a database table, a column name C, and a number N larger than 1, how can I get a group of at least N rows that share the same value in column C? If more than one such group exists, I need the group that contains the newest entry (the one with the largest Id).

Is it possible to do this using LINQ to Entities?

Example:

> Id | Mycolumn
> ---+-----------
> 1  | name55555
> 2  | name22
> 3  | name22
> 4  | name22
> 5  | name55555
> 6  | name55555
> 7  | name1

Primary Key: Id
OrderBy: Id
Repeated column: Mycolumn

If N = 3 and C = Mycolumn, we need the rows whose Mycolumn value occurs at least 3 times.

For the example above, it should return rows 1, 5 and 6: the last Id of name55555 is 6, while the last Id of name22 (which also occurs 3 times) is only 4.
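To make the selection rule concrete, here is a small LINQ to Objects sketch that computes, for each value of Mycolumn, the row count and the largest Id, which are the two numbers the example compares. The `(int Id, string Mycolumn)` tuples stand in for the real table rows; that shape is an assumption for illustration only.

```csharp
using System;
using System.Linq;

// The question's sample rows; tuples are a stand-in for the real entity type.
var rows = new (int Id, string Mycolumn)[]
{
    (1, "name55555"), (2, "name22"), (3, "name22"), (4, "name22"),
    (5, "name55555"), (6, "name55555"), (7, "name1"),
};

// For every value of Mycolumn: how many rows share it, and the largest Id among them.
var stats = rows
    .GroupBy(r => r.Mycolumn)
    .Select(g => (Value: g.Key, Count: g.Count(), LastId: g.Max(r => r.Id)))
    .ToList();

foreach (var s in stats)
    Console.WriteLine($"{s.Value}: count {s.Count}, last Id {s.LastId}");
```

Both name55555 and name22 reach a count of 3; name55555 wins because its last Id (6) is larger than name22's (4). name1 holds the largest Id overall but occurs only once, so it never qualifies.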

Answer 1

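// Newest rows first; in LINQ to Objects, GroupBy then yields groups ordered by their newest row
var ids = data.Mytable
    .OrderByDescending(m => m.Id)
    .GroupBy(m => m.Mycolumn)
    .FirstOrDefault(group => group.Count() >= N) // null when no value occurs N times
    ?.Take(N)                                     // the N newest rows of that group
    .Select(m => m.Id);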
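The query above relies on GroupBy keeping the order produced by the preceding OrderByDescending. LINQ to Objects documents that behaviour, but a LINQ to Entities provider is not obliged to preserve it. A variant that groups first and then orders the groups explicitly by their largest Id avoids that dependency. The sketch below runs on in-memory tuples (an assumption standing in for `data.Mytable`); whether a particular Entity Framework version translates it into a single SQL statement has not been verified here.

```csharp
using System;
using System.Linq;

int N = 3;
// In-memory stand-in for data.Mytable (hypothetical tuple shape).
var rows = new (int Id, string Mycolumn)[]
{
    (1, "name55555"), (2, "name22"), (3, "name22"), (4, "name22"),
    (5, "name55555"), (6, "name55555"), (7, "name1"),
};

// Keep groups with at least N rows, pick the one whose newest row has the largest Id,
// then return that group's N newest Ids (newest first).
var ids = rows
    .GroupBy(m => m.Mycolumn)
    .Where(g => g.Count() >= N)
    .OrderByDescending(g => g.Max(m => m.Id))
    .Select(g => g.OrderByDescending(m => m.Id).Take(N).Select(m => m.Id))
    .FirstOrDefault(); // null when no value occurs N times

Console.WriteLine(ids == null ? "no group" : string.Join(", ", ids));
```

For the question's data this prints the Ids of the name55555 group, newest first.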

Answer 2

If the rows are identical (all columns), then frankly there's no point fetching more than one of each: they will be indistinguishable. I don't know about LINQ, but you can do something like:

select id, name /* more cols */, count(1) from @foo
group by id, name /* more cols */ having count(1) > 1
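For comparison, the same GROUP BY ... HAVING idea can be written with LINQ's GroupBy and a Where on the group size. This is a LINQ to Objects sketch; the `(int id, string name)` rows are hypothetical, and only two columns are grouped where the SQL has `/* more cols */`.

```csharp
using System;
using System.Linq;

// Hypothetical rows: (2, "b") appears twice and (3, "c") three times, all columns identical.
var foo = new (int id, string name)[]
{
    (1, "a"), (2, "b"), (2, "b"), (3, "c"), (3, "c"), (3, "c"),
};

// GROUP BY id, name HAVING COUNT(1) > 1
var dupes = foo
    .GroupBy(r => new { r.id, r.name })
    .Where(g => g.Count() > 1)
    .Select(g => (id: g.Key.id, name: g.Key.name, count: g.Count()))
    .ToList();

foreach (var d in dupes)
    Console.WriteLine($"{d.id} {d.name} {d.count}");
```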

You can probably do that in LINQ using GroupBy etc. If they aren't entirely identical (for example, the IDENTITY is different but the other columns are the same), it gets more difficult, and there is certainly no easy LINQ syntax for it; at the T-SQL level, though:

select id, name /* more cols */
from (
    select id, name /* more cols */,
        ROW_NUMBER() over (partition by name /* more cols */ order by id) as [_row]
    from @foo
) x
where x._row > 1
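In memory, the ROW_NUMBER() filter above corresponds to grouping by the partition columns, ordering each group by id, and skipping the first row. The sketch below shows that in LINQ to Objects with hypothetical `(int id, string name)` rows; it makes no claim about how, or whether, a LINQ to Entities provider would translate it.

```csharp
using System;
using System.Linq;

// Hypothetical rows: same name, different identity values.
var foo = new (int id, string name)[]
{
    (1, "a"), (2, "b"), (3, "a"), (4, "b"), (5, "a"), (6, "c"),
};

// ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) > 1:
// within each name, order by id and drop the first row.
var extra = foo
    .GroupBy(r => r.name)
    .SelectMany(g => g.OrderBy(r => r.id).Skip(1))
    .OrderBy(r => r.id)
    .ToList();

Console.WriteLine(string.Join(", ", extra.Select(r => r.id)));
```

Every row except the lowest id of each name survives, which is exactly what `x._row > 1` keeps.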

Answer 3

I have scratched this together in LINQPad; it should give you the results you want:

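int Border = 3;
var table = new List<table>
{
  new table {Id = 1, Value = "Name1"},
  new table {Id = 2, Value = "Name2"},
  new table {Id = 3, Value = "Name5"},
  new table {Id = 4, Value = "Name5"},
  new table {Id = 5, Value = "Name2"},
  new table {Id = 6, Value = "Name5"},
  new table {Id = 7, Value = "Name5"},
};

// >= so that a value occurring exactly Border times also qualifies
var results = from p in table
              group p.Id by p.Value into g
              where g.Count() >= Border
              select new {rows = g.ToList()};
// Dump() exists only in LINQPad
results.Dump();

// Minimal row type assumed by the snippet (the original answer does not show it)
class table
{
    public int Id { get; set; }
    public string Value { get; set; }
}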

This yields rows 3, 4, 6 and 7.

However, you only want the last occurrence, not all of them, so you have to query results again:

var last = results.Skip(Math.Max(0, results.Count() - 1)).Take(1);
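One caveat: `Skip(Count() - 1).Take(1)` picks the last group in GroupBy order, which is the order in which each value first appears, not the group holding the newest row. On the question's own data the two differ. The sketch below uses tuples instead of the answer's `table` class (an assumption, to keep it self-contained) and `>=` so that exactly `Border` rows qualify; it compares the "last group" with the group whose largest Id is newest and then takes that group's last `Border` Ids.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int Border = 3;
// The question's sample rows as (Id, Value) tuples.
var table = new (int Id, string Value)[]
{
    (1, "name55555"), (2, "name22"), (3, "name22"), (4, "name22"),
    (5, "name55555"), (6, "name55555"), (7, "name1"),
};

// Qualifying groups of Ids, in GroupBy order (order of each value's first row).
var groups = table
    .GroupBy(p => p.Value, p => p.Id)
    .Where(g => g.Count() >= Border)
    .ToList();

// "Last group" as in the answer: the value whose FIRST row is newest.
var lastGroup = groups.LastOrDefault();

// Group whose NEWEST row has the largest Id, as the question asks.
var newestGroup = groups.OrderByDescending(g => g.Max()).FirstOrDefault();

// That group's last Border Ids, returned oldest first.
var lastN = newestGroup?.OrderByDescending(id => id).Take(Border).OrderBy(id => id).ToList();

Console.WriteLine($"last group: {lastGroup?.Key}");
Console.WriteLine($"newest group: {newestGroup?.Key} -> {string.Join(", ", lastN ?? new List<int>())}");
```

Here the "last group" is name22, whereas the question expects name55555 with rows 1, 5 and 6.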

Kind regards

3 answers

solution1
2 2013-11-21 12:20:41

solution2
1 2013-11-21 10:39:01

solution3
1 2013-11-21 12:27:19