
C#: Remove older duplicates from a DataTable based on a condition

I have a DataTable with duplicated rows. The rows are identical except for the date, and some rows have no date at all. I need to remove the older duplicates as well as the duplicates where no date was inserted. I have the following code, but it does not work because it cannot convert the fields to DateTime for the comparison.

public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
    Hashtable hTable = new Hashtable();
    ArrayList duplicateList = new ArrayList();
    // Add list of all the unique item value to hashtable, which stores
    // combination of key, value pair.
    // And add duplicate item value in arraylist.
    foreach (DataRow drow in dTable.Rows)
    {
        foreach (DataRow drow2 in dTable.Rows)
        {
            if (hTable.Contains(drow[colName]))
            {
                if (DateTime.TryParse(drow["date"].ToString(), out DateTime res))
                {
                    if (Convert.ToDateTime(drow["date"]) < Convert.ToDateTime(drow2["date"]))
                    {
                        duplicateList.Add(drow);
                    }
                    if (Convert.ToDateTime(drow["date"]) > Convert.ToDateTime(drow2["date"]))
                    {
                        duplicateList.Add(drow);
                    }
                }
            }
            else
                hTable.Add(drow[colName], string.Empty);
        }
    }
    // Removing a list of duplicate items from datatable.
    foreach (DataRow dRow in duplicateList)
        dTable.Rows.Remove(dRow);
    // Datatable which contains unique records will be return as output.
    return dTable;
}

So you want to remove duplicates according to the date column? You can use LINQ:

List<DataRow> duplicateRows = dTable.AsEnumerable()
    .GroupBy(r => r[colName])
    .SelectMany(g => g
        .Select(r => new
        {
            Row = r,
            Date = DateTime.TryParse(r.Field<string>("date"), out DateTime date) 
                       ? date : new DateTime?()
        })
        .OrderByDescending(x => x.Date.HasValue)
        .ThenByDescending(x => x.Date.GetValueOrDefault())
        .Skip(1))  // only the first row of the group will be retained
    .Select(x => x.Row)
    .ToList();

duplicateRows.ForEach(dTable.Rows.Remove);

Within each group, the rows that contain a parsable date value come first, ordered by that date descending, so the newest row is at the top. Only this first row of each group is kept; all other rows in the group are treated as duplicates and removed.
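The same idea can be wrapped in a reusable method. The following is only a sketch under a few assumptions that are not part of the answer above: the names `DataTableDedup`, `KeepNewestPerKey` and `ParseDate` are made up for illustration, the date column may hold either `DateTime` values or strings, and strings are parsed with the invariant culture (the original snippet uses the current culture and expects a string column).

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;

// Hypothetical helper class, not part of the original answer.
public static class DataTableDedup
{
    // Keeps one row per key (the newest dated row if there is one)
    // and removes all other rows of that key from the table in place.
    public static DataTable KeepNewestPerKey(DataTable table, string keyColumn, string dateColumn)
    {
        var duplicates = table.AsEnumerable()
            .GroupBy(r => r[keyColumn])
            .SelectMany(g => g
                .OrderByDescending(r => ParseDate(r[dateColumn]).HasValue)            // dated rows first
                .ThenByDescending(r => ParseDate(r[dateColumn]).GetValueOrDefault()) // newest first
                .Skip(1))                                                             // keep the first row only
            .ToList(); // materialize before modifying the table

        foreach (DataRow row in duplicates)
            table.Rows.Remove(row);

        return table;
    }

    // Returns null for DBNull, empty, or unparsable values.
    // Assumption: strings are parsed with the invariant culture.
    private static DateTime? ParseDate(object value)
    {
        if (value is DateTime dt) return dt;
        if (value == null || value == DBNull.Value) return null;

        string text = Convert.ToString(value, CultureInfo.InvariantCulture);
        return DateTime.TryParse(text, CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime parsed)
            ? parsed
            : (DateTime?)null;
    }
}

public static class Demo
{
    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("id", typeof(string));
        table.Columns.Add("date", typeof(string));
        table.Rows.Add("A", "2020-01-01");
        table.Rows.Add("A", "2021-06-15");
        table.Rows.Add("A", "");
        table.Rows.Add("B", "");

        DataTableDedup.KeepNewestPerKey(table, "id", "date");

        foreach (DataRow row in table.Rows)
            Console.WriteLine($"{row["id"]} | {row["date"]}");
    }
}
```

Note that a key whose only row has no date (like "B" in the demo) keeps that row, because the row is not a duplicate of anything. If such rows should be dropped as well, an extra filter on the parsed date would be needed.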
