Best way to remove duplicates from a DataTable based on column values

Question

I have a DataSet that contains just one table, so you could say I'm working with a DataTable here.

The code below works, but I'm looking for the best and most efficient way to perform the task, because I work with some data here.

Basically, the data from the table will later be stored in a database, where the primary key must of course be unique.

The primary key of the data I work with is in a column called Computer Name. For each entry there is also a date in another column, date.

I wrote a function that searches for duplicates in the Computer Name column and then compares the dates of these duplicates to delete all but the newest.

The function I wrote looks like this:

private void mergeduplicate(DataSet importedData)
{
    Dictionary<String, List<DataRow>> systems = new Dictionary<String, List<DataRow>>();
    DataSet importedDataCopy = importedData.Copy();
    importedData.Tables[0].Clear();
    foreach (DataRow dr in importedDataCopy.Tables[0].Rows)
    {
        String systemName = dr["Computer Name"].ToString();
        if (!systems.ContainsKey(systemName)) 
        {
            systems.Add(systemName, new List<DataRow>());
        }
        systems[systemName].Add(dr);
    }


    foreach (KeyValuePair<String,List<DataRow>> entry in systems) {
        if (entry.Value.Count > 1) {
            int firstDataRowIndex = 0;
            int secondDataRowIndex = 1;
            while (entry.Value.Count > 1) {
                DateTime time1 = Validation.ConvertStringIntoDateTime(entry.Value[firstDataRowIndex]["date"].ToString());
                DateTime time2 = Validation.ConvertStringIntoDateTime(entry.Value[secondDataRowIndex]["date"].ToString());

                //delete older entry (Compare < 0 means time1 is earlier than time2)
                if (DateTime.Compare(time1,time2) < 0) {
                    entry.Value.RemoveAt(firstDataRowIndex);
                } else {
                    entry.Value.RemoveAt(secondDataRowIndex);
                }
            }
        }
        importedData.Tables[0].ImportRow(entry.Value[0]);
    }
}

My question is: given that this code works, what is the best and fastest/most efficient way to perform the task?

I appreciate any answers!

Answer 1

I think this can be done more efficiently. You copy the DataSet once with DataSet importedDataCopy = importedData.Copy(); then you copy it again into a dictionary, and then you delete the unnecessary data from the dictionary. I would rather remove the unnecessary rows in a single pass. What about something like this:

private void mergeduplicate(DataSet importedData)
{
    Dictionary<String, DataRow> systems = new Dictionary<String, DataRow>();
    int i = 0;

    while (i < importedData.Tables[0].Rows.Count)
    {
        DataRow dr = importedData.Tables[0].Rows[i];
        String systemName = dr["Computer Name"].ToString();
        if (!systems.ContainsKey(systemName))
        {
            systems.Add(systemName, dr);
            // Advance only when no row was removed; after a removal the next
            // unprocessed row shifts into index i, so i must stay where it is.
            i++;
        }
        else
        {
            // Existing date is the date in the dictionary.
            DateTime existing = Validation.ConvertStringIntoDateTime(systems[systemName]["date"].ToString());

            // Candidate date is the date of the current DataRow.
            DateTime candidate = Validation.ConvertStringIntoDateTime(dr["date"].ToString());

            // If the candidate date is greater than the existing date then replace the existing DataRow
            // with the candidate DataRow and delete the existing DataRow from the table.
            if (DateTime.Compare(existing, candidate) < 0) 
            {
                importedData.Tables[0].Rows.Remove(systems[systemName]);
                systems[systemName] = dr;
            }
            else
            {
                importedData.Tables[0].Rows.Remove(dr);
            }
        }
    }
}
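
A variant of the same idea that avoids index bookkeeping altogether: one pass remembers the newest row per key and collects the losers, and a second loop removes them once the enumeration is finished. This is a minimal sketch, not the answerer's code. The names Deduplicator, RemoveOlderDuplicates and ParseDate are hypothetical, and ParseDate only stands in for Validation.ConvertStringIntoDateTime, whose implementation the question does not show; it assumes the date strings parse with the invariant culture.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class Deduplicator
{
    // Hypothetical stand-in for Validation.ConvertStringIntoDateTime.
    private static DateTime ParseDate(object value)
    {
        return DateTime.Parse(value.ToString(), CultureInfo.InvariantCulture);
    }

    // Keeps only the newest row per "Computer Name"; modifies the table in place.
    public static void RemoveOlderDuplicates(DataTable table)
    {
        var newest = new Dictionary<string, KeyValuePair<DataRow, DateTime>>();
        var toRemove = new List<DataRow>();

        foreach (DataRow row in table.Rows)
        {
            string name = row["Computer Name"].ToString();
            DateTime date = ParseDate(row["date"]); // each date is parsed exactly once

            KeyValuePair<DataRow, DateTime> best;
            if (!newest.TryGetValue(name, out best))
            {
                // First row seen for this name.
                newest[name] = new KeyValuePair<DataRow, DateTime>(row, date);
            }
            else if (date > best.Value)
            {
                // Current row is newer: the previously kept row loses.
                toRemove.Add(best.Key);
                newest[name] = new KeyValuePair<DataRow, DateTime>(row, date);
            }
            else
            {
                // Current row is older or equally old: it loses.
                toRemove.Add(row);
            }
        }

        // Remove after the enumeration so the row collection is not modified mid-loop.
        foreach (DataRow row in toRemove)
        {
            table.Rows.Remove(row);
        }
    }
}
```

With the question's data this would be called as Deduplicator.RemoveOlderDuplicates(importedData.Tables[0]). When two rows carry the same date, the one encountered first is kept.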

Answer 2

Maybe not the most efficient way, but you said you appreciate any answers:

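// Within each "Computer Name" group, sort newest first and delete everything after the first row.
List<DataRow> toDelete = dt.Rows.Cast<DataRow>()
    .GroupBy(s => s["Computer Name"].ToString())
    .SelectMany(grp => grp
        .OrderByDescending(x => Validation.ConvertStringIntoDateTime(x["date"].ToString()))
        .Skip(1))
    .ToList();
toDelete.ForEach(x => dt.Rows.Remove(x));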

Answer 3

You could try to use CopyToDataTable (a LINQ to DataSet extension method, like AsEnumerable), like this:

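// DataSet.Tables[0] has no setter, so the result goes into a new DataTable.
// Group by name only, then keep the newest row of each group.
DataTable deduplicated = importedData.Tables[0].AsEnumerable()
       .GroupBy(r => r["Computer Name"].ToString())
       .Select(g => g.OrderByDescending(r => Validation.ConvertStringIntoDateTime(r["date"].ToString())).First())
       .CopyToDataTable();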
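
The indexer on DataSet.Tables is read-only, so the table returned by CopyToDataTable cannot simply be assigned to importedData.Tables[0]. One way to swap it in is sketched below, assuming the DataSet holds a single table with no relations, as in the question. DataSetTableSwap and ReplaceFirstTable are hypothetical names. Note also that CopyToDataTable throws an InvalidOperationException when the sequence contains no rows, so an empty table needs a guard before the query runs.

```csharp
using System.Data;

public static class DataSetTableSwap
{
    // Replaces the first table of the DataSet with the given replacement table,
    // keeping the original table name.
    public static void ReplaceFirstTable(DataSet dataSet, DataTable replacement)
    {
        DataTable original = dataSet.Tables[0];
        string name = original.TableName;

        // Remove first: table names must be unique within a DataSet.
        dataSet.Tables.Remove(original);
        replacement.TableName = name;

        // Add appends to the collection; with a single table this is index 0 again.
        dataSet.Tables.Add(replacement);
    }
}
```

The table produced by the query above would then be passed in as the replacement argument, with importedData as the DataSet.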

Answer 1: score 2, accepted, 2015-06-10 14:23:55
Answer 2: score 0, 2015-06-10 14:22:55
Answer 3: score 0, 2015-06-10 14:24:31
