Selecting distinct column combinations from a DataTable object with another column as a condition, and some key information missing

This is closely related to a previous question of mine, which received a very good answer. The problem has since become a bit more complicated: How can I select distinct column combinations from a DataTable object with another column as a condition?

I'm using C# 2010.

I have a DataTable object with the following structure (shown here filled with sample data):

"name"    "ID"    "hiredate"    "termdate"
Bobby     1        5/1/2011       7/1/2011
Peggy     2        5/1/2011
Lucy      4                       7/3/2012
Jenny     3        5/2/2011
Jenny     3        5/2/2013
Jenny     3        5/2/2011       6/1/2011
Peggy     2        5/1/2011
Lucy      4        6/1/2012

I want to filter this DataTable to keep only distinct ("ID","hiredate") combinations. There are two main features of this problem:

1 - If there are duplicate ("ID","hiredate") entries, the one with the most information (i.e. an existing "termdate") should be kept.

2 - Some entries have no "hiredate", only a "termdate". They need to be matched up with the proper "hiredate" before condition 1 can be handled accurately (at least I think they do).

The data table is created from a csv, possibly supplemented by user input, and not from a database query; otherwise my life would be a lot easier.

So the resulting table after doing this would be:

"name"    "ID"    "hiredate"    "termdate"
Bobby     1        5/1/2011       7/1/2011
Peggy     2        5/1/2011
Jenny     3        5/2/2013
Jenny     3        5/2/2011       6/1/2011
Lucy      4        6/1/2012       7/3/2012

Jenny has two entries because she appeared with two different "hiredate" values. One of those was also duplicated, so the entry without the "termdate" was removed. Lucy's two rows have been merged because they had complementary missing dates.

Any suggestions for how to do this in C#? Again, I'm using a DataTable object. I still need to keep the "name" and "termdate" fields. If I didn't need them, I could simply get a distinct ("ID","hiredate") list, but they really have to be retained.

In my original question, there were no entries that had a "termdate" but no "hiredate", and this was the accepted solution, which worked fine for me:
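As a rough illustration of rule 1 only (Jenny's case), here is a minimal sketch that keeps one row per ("ID","hiredate") pair and prefers a row that has a "termdate". The class and method names (`HireTableDedup`, `KeepDistinct`) are made up for this example, and it assumes missing values are stored as DBNull and that every row it should consider already has a "hiredate".

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class HireTableDedup
{
    // Returns a copy of dt holding one row per (ID, hiredate) pair.
    // When a pair occurs more than once, a row with a termdate wins.
    // Assumption: missing values are DBNull; rows without a hiredate are skipped.
    public static DataTable KeepDistinct(DataTable dt)
    {
        var best = new Dictionary<string, DataRow>();
        var order = new List<string>(); // remembers the first-seen order of keys

        foreach (DataRow row in dt.Rows)
        {
            if (row.IsNull("hiredate")) continue;

            string key = row["ID"] + "|" + row["hiredate"];
            DataRow kept;
            if (!best.TryGetValue(key, out kept))
            {
                best[key] = row;
                order.Add(key);
            }
            else if (kept.IsNull("termdate") && !row.IsNull("termdate"))
            {
                // Same pair seen before without a termdate: keep the fuller row.
                best[key] = row;
            }
        }

        DataTable result = dt.Clone(); // same columns, no rows
        foreach (string key in order) result.ImportRow(best[key]);
        return result;
    }
}
```

The dictionary lookup makes this a single pass over the table, so no sorting is needed; the order of the output follows the first appearance of each pair.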

            DataView dv = new DataView(dt);
            dv.Sort = "ID ASC, HireDate DESC, TermDate DESC";

            string lastID = "0";
            List<DateTime> addedHireDatesForUser = new List<DateTime>();

            foreach (DataRowView drv in dv)
            {
                if (drv["ID"].ToString() != lastID)
                {
                    addedHireDatesForUser = new List<DateTime>();
                    addedHireDatesForUser.Add(DateTime.Parse(drv["HireDate"].ToString()));

                    // NEXT ID, ADD ROW TO NEW DATATABLE
                }
                else if (!addedHireDatesForUser.Contains(DateTime.Parse(drv["HireDate"].ToString())))
                {
                    addedHireDatesForUser.Add(DateTime.Parse(drv["HireDate"].ToString()));

                    // NEXT DATE, ADD ROW TO NEW DATATABLE
                }

                lastID = drv["ID"].ToString();
            }

What I'm looking for is help with an (at least somewhat) elegant way to also deal with the entries missing a "hiredate" as part of this process. I could write a really inefficient loop to match them all up, but as there are (in reality) thousands of entries in this table, I have to wonder if there is a better way.

I appreciate any suggestions!

Does this have a SQL query attached to it? If so, and the query is something like
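One way to avoid a nested loop over the whole table is to index the rows by "ID" first and then resolve each "termdate"-only row against that index. The sketch below is only an illustration under assumptions the question does not settle: the names (`HireTableMerge`, `AttachOrphanTermDates`) are invented, "hiredate" and "termdate" are assumed to be DateTime columns with DBNull for missing values, and a "termdate"-only row is attached to the latest open "hiredate" for the same "ID" that is not after the "termdate".

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class HireTableMerge
{
    // Copies the termdate of each termdate-only row onto a row with the same ID
    // that has a hiredate but no termdate, then removes the termdate-only row.
    // Assumption: DateTime columns, DBNull for missing values.
    public static void AttachOrphanTermDates(DataTable dt)
    {
        var openById = new Dictionary<string, List<DataRow>>();
        var orphans = new List<DataRow>();

        // One pass: collect termdate-only rows and index open hire rows by ID.
        foreach (DataRow row in dt.Rows)
        {
            if (row.IsNull("hiredate"))
            {
                if (!row.IsNull("termdate")) orphans.Add(row);
            }
            else if (row.IsNull("termdate"))
            {
                string id = row["ID"].ToString();
                List<DataRow> list;
                if (!openById.TryGetValue(id, out list))
                    openById[id] = list = new List<DataRow>();
                list.Add(row);
            }
        }

        foreach (DataRow orphan in orphans)
        {
            List<DataRow> candidates;
            if (!openById.TryGetValue(orphan["ID"].ToString(), out candidates)) continue;

            DateTime term = (DateTime)orphan["termdate"];
            DataRow match = null;
            foreach (DataRow c in candidates)
            {
                // Assumed rule: latest hiredate that is not after the termdate.
                DateTime hire = (DateTime)c["hiredate"];
                if (hire <= term && (match == null || hire > (DateTime)match["hiredate"]))
                    match = c;
            }
            if (match == null) continue; // nothing suitable: leave the row untouched

            match["termdate"] = term;
            DateTime matchedHire = (DateTime)match["hiredate"];
            // That hiredate is now closed, including any duplicate open rows for it.
            candidates.RemoveAll(c => (DateTime)c["hiredate"] == matchedHire);
            dt.Rows.Remove(orphan);
        }
    }
}
```

Running this before the distinct ("ID","hiredate") filtering means Lucy's two rows collapse into one, and any leftover duplicate without a "termdate" is then dropped by the filtering step. Rows that cannot be matched are left in place so they can be inspected rather than silently lost.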

SELECT name, ID, hiredate, termdate from table

It could be switched to

--First query returns the combined record: a row with a null hiredate paired with a row for the same ID that has a hiredate
SELECT t1.name, t1.ID, max(t2.hiredate) as hiredate, max(t1.termdate) as termdate from table t1
inner join table t2 on t1.ID = t2.ID and t1.hiredate is null and t2.hiredate is not null
GROUP by t1.name, t1.ID
UNION
--Second query returns full records where both hiredate and termdate are there
SELECT t1.name, t1.ID, t1.hiredate, t1.termdate from table t1
where t1.hiredate is not null and t1.termdate is not null
UNION
--Third query returns records that have a hiredate but no termdate, unless another record with the same ID and hiredate has a termdate
SELECT t1.name, t1.ID, t1.hiredate, t1.termdate from table t1
LEFT OUTER JOIN table t2 on t1.ID = t2.ID and t1.hiredate = t2.hiredate and t2.termdate is not null
where t1.hiredate is not null and t1.termdate is null and t2.ID is null

This should cover all the situations you discussed.
