
Selecting distinct column combinations from a DataTable object with another column as a condition, and some key information missing

This is closely related to a previous question of mine, which received a very good answer; the problem has just become a bit more complicated. The previous question was: How can I select distinct column combinations from a DataTable object with another column as a condition?

I'm using C# 2010.

I have a DataTable object I'm working with which has the following structure (and is filled with sample data):

"name"    "ID"    "hiredate"    "termdate"
Bobby     1        5/1/2011       7/1/2011
Peggy     2        5/1/2011
Lucy      4                       7/3/2012
Jenny     3        5/2/2011
Jenny     3        5/2/2013
Jenny     3        5/2/2011       6/1/2011
Peggy     2        5/1/2011
Lucy      4        6/1/2012

I want to filter this DataTable to keep only distinct ("ID","hiredate") combinations. There are two main features of this problem:

1 - If there are duplicate ("ID","hiredate") entries, the one with the most information (i.e. an existing "termdate") should be kept.
2 - Some entries have only a "termdate" and no "hiredate". They need to be matched up with the proper "hiredate" before condition 1 can be handled accurately (at least I think they do).

The data table is created from a csv and possibly added user input, not from a database query, otherwise my life would be a lot easier.

So the resulting table after doing this would be:

"name"    "ID"    "hiredate"    "termdate"
Bobby     1        5/1/2011       7/1/2011
Peggy     2        5/1/2011
Jenny     3        5/2/2013
Jenny     3        5/2/2011       6/1/2011
Lucy      4        6/1/2012       7/3/2012

Jenny has two entries because she appeared with two different "hiredate" values, and one of them was also duplicated - the entry without the "termdate" was removed. Lucy's two rows have been merged - they had complementary missing dates.

Any suggestions for how to do this in C#? Again, I'm using a DataTable object. I still need to keep the "name" and "termdate" fields. If I dropped them, I could easily get a distinct ("ID","hiredate") list, but they really need to be retained.

In my original question, there were no entries that had a "termdate" but no "hiredate", and this was the accepted solution, which worked fine for me:

            DataView dv = new DataView(dt);
            dv.Sort = "ID ASC, HireDate DESC, TermDate DESC";

            string lastID = "0";
            List<DateTime> addedHireDatesForUser = new List<DateTime>();

            foreach (DataRowView drv in dv)
            {
                if (drv["ID"].ToString() != lastID)
                {
                    addedHireDatesForUser = new List<DateTime>();
                    addedHireDatesForUser.Add(DateTime.Parse(drv["HireDate"].ToString()));

                    // NEXT ID, ADD ROW TO NEW DATATABLE
                }
                else if (!addedHireDatesForUser.Contains(DateTime.Parse(drv["HireDate"].ToString())))
                {
                    addedHireDatesForUser.Add(DateTime.Parse(drv["HireDate"].ToString()));

                    // NEXT DATE, ADD ROW TO NEW DATATABLE
                }

                lastID = drv["ID"].ToString();
            }

What I'm looking for is help with an (at least somewhat) elegant way to also deal with the entries missing "hiredate" as part of this process. I could write a really inefficient loop to match up all of them, but as there are (in reality) thousands of entries in this table, I have to wonder if there is a better way.

I appreciate any suggestions!

Is there a SQL query behind this data? If so, and the query is something like

SELECT name, ID, hiredate, termdate from table

It could be switched to

--First query merges each record that has only a termdate with the hiredate records for the same ID
SELECT t1.name, t1.ID, max(t2.hiredate) as hiredate, max(t1.termdate) as termdate from table t1
inner join table t2 on t1.ID = t2.ID and t1.hiredate is null and t2.hiredate is not null
GROUP by t1.name, t1.ID
UNION
--Second query returns full records where both hiredate and termdate are there
SELECT t1.name, t1.ID, t1.hiredate, t1.termdate from table t1
where t1.hiredate is not null and t1.termdate is not null
UNION
--Third query returns records with a hiredate but no termdate, unless another record for the same ID
--supplies the termdate (same hiredate, or a termdate-only record handled by the first query)
SELECT t1.name, t1.ID, t1.hiredate, t1.termdate from table t1
LEFT OUTER JOIN table t2 on t1.ID = t2.ID and t2.termdate is not null
    and (t2.hiredate = t1.hiredate or t2.hiredate is null)
where t1.hiredate is not null and t1.termdate is null and t2.ID is null

This should cover all the situations you discussed.
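
Since the question says the table comes from a CSV rather than a query, the same three cases can also be handled directly on the DataTable. The following is only a minimal sketch, not a tested production solution. It assumes the "hiredate" and "termdate" columns are typed as DateTime with DBNull for missing values (convert them first if they were loaded as strings). The class name HireTermMerger and the matching rule are my own assumptions: a termdate-only row is attached to the latest hiredate on or before it that has no termdate yet, and it is kept unchanged if no such hiredate exists. Rows with neither date are dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class HireTermMerger   // hypothetical name, not from the question
{
    private static DateTime? ReadDate(DataRow row, string column)
    {
        // Assumes a DateTime-typed column; DBNull means "missing".
        return row.IsNull(column) ? (DateTime?)null : (DateTime)row[column];
    }

    private static void AddRow(DataTable target, DataRow template, DateTime? hire, DateTime? term)
    {
        DataRow row = target.NewRow();
        row.ItemArray = template.ItemArray;   // copies name, ID and any other columns
        row["hiredate"] = hire.HasValue ? (object)hire.Value : DBNull.Value;
        row["termdate"] = term.HasValue ? (object)term.Value : DBNull.Value;
        target.Rows.Add(row);
    }

    public static DataTable Merge(DataTable source)
    {
        DataTable result = source.Clone();    // same columns, no rows

        // One pass to group by ID; groups come out in order of first appearance.
        foreach (var group in source.Rows.Cast<DataRow>().GroupBy(r => r["ID"].ToString()))
        {
            // hiredate -> termdate (null while no termdate is known for that hire)
            var spans = new SortedDictionary<DateTime, DateTime?>();
            var orphanTerms = new List<DateTime>();

            foreach (DataRow row in group)
            {
                DateTime? hire = ReadDate(row, "hiredate");
                DateTime? term = ReadDate(row, "termdate");
                if (hire.HasValue)
                {
                    // Keep the first termdate seen for each (ID, hiredate) pair.
                    DateTime? known;
                    if (!spans.TryGetValue(hire.Value, out known) || !known.HasValue)
                        spans[hire.Value] = term;
                }
                else if (term.HasValue)
                {
                    orphanTerms.Add(term.Value);
                }
            }

            var unmatched = new List<DateTime>();
            foreach (DateTime term in orphanTerms.OrderBy(t => t))
            {
                // Assumed rule: latest hire on or before the termdate that is still open.
                DateTime? match = null;
                foreach (var span in spans)
                    if (span.Key <= term && !span.Value.HasValue) match = span.Key;

                if (match.HasValue) spans[match.Value] = term;
                else unmatched.Add(term);
            }

            DataRow template = group.First();
            foreach (var span in spans.Reverse())           // latest hire first
                AddRow(result, template, span.Key, span.Value);
            foreach (DateTime term in unmatched)            // termdate with no usable hire
                AddRow(result, template, null, term);
        }
        return result;
    }
}
```

Calling HireTermMerger.Merge(dt) returns a new table and leaves dt untouched. Each ID group is processed once, so the cost is dominated by the grouping rather than by a nested loop over all rows. If the output needs to be ordered by ID, a DataView with a Sort expression can be applied to the result afterwards.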
