C# Full Outer Join Two DataTables Without LINQ

Thanks for reading this.

Here's my goal: I have two DataTables I've read in from Excel worksheets. The DataTables have the same schema (columns A, B, C, ... in DataTable1 hold the same kind of data as columns A, B, C, ... in DataTable2).

I need to compare data in the tables by arbitrary columns (i.e. for comparison only columns A and C matter, but I need to keep the data in columns A, B, C, ..., N).

Since I'm reading these in from Excel worksheets, the schema can't be known in advance. For example, if I load a different set of worksheets, the comparison columns may differ. For this reason, I can't use LINQ, which is like a hard-coded SQL statement.

I need to perform the equivalent of a FULL OUTER JOIN. I'm trying to show all data, including rows from either DataTable that don't appear in the other DataTable.

I've read a little bit about DataRelations, but I'm not sure how to use them.

Please provide example code.

Thanks in advance!

Given a pair of DataTables with an arbitrary number of columns, and a function that can create a join key of a reasonable type from a row of either table, you can use LINQ to do most of the work for you.

Let's start with a function to extract the join key from a DataRow. It'd be nice to just return an object[], but arrays compare by reference, so two keys holding the same values would never match. A Tuple<object, object> compares its items by value, so it works nicely for this purpose. And if you need more columns you can just add more items to the tuple :P

// Produces a JoinKey as Tuple containing columns 'A' and 'C' (0 and 2)
public Tuple<object, object> GetJoinKey(DataRow row)
{
    return Tuple.Create(row[0], row[2]);
}
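
The tuple above fixes the key at two columns. Since the question says the comparison columns are only known once the worksheets are loaded, one possible variation is to build the key from a list of column indexes supplied at run time. This is a sketch layered on the answer, not part of it: `KeyBuilder` and `MakeKeyGenerator` are hypothetical names, and the key is a string, which assumes the separator character never occurs in the data and that comparing values by their text form is acceptable (the number 1 and the text "1" would be treated as equal).

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;

public static class KeyBuilder
{
    // Hypothetical helper: returns a key generator for the given column indexes.
    // Assumption: the unit-separator character (U+001F) never appears in the data.
    public static Func<DataRow, string> MakeKeyGenerator(params int[] columnIndexes)
    {
        // Each selected value is converted to text (DBNull becomes an empty string)
        // and the pieces are joined into one comparable string.
        return row => string.Join(
            "\u001F",
            columnIndexes.Select(i => Convert.ToString(row[i], CultureInfo.InvariantCulture)));
    }
}
```

With this, `KeyBuilder.MakeKeyGenerator(0, 2)` should produce the same pairing of rows as the tuple version for columns 'A' and 'C', and the index list can come from user input or configuration instead of being compiled in.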

Now the join. LINQ has no full outer join operator, but we can do a left outer join in each direction and Union the results. Matched pairs come out of both queries, and Union drops the duplicates:

// given DataTables table1 & table2:
var outerjoin = 
    (
        from row1 in table1.AsEnumerable()
        join row2 in table2.AsEnumerable() 
            on GetJoinKey(row1) equals GetJoinKey(row2)
            into matches
        from row2 in matches.DefaultIfEmpty()
        select new { key = GetJoinKey(row1), row1, row2 }
    ).Union(
        from row2 in table2.AsEnumerable()
        join row1 in table1.AsEnumerable()
            on GetJoinKey(row2) equals GetJoinKey(row1)
            into matches
        from row1 in matches.DefaultIfEmpty()
        select new { key = GetJoinKey(row2), row1, row2 }
    );

Next up you have to create a suitable output format: a DataTable that has all the columns from both sources, plus a field to hold some info about the key:

DataTable result = new DataTable();
// add column for string value of key:
result.Columns.Add("__key", typeof(string));
// add columns from table1:
foreach (var col in table1.Columns.OfType<DataColumn>())
    result.Columns.Add("T1_" + col.ColumnName, col.DataType);
// add columns from table2:
foreach (var col in table2.Columns.OfType<DataColumn>())
    result.Columns.Add("T2_" + col.ColumnName, col.DataType);

And finally, fill the table from the query:

var row1def = new object[table1.Columns.Count];
var row2def = new object[table2.Columns.Count];
foreach (var src in outerjoin)
{
    // extract values from each row where present
    var data1 = (src.row1 == null ? row1def : src.row1.ItemArray);
    var data2 = (src.row2 == null ? row2def : src.row2.ItemArray);

    // create row with key string and row values
    result.Rows.Add(new object[] { src.key.ToString() }.Concat(data1).Concat(data2).ToArray());
}

Of course we could fold a couple of those operations together to get a single LINQ query that does 99% of the work for us. I'll leave that to you to play with if it sounds like fun.

Here's the full method, written as an extension method that takes a generic function for the join key generator, making it reasonably generic. Being an extension method, it has to be declared inside a static class:

public static DataTable FullOuterJoin<T>(this DataTable table1, DataTable table2, Func<DataRow, T> keygen)
{
    // perform initial outer join operation
    var outerjoin = 
        (
            from row1 in table1.AsEnumerable()
            join row2 in table2.AsEnumerable() 
                on keygen(row1) equals keygen(row2)
                into matches
            from row2 in matches.DefaultIfEmpty()
            select new { key = keygen(row1), row1, row2 }
        ).Union(
            from row2 in table2.AsEnumerable()
            join row1 in table1.AsEnumerable()
                on keygen(row2) equals keygen(row1)
                into matches
            from row1 in matches.DefaultIfEmpty()
            select new { key = keygen(row2), row1, row2 }
        );

    // Create result table
    DataTable result = new DataTable();
    result.Columns.Add("__key", typeof(string));
    foreach (var col in table1.Columns.OfType<DataColumn>())
        result.Columns.Add("T1_" + col.ColumnName, col.DataType);
    foreach (var col in table2.Columns.OfType<DataColumn>())
        result.Columns.Add("T2_" + col.ColumnName, col.DataType);

    // fill table from join query
    var row1def = new object[table1.Columns.Count];
    var row2def = new object[table2.Columns.Count];
    foreach (var src in outerjoin)
    {
        // extract values from each row where present
        var data1 = (src.row1 == null ? row1def : src.row1.ItemArray);
        var data2 = (src.row2 == null ? row2def : src.row2.ItemArray);

        // create row with key string and row values
        result.Rows.Add(new object[] { src.key.ToString() }.Concat(data1).Concat(data2).ToArray());
    }

    return result;
}

Now, IF the tables have the same schema (which the above doesn't care about), you can do almost exactly the same thing: modify the result table generation to just clone one of the tables, then add some merge logic in the load loop.
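
As a rough illustration of that same-schema variant, here is one way it might look. The answer doesn't specify the merge logic, so the rule used here is an assumption: for each column, take table1's value unless it is missing or DBNull, and fall back to table2's value otherwise. `DataTableMergeSketch` and `FullOuterJoinMerged` are hypothetical names.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataTableMergeSketch
{
    // Hypothetical same-schema variant: one set of columns in the output.
    // Assumes both tables have identical columns in the same order.
    public static DataTable FullOuterJoinMerged<T>(
        this DataTable table1, DataTable table2, Func<DataRow, T> keygen)
    {
        // Left outer join in each direction; Union removes the matched
        // pairs that both queries produce.
        var outerjoin =
            (
                from row1 in table1.AsEnumerable()
                join row2 in table2.AsEnumerable()
                    on keygen(row1) equals keygen(row2)
                    into matches
                from row2 in matches.DefaultIfEmpty()
                select new { row1, row2 }
            ).Union(
                from row2 in table2.AsEnumerable()
                join row1 in table1.AsEnumerable()
                    on keygen(row2) equals keygen(row1)
                    into matches
                from row1 in matches.DefaultIfEmpty()
                select new { row1, row2 }
            );

        // Clone() copies the schema (columns, types, constraints) but no rows.
        DataTable result = table1.Clone();

        foreach (var src in outerjoin)
        {
            var values = new object[result.Columns.Count];
            for (int i = 0; i < values.Length; i++)
            {
                object v1 = src.row1 == null ? DBNull.Value : src.row1[i];
                object v2 = src.row2 == null ? DBNull.Value : src.row2[i];
                // Assumed merge rule: prefer table1, fall back to table2.
                values[i] = v1 is DBNull ? v2 : v1;
            }
            result.Rows.Add(values);
        }

        return result;
    }
}
```

A call might look like `table1.FullOuterJoinMerged(table2, row => Tuple.Create(row[0], row[2]))`. If the two tables can legitimately disagree on a non-key column, a different rule (for example, flagging the conflict in an extra column) may suit the comparison better.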

Here's a Gist of the above with testing and verification that it's doing what it says. Drop that in your compiler and see what you get out.
