
LINQ two datatables join

I am stuck with this problem. I can solve it with foreach, but there has to be a better solution.

I have two DataTables. The first has a column named "Por"; the second has two columns named "FirstPor" and "LastPor". My goal is to use LINQ to create a new DataTable based on a condition like this:

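foreach (DataRow row in FirstDatatable.Rows)
{
    // The columns hold strings, so parse them before comparing.
    int por = int.Parse((string)row["Por"]);
    foreach (DataRow secondRow in SecondDatatable.Rows)
    {
        if (por >= int.Parse((string)secondRow["FirstPor"]) &&
            por <= int.Parse((string)secondRow["LastPor"]))
        {
            // A DataRow can belong to only one table, so copy it with ImportRow.
            FinalDataTable.ImportRow(row);
            break; // stop at the first matching range
        }
    }
}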

I am new to LINQ and this could be a problem for me. I can do it with Parallel.ForEach, but I think LINQ could be much faster. The condition is simple: take each number from the first table ("Por" column) and check which row of the second table it belongs to ("Por" >= "FirstPor" && "Por" <= "LastPor"). I think this is simple for anybody who works with this every day. There is one more complication: the columns are of type string, so a conversion is needed in the LINQ statement.


I have just changed my Parallel code into a hybrid LINQ/Parallel version, and it seems I am done. I took what James and Rahul wrote and put it into my code. The process now takes 52 seconds for 421,000 rows :) That's much better.

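public class Range
{
    public int FirstPor { get; set; }
    public int LastPor { get; set; }
    public int PackageId { get; set; }
}

var ranges = (from r in tmpDataTable.AsEnumerable()
              select new Range
              {
                  FirstPor = Int32.Parse(r["FirstPor"] as string),
                  LastPor = Int32.Parse(r["LastPor"] as string),
                  PackageId = Int32.Parse(r["PackageId"] as string)
              }).ToList();

Parallel.ForEach<DataRow>(dt.AsEnumerable(), row =>
{
    int por = Int32.Parse(row["Por"].ToString());

    // The lookup only reads the shared list, so it can run outside the lock;
    // First() throws if no range contains por.
    int packageId = ranges.First(range => por >= range.FirstPor && por <= range.LastPor).PackageId;

    // DataTable writes must be synchronized, so only the assignment is locked
    // (locker is a shared object, e.g. private readonly object locker = new object();).
    lock (locker)
    {
        row["PackageId"] = packageId;
    }

    worker.ReportProgress(por);
});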
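DataTable is documented as safe for multithreaded reads, but writes must be synchronized. So another option is to do all the parallel work as reads and then write the results on a single thread, which removes the lock entirely. This is a minimal sketch of that idea. It uses PLINQ (`AsParallel`) in place of `Parallel.ForEach` and assumes a string "Por" column and an int "PackageId" column. `PackageRange` and `PackageAssigner` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical type: same shape as the Range class above.
public sealed class PackageRange
{
    public int FirstPor { get; set; }
    public int LastPor { get; set; }
    public int PackageId { get; set; }
}

public static class PackageAssigner
{
    // Phase 1 (parallel): only reads the rows and the shared, unmodified range list.
    // Phase 2 (sequential): writes to the DataTable on one thread, so no lock is needed.
    public static void AssignPackageIds(DataTable dt, IReadOnlyList<PackageRange> ranges)
    {
        DataRow[] rows = dt.Rows.Cast<DataRow>().ToArray();

        int[] packageIds = rows
            .AsParallel()
            .AsOrdered() // keep results aligned with the rows array
            .Select(row =>
            {
                int por = int.Parse(row.Field<string>("Por"));
                // First() throws InvalidOperationException if no range contains por.
                return ranges.First(r => por >= r.FirstPor && por <= r.LastPor).PackageId;
            })
            .ToArray();

        for (int i = 0; i < rows.Length; i++)
            rows[i]["PackageId"] = packageIds[i];
    }
}
```

Progress reporting could still be done from the second loop, where it runs on a single thread.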

(Untested code ahead...)

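var newRows = FirstDatatable.AsEnumerable()
    .Where(row =>
    {
        int por = int.Parse(row.Field<string>("Por"));
        return SecondDatatable.AsEnumerable()
            .Any(secondRow => por >= int.Parse(secondRow.Field<string>("FirstPor"))
                           && por <= int.Parse(secondRow.Field<string>("LastPor")));
    });

// DataRowCollection has no AddRange, and a DataRow can belong to only one table.
foreach (DataRow row in newRows)
    FinalDataTable.ImportRow(row);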

However, if speed is your real concern, my first suggestion is to dump the DataTables and use a list. I'm going to guess that SecondDatatable is largely fixed and probably changes less than once a day. So, let's create a nice in-memory structure for it:

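class Range
{
    public int FirstPor { get; set; }
    public int LastPor { get; set; }
}

// DataRowCollection is non-generic, so go through AsEnumerable() and read the strings with Field<string>.
var ranges = (from r in SecondDatatable.AsEnumerable()
              select new Range
              {
                  FirstPor = Int32.Parse(r.Field<string>("FirstPor")),
                  LastPor = Int32.Parse(r.Field<string>("LastPor"))
              }).ToList();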

Then our code becomes:

var newRows = FirstDatatable.AsEnumerable()
    .Where(row =>
    {
        int por = Int32.Parse(row.Field<string>("Por"));
        return ranges.Any(range => por >= range.FirstPor && por <= range.LastPor);
    })
    .ToList();

That alone should make this considerably faster, because the range strings are parsed once up front instead of on every comparison.

Now, on a success, it scans the ranges until it finds one that matches. On a failure, it has to scan the whole list before it gives up. So the first thing we can do to speed this up is to sort the list of ranges by FirstPor. Then we only have to scan up to the point where the low end of a range is higher than the value we are looking for. On average, that should cut the processing time for rows that fall outside every range roughly in half.
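The sorted, early-exit scan described above could be sketched like this. `PorRange` and `SortedRangeLookup` are hypothetical names. The ranges are sorted once by FirstPor, and that sorted list is reused for every lookup.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class PorRange
{
    public int FirstPor { get; set; }
    public int LastPor { get; set; }
}

public static class SortedRangeLookup
{
    // Sort once by the low end; reuse the sorted list for every lookup.
    public static List<PorRange> SortByFirstPor(IEnumerable<PorRange> ranges) =>
        ranges.OrderBy(r => r.FirstPor).ToList();

    // Linear scan over ranges sorted by FirstPor. Stops as soon as a range
    // starts above por, because every later range starts even higher.
    public static bool IsInAnyRange(List<PorRange> sortedRanges, int por)
    {
        foreach (PorRange range in sortedRanges)
        {
            if (range.FirstPor > por)
                return false; // early exit: no remaining range can contain por
            if (por <= range.LastPor)
                return true;  // FirstPor <= por <= LastPor
        }
        return false;
    }
}
```

In the query, `ranges.Any(...)` would become `SortedRangeLookup.IsInAnyRange(sorted, por)`. The early exit relies only on the low ends being in ascending order, so it still gives correct answers if some ranges overlap.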

Try this:

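// The columns are strings, so Field<int> would throw InvalidCastException; parse them instead.
DataTable FinalDataTable = (from x in dt1.AsEnumerable()
                            let por = int.Parse(x.Field<string>("Por"))
                            from y in dt2.AsEnumerable()
                            where por >= int.Parse(y.Field<string>("FirstPor"))
                               && por <= int.Parse(y.Field<string>("LastPor"))
                            select x).CopyToDataTable(); // throws InvalidOperationException if nothing matches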

Here is the complete Working Fiddle (I have tested it with some sample data, using both your existing code and my LINQ code). Since DotNet Fiddle does not support AsEnumerable, copy and paste it into your editor to test it.
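The fiddle link did not survive. As a hypothetical stand-in for that kind of check, this sketch builds two small tables with string columns (made-up sample values), runs the question's nested loops and the query-syntax cross join, and returns the matching Por values from each so they can be compared. `RangeJoinCheck` and its methods are illustrative names.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RangeJoinCheck
{
    // Hypothetical sample data: every column is a string, as in the question.
    public static (DataTable first, DataTable second) MakeSampleTables()
    {
        var first = new DataTable();
        first.Columns.Add("Por", typeof(string));
        foreach (string por in new[] { "3", "12", "25", "40" })
            first.Rows.Add(por);

        var second = new DataTable();
        second.Columns.Add("FirstPor", typeof(string));
        second.Columns.Add("LastPor", typeof(string));
        second.Rows.Add("1", "10");
        second.Rows.Add("20", "30");
        return (first, second);
    }

    // Nested foreach from the question, with the string columns parsed to int.
    // Produces one entry per matching (row, range) pair, like the cross join.
    public static int[] WithLoops(DataTable first, DataTable second)
    {
        var result = new List<int>();
        foreach (DataRow row in first.Rows)
        {
            int por = int.Parse(row.Field<string>("Por"));
            foreach (DataRow range in second.Rows)
            {
                if (por >= int.Parse(range.Field<string>("FirstPor")) &&
                    por <= int.Parse(range.Field<string>("LastPor")))
                {
                    result.Add(por);
                }
            }
        }
        return result.ToArray();
    }

    // Query-syntax cross join, with the same parsing.
    public static int[] WithLinq(DataTable first, DataTable second) =>
        (from x in first.AsEnumerable()
         let por = int.Parse(x.Field<string>("Por"))
         from y in second.AsEnumerable()
         where por >= int.Parse(y.Field<string>("FirstPor"))
            && por <= int.Parse(y.Field<string>("LastPor"))
         select por).ToArray();
}
```

With this sample data, both methods should return the Por values 3 and 25, in that order.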
