LINQ two datatables join

I am stuck on this problem. I can solve it with foreach, but there has to be a better solution.

I have two DataTables. The first has a column named "Por"; the second has two columns named "FirstPor" and "LastPor". My goal is to use LINQ to create a new DataTable based on a condition like this:

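foreach ( DataRow row in FirstDatatable.Rows )
{
    // The columns hold strings, so convert before comparing.
    int por = Int32.Parse(row["Por"].ToString());
    foreach ( DataRow secondRow in SecondDatatable.Rows )
    {
        if ( por >= Int32.Parse(secondRow["FirstPor"].ToString()) && por <= Int32.Parse(secondRow["LastPor"].ToString()) )
            FinalDataTable.ImportRow(row);  // Rows.Add(row) would throw: the row already belongs to FirstDatatable
    }
}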

I am new to LINQ, so this may be difficult for me. I can do this with Parallel.ForEach, but I think LINQ could be much faster. The condition is simple: take each number from the first table (the "Por" column) and find the row of the second table it belongs to ( "Por" >= "FirstPor" && "Por" <= "LastPor" ). I expect this is simple for anybody who works with LINQ every day. There is one more complication: the columns are of type string, so a conversion is needed in the LINQ statement.


Yes, I have just modified my Parallel code into a LINQ/Parallel hybrid and it seems I am done. I used what James and Rahul wrote and put it into my code. Now the process takes 52 seconds for 421,000 rows :) That's much better.

public class Range
{
    public int FirstPor { get; set; }
    public int LastPor { get; set; }
    public int PackageId { get; set; }
}

var ranges = (from r in tmpDataTable.AsEnumerable()
              select new Range
              {
                  FirstPor = Int32.Parse(r["FirstPor"] as string),
                  LastPor = Int32.Parse(r["LastPor"] as string),
                  PackageId = Int32.Parse(r["PackageId"] as string)
              }).ToList();

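Parallel.ForEach<DataRow>(dt.AsEnumerable(), row =>
{
    int por = Int32.Parse(row["Por"].ToString());

    // The lookup only reads the list, so it can run outside the lock.
    int packageId = ranges.First(range => por >= range.FirstPor && por <= range.LastPor).PackageId;

    // DataRow writes are not thread-safe, so only the assignment is locked.
    lock (locker)
    {
        row["PackageId"] = packageId;
    }

    worker.ReportProgress(por);
});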

(Untested code ahead.....)

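var newRows = FirstDatatable.AsEnumerable()
    .Where(row =>
    {
        // The columns hold strings, so parse before comparing.
        int por = Int32.Parse((string)row["Por"]);
        return SecondDatatable.AsEnumerable()
            .Any(secondRow => por >= Int32.Parse((string)secondRow["FirstPor"])
                           && por <= Int32.Parse((string)secondRow["LastPor"]));
    });

// DataRowCollection has no AddRange; ImportRow copies each row into the new table.
foreach (DataRow row in newRows)
    FinalDataTable.ImportRow(row);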

However, if speed is your real concern, my first suggestion is to dump the DataTables and use a list. I'm going to guess that SecondDatatable is largely fixed and probably changes less than once a day. So, let's create a nice in-memory structure for it:

class Range
{
    public int FirstPor { get; set; }
    public int LastPor { get; set; }
}

var ranges = (from r in SecondDatatable.AsEnumerable()
              select new Range
              {
                  FirstPor = Int32.Parse((string)r["FirstPor"]),
                  LastPor = Int32.Parse((string)r["LastPor"])
              }).ToList();

Then our code becomes:

var newRows = FirstDatatable.AsEnumerable()
    .Where(row =>
    {
        int por = Int32.Parse((string)row["Por"]);
        return ranges.Any(range => por >= range.FirstPor && por <= range.LastPor);
    })
    .ToList();

Which by itself should make this considerably faster.

Now, on a success, it will scan the ranges until it finds one that matches. On a failure, it has to scan the whole list before it gives up. So the first thing we need to do to speed this up is to sort the list of ranges by their low end. Then we only have to scan up to the point where the low end of the range is higher than the value we are looking for. That should cut the processing time for rows outside any range roughly in half.
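A minimal sketch of that sorted scan, under stated assumptions: the type and method names here (PorRange, RangeLookup, SortByFirstPor, IsInAnyRange) are made up for illustration and are not from the answer above, and the ranges are assumed to be already parsed to int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper types for illustration; the names are assumptions.
public sealed class PorRange
{
    public int FirstPor { get; set; }
    public int LastPor { get; set; }
}

public static class RangeLookup
{
    // Sort once, up front, by the low end of each range.
    public static List<PorRange> SortByFirstPor(IEnumerable<PorRange> ranges)
    {
        return ranges.OrderBy(r => r.FirstPor).ToList();
    }

    // Linear scan over ranges sorted by FirstPor that gives up early.
    public static bool IsInAnyRange(List<PorRange> sortedRanges, int por)
    {
        foreach (PorRange range in sortedRanges)
        {
            // Every later range starts even higher, so none can contain por.
            if (range.FirstPor > por)
                return false;

            if (por <= range.LastPor)
                return true;
        }
        return false;
    }
}
```

With this in place, the Where clause would call RangeLookup.IsInAnyRange(sorted, por) instead of ranges.Any(...), where sorted is built once with SortByFirstPor. If the ranges never overlap, a binary search over the sorted list would likely be faster still, but that goes beyond what is described here.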

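Try this:

// The columns are strings, so read them as string and parse.
// Note: CopyToDataTable() throws if the query returns no rows.
DataTable FinalDataTable = (from x in dt1.AsEnumerable()
                            let por = Int32.Parse(x.Field<string>("Por"))
                            from y in dt2.AsEnumerable()
                            where por >= Int32.Parse(y.Field<string>("FirstPor"))
                               && por <= Int32.Parse(y.Field<string>("LastPor"))
                            select x).CopyToDataTable();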

Here is the complete working Fiddle (I tested it with some sample data against both your existing code and my LINQ code). You can copy and paste it into your editor and test it there, because DotNet Fiddle does not support AsEnumerable.
