
Perform a Join in LINQ


Here is how I understand your issue. You have a class (call it table) with a List<string> property called X that holds the column labels, and a List<List<object>> property called Y that holds the data rows.

class table {
    public List<string> X { get; set; } // headers
    public List<List<object>> Y { get; set; } // rows
}

You have two tables (call them t1 and t2) and you want to left join them on one or more columns, producing a new table object that contains the joined data. A left-hand row is duplicated once for each matching right-hand row. For a left-hand row with no match, the right-hand side is a List<object> of nulls. Only the desired right-hand columns are kept in the result.

(My sample t1 table is like yours; my sample t2 table adds a new integer column "f" that is matched against t1's column "c".)
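For readers less familiar with the GroupJoin/DefaultIfEmpty idiom, here is a minimal C# sketch of only the left-join semantics described above, on hypothetical single-key data (the names `left`, `right` and `joined` are illustrative, not from the original). A left row is repeated once per matching right row, and an unmatched left row appears once with a default right side.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: one key column on each side.
var left = new List<(int Key, string Name)> { (1, "one"), (2, "two"), (3, "three") };
var right = new List<(int Key, string Value)> { (1, "x"), (1, "y"), (3, "z") };

// GroupJoin gathers the matching right rows for each left row; DefaultIfEmpty turns an
// empty group into a single default element, which is what makes this a LEFT join.
var joined = left
    .GroupJoin(right, l => l.Key, r => r.Key, (l, matches) => (l, matches))
    .SelectMany(g => g.matches.DefaultIfEmpty(),
                (g, r) => $"{g.l.Name}:{r.Value ?? "null"}")
    .ToList();

Console.WriteLine(string.Join(", ", joined)); // one:x, one:y, two:null, three:z
```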

Create an empty right-hand side to use when there is no matching right-hand row:

var emptyT2 = Enumerable.Repeat((object)null, t2.X.Count).ToList();
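A small sketch of what that line produces, assuming a hypothetical four-column t2 (`rightColumnCount` is an illustrative name). The `(object)` cast is required: `Enumerable.Repeat(null, n)` does not compile, because the compiler cannot infer the element type from a bare `null`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical column count for the right-hand table.
int rightColumnCount = 4;

// The cast gives Enumerable.Repeat a concrete element type (object).
List<object> emptyRow = Enumerable.Repeat((object)null, rightColumnCount).ToList();

Console.WriteLine($"{emptyRow.Count} cells, all null: {emptyRow.All(c => c == null)}"); // 4 cells, all null: True
```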

Create an IEqualityComparer<IEnumerable<object>> to compare the join-column values as sequences:

var jec = new IEnumerableSequenceEqualityComparer<object>();
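Why a custom comparer at all? A minimal sketch with hypothetical key values: `List<object>` does not override `Equals`, so two keys with identical contents are unequal under the default comparer. `SequenceEqual` compares element by element through `object.Equals`, so two separately boxed `1`s count as equal even though `==` on the `object` references does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two hypothetical key "rows" with the same contents but different List instances.
var k1 = new List<object> { 1, "b" };
var k2 = new List<object> { 1, "b" };

// List<T> inherits reference equality, so the default comparer says "different".
bool defaultEqual = EqualityComparer<List<object>>.Default.Equals(k1, k2);

// SequenceEqual uses EqualityComparer<object>.Default (object.Equals) per element.
bool sequenceEqual = k1.SequenceEqual(k2);

// == on two object variables holding separately boxed ints compares references.
object a = 1, b = 1;
bool boxedRefEqual = a == b;

Console.WriteLine($"default: {defaultEqual}, sequence: {sequenceEqual}, boxed ==: {boxedRefEqual}");
```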

Find the column indices of the columns you want to join on:

var t1JoinCols = new List<string> { "a", "c" };
var t1JoinIndices = t1JoinCols.Select(c => t1.X.IndexOf(c)).ToList();
var t2JoinCols = new List<string> { "c", "f" };
var t2JoinIndices = t2JoinCols.Select(c => t2.X.IndexOf(c)).ToList();
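`IndexOf` returns -1 for a column name that is not present, and the mistake only surfaces later as an `ArgumentOutOfRangeException` when a row is indexed. A hedged variant, using a hypothetical helper named `ColumnIndices` and hypothetical headers, fails fast with a clearer message instead:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical headers for t2.
var t2Headers = new List<string> { "c", "d", "e", "f" };

// Hypothetical helper: same result as names.Select(c => headers.IndexOf(c)).ToList(),
// except that an unknown name throws immediately instead of yielding -1.
static List<int> ColumnIndices(List<string> headers, IEnumerable<string> names) =>
    names.Select(name =>
    {
        int i = headers.IndexOf(name);
        if (i < 0) throw new ArgumentException($"Unknown column \"{name}\"");
        return i;
    }).ToList();

var t2JoinIndices = ColumnIndices(t2Headers, new[] { "c", "f" });
Console.WriteLine(string.Join(",", t2JoinIndices)); // 0,3
```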

Create a filter for the right-hand columns you want in the output:

var t2wanted = new List<string> { "d", "e", "f" };
var t2wantedIndices = t2.X.Select((x, n) => (x, n))
                          .Where(xn => t2wanted.Contains(xn.x))
                          .Select(xn => xn.n)
                          .ToHashSet();
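The same filter on hypothetical headers, showing that the output keeps t2's column order rather than the order of the names in `t2wanted` (the reordered `t2wanted` below is purely illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical right-hand headers; the wanted names are deliberately listed out of order.
var t2Headers = new List<string> { "c", "d", "e", "f" };
var t2wanted = new List<string> { "f", "d", "e" };

// The indexed Select overload pairs each header x with its position n.
HashSet<int> wantedIndices = t2Headers
    .Select((x, n) => (x, n))
    .Where(xn => t2wanted.Contains(xn.x))
    .Select(xn => xn.n)
    .ToHashSet();

// Walking 0..Count-1 and testing membership yields the kept columns in t2 order.
var keptHeaders = Enumerable.Range(0, t2Headers.Count)
    .Where(n => wantedIndices.Contains(n))
    .Select(n => t2Headers[n])
    .ToList();

Console.WriteLine(string.Join(",", keptHeaders)); // d,e,f
```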

Create an intermediate query to do the left join and pull the matching data (switch to fluent, i.e. method, syntax so you can pass the IEqualityComparer<IEnumerable<object>>; the query-syntax join clause has no way to supply a comparer):

var t3r = t1.Y.GroupJoin(t2.Y,
                         t1r => t1JoinIndices.Select(n1 => t1r[n1]),
                         t2r => t2JoinIndices.Select(n2 => t2r[n2]),
                         (t1r, t2rg) => (t1r, t2rg),
                         jec)
              // unmatched t1 rows are paired with emptyT2, so t2r is never null below
              .SelectMany(t1rt2rg => t1rt2rg.t2rg.DefaultIfEmpty(emptyT2)
                                                 .Select(t2r => t1rt2rg.t1r.Concat(Enumerable.Range(0, t2.X.Count)
                                                                                             .Where(n => t2wantedIndices.Contains(n))
                                                                                             .Select(n => t2r[n]))
                                                                            .ToList()));
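If the number of join columns is fixed, a value tuple can serve as the key instead of a sequence, because `ValueTuple` equality compares its components with the default comparers. The following is a self-contained sketch of the whole left join under that assumption (exactly two join columns, hypothetical sample data and variable names). It is an alternative to the comparer-based query above, not a copy of it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample tables: headers (X) and rows (Y) of boxed values.
var t1X = new List<string> { "a", "b", "c" };
var t1Y = new List<List<object>> {
    new List<object> { "k1", "first", 10 },
    new List<object> { "k2", "second", 20 },
};
var t2X = new List<string> { "c", "d", "e", "f" };
var t2Y = new List<List<object>> {
    new List<object> { "k1", "d1", "e1", 10 },
    new List<object> { "k1", "d2", "e2", 10 },
    new List<object> { "k2", "d3", "e3", 99 },
};

// Join t1.(a, c) to t2.(c, f) and keep t2's d, e, f columns, as in the text.
int a1 = t1X.IndexOf("a"), c1 = t1X.IndexOf("c");
int c2 = t2X.IndexOf("c"), f2 = t2X.IndexOf("f");
var wanted = new HashSet<int> { t2X.IndexOf("d"), t2X.IndexOf("e"), t2X.IndexOf("f") };
var emptyT2 = Enumerable.Repeat((object)null, t2X.Count).ToList();

// Assumption: exactly two key columns, so the key is an (object, object) tuple whose
// default equality calls object.Equals on each component (no custom comparer needed).
var rows = t1Y
    .GroupJoin(t2Y,
               r1 => (r1[a1], r1[c1]),
               r2 => (r2[c2], r2[f2]),
               (r1, g) => (r1, g))
    .SelectMany(x => x.g.DefaultIfEmpty(emptyT2),
                (x, r2) => x.r1.Concat(Enumerable.Range(0, t2X.Count)
                                                 .Where(n => wanted.Contains(n))
                                                 .Select(n => r2[n]))
                               .ToList())
    .ToList();

foreach (var row in rows)
    Console.WriteLine(string.Join(" | ", row.Select(v => v ?? "null")));
```

With more or a variable number of join columns, the sequence comparer from the original answer remains the general solution.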

Convert the intermediate query into a table object by building the new headers and the List<List<object>> for the rows:

var ans = new table {
    X = t1.X.Concat(Enumerable.Range(0, t2.X.Count)
                              .Where(n => t2wantedIndices.Contains(n))
                              .Select(n => t2.X[n]))
            .ToList(),
    Y = t3r.ToList()
};
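A sketch of the header half on hypothetical headers and a hypothetical wanted-index set. Because the headers and the row cells are selected with the same `Enumerable.Range(...).Where(...)` filter, each output row has exactly as many cells as there are headers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inputs: left headers, right headers, and the kept right-hand indices.
var t1X = new List<string> { "a", "b", "c" };
var t2X = new List<string> { "c", "d", "e", "f" };
var t2wantedIndices = new HashSet<int> { 1, 2, 3 };

// Same shape as the X initializer: all left headers, then the kept right headers in t2 order.
var headers = t1X.Concat(Enumerable.Range(0, t2X.Count)
                                   .Where(n => t2wantedIndices.Contains(n))
                                   .Select(n => t2X[n]))
                 .ToList();

Console.WriteLine(string.Join(",", headers)); // a,b,c,d,e,f
```

If t2's "c" column were also wanted, the result would contain two headers named "c", and a later `IndexOf("c")` would only find the first one.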

NOTE: the GroupJoin operation in the t3r query builds a Lookup over its second argument (in this case t2.Y). That Lookup is hash-based, so finding the matches for each t1 row is an average O(1) lookup rather than a scan of t2.Y, and there is not much efficiency to be gained by not using LINQ. If each table has millions of rows, you may still want to consider rewriting the whole thing in procedural code to trim some constant per-row overhead.
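For comparison, a hypothetical procedural version of the same build/probe idea: hash the right-hand rows once, then probe with each left-hand row. It assumes two join columns so that an `(object, object)` tuple can be the `Dictionary` key; the variable names and data are illustrative only, and any speedup over the LINQ version would need to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows; t1 joins on columns 0 and 2, t2 on columns 0 and 3.
var t1Y = new List<List<object>> {
    new List<object> { "k1", "first", 10 },
    new List<object> { "k2", "second", 20 },
};
var t2Y = new List<List<object>> {
    new List<object> { "k1", "d1", "e1", 10 },
    new List<object> { "k1", "d2", "e2", 10 },
};
int[] t1Key = { 0, 2 }, t2Key = { 0, 3 };
int[] t2Keep = { 1, 2, 3 }; // right-hand columns to copy, in t2 order
int t2Width = 4;

// Build phase: bucket the right-hand rows by key, once.
var index = new Dictionary<(object, object), List<List<object>>>();
foreach (var r2 in t2Y)
{
    var key = (r2[t2Key[0]], r2[t2Key[1]]);
    if (!index.TryGetValue(key, out var bucket))
        index[key] = bucket = new List<List<object>>();
    bucket.Add(r2);
}

// Probe phase: one output row per match, or one null-padded row when nothing matches.
var noMatch = new List<List<object>> { new List<object>(new object[t2Width]) };
var result = new List<List<object>>();
foreach (var r1 in t1Y)
{
    var matches = index.TryGetValue((r1[t1Key[0]], r1[t1Key[1]]), out var found) ? found : noMatch;
    foreach (var r2 in matches)
    {
        var row = new List<object>(r1);
        foreach (int n in t2Keep) row.Add(r2[n]);
        result.Add(row);
    }
}

Console.WriteLine($"{result.Count} rows"); // 3 rows
```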

Here is the IEnumerableSequenceEqualityComparer definition:

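using System;
using System.Collections.Generic;
using System.Linq;

// Compares two sequences element by element, so keys built from different
// IEnumerable instances with equal contents are treated as equal.
public class IEnumerableSequenceEqualityComparer<T> : IEqualityComparer<IEnumerable<T>> {
    public bool Equals(IEnumerable<T> x, IEnumerable<T> y) =>
        Object.ReferenceEquals(x, y) || (x != null && y != null && (x.SequenceEqual(y)));

    public int GetHashCode(IEnumerable<T> obj) {
        // Will not throw an OverflowException
        unchecked {
            // null elements are skipped; equal sequences still produce equal hashes
            return obj.Where(e => e != null).Select(e => e.GetHashCode()).Aggregate(17, (a, b) => 23 * a + b);
        }
    }
}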
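A quick check of the comparer's contract, with its two members copied into static local functions (`SeqEquals` and `SeqHash` are hypothetical names) so the snippet runs without the class. Skipping `null` elements in `GetHashCode` makes `{ 1, null }` and `{ 1 }` hash alike although they are not equal; that is an allowed collision, since the contract only requires equal sequences to have equal hashes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local copies of the comparer's Equals and GetHashCode logic.
static bool SeqEquals<T>(IEnumerable<T> x, IEnumerable<T> y) =>
    object.ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y));

static int SeqHash<T>(IEnumerable<T> obj)
{
    unchecked
    {
        return obj.Where(e => e != null).Select(e => e.GetHashCode()).Aggregate(17, (a, b) => 23 * a + b);
    }
}

var withNull = new object[] { 1, null };
var withoutNull = new object[] { 1 };
var sameContent = new List<object> { 1, null };

Console.WriteLine(SeqEquals(withNull, sameContent));          // True
Console.WriteLine(SeqHash(withNull) == SeqHash(sameContent)); // True
Console.WriteLine(SeqEquals(withNull, withoutNull));          // False
Console.WriteLine(SeqHash(withNull) == SeqHash(withoutNull)); // True (a permitted collision)
```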

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM