
Using LINQ to get the results from another LINQ collection

I have a LINQ statement which pulls the top N record IDs from a collection, and then another query which pulls all records that have those IDs. It feels very clunky and inefficient, and I was wondering if there might be a more succinct, LINQy way to get the same results:

var records = cache.Select(rec => rec.Id).Distinct().Take(n);

var results = cache.Where(rec => records.Contains(rec.Id));

FYI: there will be multiple records with the same ID, which is why there is the Distinct() and why I can't use a simple Take() in the first place.

Thanks!

How about something like this?

var results = cache.GroupBy(rec => rec.Id)
                   .Take(n)
                   .SelectMany(group => group);
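Here is a self-contained sketch of the GroupBy approach. The question never shows the record type, so `CacheEntry` (with an `int Id`) and the `GroupBySketch.TakeFirstNIds` wrapper are hypothetical names. In LINQ to Objects, `GroupBy` yields groups in the order each key first appears, and elements inside a group keep their source order. That means `Take(n)` keeps the same first n IDs that `Distinct().Take(n)` would, but the results come back grouped by ID rather than in the cache's original order.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's record type.
public sealed class CacheEntry
{
    public int Id { get; set; }
    public string Payload { get; set; } = "";
}

public static class GroupBySketch
{
    // Returns every entry whose Id is among the first n distinct Ids.
    // Groups are emitted in first-occurrence order of their key, and each
    // group preserves the original relative order of its elements.
    public static List<CacheEntry> TakeFirstNIds(IEnumerable<CacheEntry> cache, int n)
    {
        return cache.GroupBy(rec => rec.Id)
                    .Take(n)
                    .SelectMany(group => group)
                    .ToList();
    }
}
```

For entries with IDs 1, 2, 1, 3, 2 and n = 2, this returns the two ID-1 entries followed by the two ID-2 entries.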

The same thing you did, but in one line and with Join() instead of Contains():

var results = cache
    .Select(rec => rec.Id)
    .Distinct()
    .Take(n)
    .ToList()
    .Join(cache, rec => rec, record => record.Id, (rec, record) => record);

Yes, unfortunately LINQ doesn't natively let you choose a member to get distinct records on (.NET 6 later added `DistinctBy` for this). So I recommend creating your own extension method for it:

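using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a non-generic static class; the class name is arbitrary.
public static class EnumerableExtensions
{
    /// <summary>
    /// Returns a list with the ability to specify key(s) to compare uniqueness on
    /// </summary>
    /// <typeparam name="T">Source type</typeparam>
    /// <param name="source">Source</param>
    /// <param name="keyPredicate">Function returning the key(s) to perform the comparison on</param>
    /// <returns>One element for each distinct key</returns>
    public static IEnumerable<T> Distinct<T>(this IEnumerable<T> source,
                                             Func<T, object> keyPredicate)
    {
        return source.Distinct(new GenericComparer<T>(keyPredicate));
    }
}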

And then create a generic comparer, which you will notice is quite generic.

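using System;
using System.Collections.Generic;

public class GenericComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> _uniqueCheckerMethod;

    public GenericComparer(Func<T, object> keyPredicate)
    {
        _uniqueCheckerMethod = keyPredicate ?? throw new ArgumentNullException(nameof(keyPredicate));
    }

    #region IEqualityComparer<T> Members

    bool IEqualityComparer<T>.Equals(T x, T y)
    {
        // object.Equals(a, b) also copes with null keys, unlike a.Equals(b)
        return object.Equals(_uniqueCheckerMethod(x), _uniqueCheckerMethod(y));
    }

    int IEqualityComparer<T>.GetHashCode(T obj)
    {
        // Null keys all hash to 0 so they compare equal to each other
        return _uniqueCheckerMethod(obj)?.GetHashCode() ?? 0;
    }

    #endregion
}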

Now just chain up your LINQ statement:

var results = cache.Distinct(rec => rec.Id).Take(n);

Note that this returns only one record for each of the first n IDs, not every record with those IDs.

hth

The only way I can think of doing this in SQL would be with a subquery, so there are probably going to be two LINQ queries as well.
It "feels" inefficient... but is it? Maybe you are worrying about something that isn't worth worrying about. You could probably make it into one statement with a join, but whether that is clearer, better, or more efficient is a different question.
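One concrete cost is worth checking, assuming `cache` is an in-memory collection (LINQ to Objects): `records` in the question is a deferred query, so every `records.Contains(rec.Id)` call re-runs `Distinct().Take(n)` over the cache. The sketch below materializes the IDs once into a `HashSet<int>`, so each lookup is O(1) on average. `CacheEntry` and `TopNByHashSet` are hypothetical names, since the question doesn't show its types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's record type.
public sealed class CacheEntry
{
    public int Id { get; set; }
    public string Payload { get; set; } = "";
}

public static class TopNByHashSet
{
    public static List<CacheEntry> TakeFirstNIds(IEnumerable<CacheEntry> cache, int n)
    {
        // Evaluate the first n distinct Ids exactly once, instead of re-running
        // the deferred Distinct().Take(n) query for every Contains call.
        var ids = new HashSet<int>(cache.Select(rec => rec.Id).Distinct().Take(n));

        // One more pass over the cache; results keep the cache's original order.
        return cache.Where(rec => ids.Contains(rec.Id)).ToList();
    }
}
```

This still enumerates `cache` twice, which is fine for a list or array. If `cache` were an `IQueryable` from an ORM, the original query would instead be translated (typically into a subquery), so the re-evaluation concern applies to in-memory collections.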

Edit: The extension method answer by Aaronaught can be made to work without requiring sorted input, like this:

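using System;
using System.Collections.Generic;

// Extension methods must live in a non-generic static class; the class name is arbitrary.
public static class EnumerableExtensions
{
    public static IEnumerable<T> TakeByDistinctKey<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keyFunc, int numKeys) {
        if(keyFunc == null) {
            throw new ArgumentNullException("keyFunc");
        }

        // A HashSet makes each key lookup O(1) on average; List<TKey>.Contains rescans every key.
        HashSet<TKey> keys = new HashSet<TKey>();
        foreach(T item in source) {
            TKey key = keyFunc(item);
            if(keys.Contains(key)) {
                // one of the first n keys, yield
                yield return item;
            } else if(keys.Count < numKeys) {
                // new key, but still one of the first n seen, yield
                keys.Add(key);
                yield return item;
            }
            // otherwise we already have enough distinct keys; keep going so that
            // later items with one of those keys are still returned
        }
    }
}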

However, the GroupBy / SelectMany looks the neatest. I would go with that one.

There is no built-in "Linqy" way (you could group, but it would be pretty inefficient), but that doesn't mean you can't make your own way:

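using System;
using System.Collections.Generic;

// Extension methods must live in a non-generic static class; the class name is arbitrary.
public static class EnumerableExtensions
{
    public static IEnumerable<T> TakeDistinctByKey<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keyFunc,
        int count)
    {
        if (keyFunc == null)
            throw new ArgumentNullException("keyFunc");
        if (count <= 0)
            yield break;

        // `key != lastKey` does not compile for an unconstrained TKey,
        // so compare through the default equality comparer instead.
        var comparer = EqualityComparer<TKey>.Default;
        int distinctKeys = 0;
        TKey lastKey = default(TKey);
        foreach (T item in source)
        {
            TKey key = keyFunc(item);
            if (distinctKeys == 0 || !comparer.Equals(key, lastKey))
            {
                // A new key starts here: stop before yielding anything from key number count + 1.
                if (distinctKeys == count)
                    yield break;
                distinctKeys++;
                lastKey = key;
            }
            yield return item;
        }
    }
}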

Then you can invoke it with this:

var items = cache.TakeDistinctByKey(rec => rec.Id, 20);

If you have composite keys or anything like that you could easily extend the method above to take an IEqualityComparer<TKey> as an argument.
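A sketch of that comparer-taking variant, under the same assumption that items with equal keys are adjacent. The class name `TakeDistinctByKeyExtensions` and the private `Iterate` helper are hypothetical. Splitting the public method from the iterator makes the argument checks throw immediately instead of on first enumeration, and keys are compared through the comparer because C# rejects `!=` on an unconstrained generic `TKey`. Composite keys built as value tuples already compare element by element with the default comparer, so a custom comparer is mainly useful for things like case-insensitive keys.

```csharp
using System;
using System.Collections.Generic;

public static class TakeDistinctByKeyExtensions
{
    // Yields items until `count` distinct keys have been seen, assuming equal
    // keys are adjacent (e.g. sorted by key). `comparer` defines key equality;
    // null falls back to EqualityComparer<TKey>.Default.
    public static IEnumerable<T> TakeDistinctByKey<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keyFunc,
        int count,
        IEqualityComparer<TKey> comparer)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keyFunc == null) throw new ArgumentNullException(nameof(keyFunc));
        return Iterate(source, keyFunc, count, comparer ?? EqualityComparer<TKey>.Default);
    }

    private static IEnumerable<T> Iterate<T, TKey>(
        IEnumerable<T> source, Func<T, TKey> keyFunc, int count, IEqualityComparer<TKey> comparer)
    {
        int distinctKeys = 0;
        TKey lastKey = default(TKey);
        foreach (T item in source)
        {
            TKey key = keyFunc(item);
            if (distinctKeys == 0 || !comparer.Equals(key, lastKey))
            {
                // A new run of keys begins; stop once `count` runs are complete.
                if (distinctKeys >= count)
                    yield break;
                distinctKeys++;
                lastKey = key;
            }
            yield return item;
        }
    }
}
```

For example, `words.TakeDistinctByKey(w => w.Substring(0, 1), 2, StringComparer.OrdinalIgnoreCase)` over `apple, Avocado, banana, Blueberry, cherry` treats `a`/`A` and `b`/`B` as the same key and stops before `cherry`.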

Also note that this depends on elements with the same key being adjacent, for example sorted by key. If they aren't, you could either change the algorithm above to use a HashSet<TKey> instead of a straight count and last-key comparison, or invoke it like this instead:

var items = cache.OrderBy(rec => rec.Id).TakeDistinctByKey(rec => rec.Id, 20);

Edit: I'd also like to point out that in SQL I would use either a ROW_NUMBER query or a recursive CTE, depending on the performance requirement; a distinct+join is not the most efficient method. If your cache is in sorted order (or you can change it to be), then the method above will be by far the cheapest in terms of both memory and execution time.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.

© 2020-2024 STACKOOM.COM