How to distinct a list using LINQ?

I have a class Event that has two properties: "ID" and "ExpirationTime". I have a list containing many events, some of them with the same ID. I want to write an efficient LINQ query that returns one event per ID, keeping for each ID the event with the smallest ExpirationTime.

Thanks!

The grouping is easy enough, but doing an efficient "MinBy" with standard LINQ to Objects is slightly messy:

var lowestByID = items.GroupBy(x => x.ID)
                      .Select(group => group.Aggregate((best, next) =>
                                   best.ExpirationTime < next.ExpirationTime 
                                  ? best : next));

It's cleaner with a MinBy operator, such as the one provided with MoreLinq.

var lowestByID = items.GroupBy(x => x.ID)
                      .Select(group => group.MinBy(x => x.ExpirationTime));
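On .NET 6 or later, Enumerable.MinBy is part of System.Linq itself, so the same query can be written without an extra package. The following is a minimal sketch, assuming an Event class shaped like the one in the question; the int type for ID, the EventQueries helper class, and the sample data are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Event
{
    public int ID { get; set; }                  // assumed to be an int
    public DateTime ExpirationTime { get; set; }
}

public static class EventQueries
{
    // Keeps, for each ID, the event with the smallest ExpirationTime.
    // Relies on Enumerable.MinBy, which ships with System.Linq in .NET 6+.
    public static List<Event> LowestById(IEnumerable<Event> items) =>
        items.GroupBy(x => x.ID)
             .Select(group => group.MinBy(x => x.ExpirationTime)!)
             .ToList();
}

public static class Program
{
    public static void Main()
    {
        var start = new DateTime(2024, 1, 1);
        var items = new List<Event>
        {
            new Event { ID = 1, ExpirationTime = start.AddDays(3) },
            new Event { ID = 2, ExpirationTime = start.AddDays(5) },
            new Event { ID = 1, ExpirationTime = start.AddDays(1) },
            new Event { ID = 2, ExpirationTime = start.AddDays(7) },
        };

        // One line per ID, each showing the earliest expiration for that ID.
        foreach (var e in EventQueries.LowestById(items))
            Console.WriteLine($"{e.ID}: {e.ExpirationTime:O}");
    }
}
```

A group produced by GroupBy is never empty, so MinBy always finds an element here; the null-forgiving operator only silences the nullable warning.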

LINQ's Distinct() on a particular property

Simple! You want to group them and pick a winner out of the group.

List<Event> distinctEvents = allEvents
   .GroupBy(e => e.Id)
   .Select(g => g.OrderBy(e => e.ExpirationTime).First())
   .ToList(); 

I believe this should outperform the GroupBy suggestion (see brief explanation below):

IEnumerable<Event> DistinctEvents(IEnumerable<Event> events)
{
    var dict = new Dictionary<int, Event>();

    foreach (Event e in events)
    {
        Event existing;
        if (!dict.TryGetValue(e.Id, out existing) || e.ExpirationTime < existing.ExpirationTime)
        {
            dict[e.Id] = e;
        }
    }

    foreach (Event e in dict.Values)
    {
        yield return e;
    }
}

Explanation: While this and the GroupBy method proposed by Ani have the same algorithmic complexity (as far as I can tell, anyway), the above approach is more efficient in practice for two reasons.

  1. GroupBy internally uses a Lookup<TKey, TElement> (very similar to a Dictionary<TKey, List<TElement>>) which actually populates internal collections with the contents of the input sequence. This requires more memory and also has a performance impact, particularly because, while the sub-collections have amortized O(1) insertion time, they will occasionally need to resize themselves, which is O(N) (where N is the size of the sub-collection). This is not a big deal, but it's still a lot more work than you really need to be doing.
  2. A consequence of point #1 is that every element in the input sequence must be iterated before GroupBy can provide an enumerator (so execution is deferred, but the entire input sequence is consumed before you can iterate over the result of GroupBy). Then you're iterating over each group again in the call to Aggregate; so in all, you're iterating over the elements in the input sequence twice, which is more than necessary to accomplish the task at hand.

As I said, the algorithmic complexity is the same, which means the two approaches should be equally scalable; this one is simply faster. I took the liberty of testing both approaches (out of curiosity, mostly) and found the above to execute in roughly half the time and cause fewer GC collections (a rough approximation of memory usage) than the GroupBy approach.

These are minute concerns, which it would normally be a waste of time to think too much about. The only reason I mention them is that you asked for an efficient solution (and even bolded the term); so I figured you would want to take these kinds of factors into consideration.
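Before timing the two approaches, it is worth checking that they agree. The sketch below is one way to do that, not a benchmark: the Dedup class, the SameResult helper, the int Id type, and the randomly generated data are all assumptions made for illustration. It compares Id/ExpirationTime pairs rather than object identity, because the two approaches can pick different objects when two events with the same Id share an ExpirationTime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Event
{
    public int Id { get; set; }                  // assumed to be an int
    public DateTime ExpirationTime { get; set; }
}

public static class Dedup
{
    // Single pass over the input, keeping the earliest event seen per Id.
    public static IEnumerable<Event> DistinctEvents(IEnumerable<Event> events)
    {
        var dict = new Dictionary<int, Event>();
        foreach (Event e in events)
        {
            if (!dict.TryGetValue(e.Id, out Event existing) ||
                e.ExpirationTime < existing.ExpirationTime)
            {
                dict[e.Id] = e;
            }
        }
        return dict.Values;
    }

    // GroupBy + Aggregate, for comparison.
    public static IEnumerable<Event> DistinctEventsGroupBy(IEnumerable<Event> events) =>
        events.GroupBy(e => e.Id)
              .Select(g => g.Aggregate((best, next) =>
                  best.ExpirationTime < next.ExpirationTime ? best : next));

    // True when both sequences contain the same Id -> ExpirationTime pairs.
    public static bool SameResult(IEnumerable<Event> a, IEnumerable<Event> b)
    {
        var left = a.ToDictionary(e => e.Id, e => e.ExpirationTime);
        var right = b.ToDictionary(e => e.Id, e => e.ExpirationTime);
        return left.Count == right.Count &&
               left.All(kv => right.TryGetValue(kv.Key, out var t) && t == kv.Value);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical test data: 10,000 events spread over 100 Ids.
        var rng = new Random(42);
        var start = new DateTime(2024, 1, 1);
        var events = Enumerable.Range(0, 10_000)
            .Select(_ => new Event
            {
                Id = rng.Next(100),
                ExpirationTime = start.AddSeconds(rng.Next(1_000_000))
            })
            .ToList();

        Console.WriteLine(Dedup.SameResult(
            Dedup.DistinctEvents(events),
            Dedup.DistinctEventsGroupBy(events)));
    }
}
```

To compare speed, the two methods could be wrapped in Stopwatch measurements or a benchmarking library; the numbers will depend on the data and the runtime.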

Assuming you can implement IComparable on your Event class (since LINQ's Min doesn't have an overload returning the original item otherwise), you can do:

var distinct = events.GroupBy(evt => evt.Id).Select(grp => grp.Min());

Example:

void Main()
{
    var events = new List<Event>
    {
        new Event(1, DateTime.Now),
        new Event(1, DateTime.Now.AddDays(1)),
        new Event(2, DateTime.Now.AddDays(2)),
        new Event(2, DateTime.Now.AddDays(-22)),
    };

    var distinct = events.GroupBy(evt => evt.Id).Select(grp => grp.Min());
}

public class Event : IComparable<Event>
{
    public Event(int id, DateTime exp)
    {
        Id = id;
        Expiration = exp;
    }
    public int Id {get; set;}
    public DateTime Expiration {get; set;}

    public int CompareTo(Event other)
    {
        return Expiration.CompareTo(other.Expiration);
    }
}

I think this should do it:

events.GroupBy(x => x.ID, (key, items) => items.First(y => y.ExpirationTime == items.Min(z => z.ExpirationTime)))

This groups by ID and, for each group, selects the event in items (where items represents all the events with the same ID) with the smallest ExpirationTime.
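One thing to watch in the query above: items.Min(...) sits inside the predicate passed to First, so the minimum is recomputed for every element that First tests. A possible variant computes the minimum once per group. This is a sketch only; the EventQueries class name, the int ID type, and the sample data are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Event
{
    public int ID { get; set; }                  // assumed to be an int
    public DateTime ExpirationTime { get; set; }
}

public static class EventQueries
{
    // For each ID, find the smallest ExpirationTime once,
    // then return the first event in the group that has it.
    public static List<Event> EarliestPerId(IEnumerable<Event> events) =>
        events.GroupBy(x => x.ID, (key, items) =>
        {
            var earliest = items.Min(z => z.ExpirationTime);
            return items.First(y => y.ExpirationTime == earliest);
        }).ToList();
}

public static class Program
{
    public static void Main()
    {
        var start = new DateTime(2024, 1, 1);
        var events = new List<Event>
        {
            new Event { ID = 1, ExpirationTime = start.AddDays(1) },
            new Event { ID = 1, ExpirationTime = start },
            new Event { ID = 2, ExpirationTime = start.AddDays(2) },
        };

        foreach (var e in EventQueries.EarliestPerId(events))
            Console.WriteLine($"{e.ID}: {e.ExpirationTime:O}");
    }
}
```

Each group is still traversed twice (once for Min, once for First), so this remains more work than a single Aggregate or MinBy pass, but it avoids the quadratic behavior within large groups.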

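events.GroupBy(e => e.ID).Select(g => new { ID = g.Key, Time = g.Min(e => e.ExpirationTime) });

List<Event> events = new List<Event>(); // populate with your events
var earliestPerId = events
    .GroupBy(e => e.ID)
    // Min, not Max: the question asks for the smallest ExpirationTime per ID.
    .Select(g => g.First(e => e.ExpirationTime == g.Min(t => t.ExpirationTime)));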
