
LINQ: get values not shared across multiple lists

What's the most efficient way to write a method that compares n lists and returns all the values that do not appear in every list? Given these lists and this extension method:

var lists = new List<List<int>> {
                                  new List<int> { 1, 2, 3, 4 },
                                  new List<int> { 2, 3, 4, 5, 8 },
                                  new List<int> { 2, 3, 4, 5, 9, 9 },
                                  new List<int> { 2, 3, 3, 4, 9, 10 }
                                };


public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
  //...fast algorithm here
}

the call

lists.GetNonShared();

returns 1, 5, 8, 9, 10

I had

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
  // All values from every list, minus the values common to all lists.
  return lists.SelectMany(item => item)
              .Except(lists.Aggregate((a, b) => a.Intersect(b)));
}

But I wasn't sure if that was efficient. Order does not matter. Thanks!

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> list)
{
    return list.SelectMany(x => x.Distinct())
               .GroupBy(x => x)
               .Where(g => g.Count() < list.Count())
               .Select(group => group.Key);
}

EDIT: I think I'd think of it like this...

You want the union of all the lists, minus the intersection of all the lists. That's effectively what your original does, leaving Except to do the "set" operation of Union despite getting duplicate inputs. In this case I suspect you could do this more efficiently by building up two HashSets and doing all the work in-place:

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
    using (var iterator = lists.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            return new T[0]; // Empty
        }

        // Seed both sets from the first list.
        HashSet<T> union = new HashSet<T>(iterator.Current.ToList());
        HashSet<T> intersection = new HashSet<T>(union);
        while (iterator.MoveNext())
        {
            // This avoids iterating over it twice; it may not be necessary,
            // it depends on how you use it.
            List<T> list = iterator.Current.ToList();
            union.UnionWith(list);
            // IntersectWith returns void and modifies the set in place.
            intersection.IntersectWith(list);
        }
        union.ExceptWith(intersection);
        return union;
    }
}

Note that this is now eager, not deferred.
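If deferred execution matters, one option is to move the same two-HashSet logic into an iterator block, so nothing is enumerated until the caller starts iterating. The following is only a sketch: the names NonSharedExtensions, GetNonSharedDeferred and Iterator are made up for illustration, and it assumes C# 6 or later for nameof.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NonSharedExtensions
{
    // Hypothetical name: same union-minus-intersection idea, but deferred.
    public static IEnumerable<T> GetNonSharedDeferred<T>(this IEnumerable<IEnumerable<T>> lists)
    {
        // Argument validation still happens eagerly, at call time.
        if (lists == null) throw new ArgumentNullException(nameof(lists));
        return Iterator(lists);
    }

    // Runs only when the caller starts enumerating the result.
    private static IEnumerable<T> Iterator<T>(IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> union = null;
        HashSet<T> intersection = null;
        foreach (IEnumerable<T> current in lists)
        {
            // Materialise once so each inner sequence is enumerated a single time.
            List<T> list = current.ToList();
            if (union == null)
            {
                union = new HashSet<T>(list);
                intersection = new HashSet<T>(list);
            }
            else
            {
                union.UnionWith(list);
                intersection.IntersectWith(list);
            }
        }

        if (union == null) yield break; // no lists at all

        union.ExceptWith(intersection);
        foreach (T item in union)
        {
            yield return item;
        }
    }
}
```

The work is still done in one go on the first MoveNext call, so this defers the computation rather than streaming it.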


Here's an alternative option:

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
    // A value is shared only if it occurs once per list.
    return lists.SelectMany(list => list)
                .GroupBy(x => x)
                .Where(group => group.Count() < lists.Count())
                .Select(group => group.Key);
}

If it's possible for a list to contain the same item more than once, you'd want a Distinct call in there:

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
    // Distinct ensures each list contributes a value at most once.
    return lists.SelectMany(list => list.Distinct())
                .GroupBy(x => x)
                .Where(group => group.Count() < lists.Count())
                .Select(group => group.Key);
}

EDIT: Now I've corrected this, I understand your original code... and I suspect I can find something better... thinking...

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> list)
{
    var lstCnt=list.Count(); //get the total number of lists
    return list.SelectMany (l => l.Distinct())
        .GroupBy (l => l)
        .Select (l => new{n=l.Key, c=l.Count()})
        .Where (l => l.c<lstCnt)
        .Select (l => l.n)
        .OrderBy (l => l) //can be commented
        ;
}

//use HashSet and SymmetricExceptWith for .net >= 4.5

I think you need to create an intermediate step, which is finding all the items which are common to all lists. This is easy to do with set logic - it's just the set of items in the first list intersected with the set of items in each succeeding list. I don't think that step's doable in LINQ, though.

class Program
{
    static void Main(string[] args)
    {
        IEnumerable<IEnumerable<int>> lists = new List<IEnumerable<int>> {
                              new List<int> { 1, 2, 3, 4 },
                              new List<int> { 2, 3, 4, 5, 8 },
                              new List<int> { 2, 3, 4, 5, 9, 9 },
                              new List<int> { 2, 3, 3, 4, 9, 10 }
                            };

        Console.WriteLine(string.Join(", ", GetNonShared(lists)
            .Distinct()
            .OrderBy(x => x)
            .Select(x => x.ToString())
            .ToArray()));
        Console.ReadKey();
    }

    public static HashSet<T> GetShared<T>(IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> result = null;
        foreach (IEnumerable<T> list in lists)
        {
            result = (result == null)
                         ? new HashSet<T>(list)
                         : new HashSet<T>(result.Intersect(list));
        }
        return result;
    }

    public static IEnumerable<T> GetNonShared<T>(IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> shared = GetShared(lists);
        return lists.SelectMany(x => x).Where(x => !shared.Contains(x));
    }
}
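For comparison, the intersection step can also be written with LINQ's Aggregate and Intersect, which is what the code in the question does. A minimal sketch, where SharedSketch and GetSharedLinq are hypothetical names and an empty outer sequence is assumed to mean "nothing is shared":

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SharedSketch
{
    // Hypothetical helper: the items common to every list, via Aggregate + Intersect.
    public static HashSet<T> GetSharedLinq<T>(IEnumerable<IEnumerable<T>> lists)
    {
        // Aggregate without a seed throws InvalidOperationException on an
        // empty sequence, so that case is handled explicitly.
        if (!lists.Any())
        {
            return new HashSet<T>();
        }

        return new HashSet<T>(lists.Aggregate((a, b) => a.Intersect(b)));
    }
}
```

Unlike GetShared above, this returns an empty set rather than null when there are no lists.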
