Intersection of two lists with repetitions

I'm trying to create a function that returns the intersection of two lists, taking into account that there can be repeated items, which I need to keep in the output.

Console.Write(string.Join(", ", new[] {1, 2, 2, 3}.Intersect(new[] {1, 2, 2})));

only outputs {1, 2}, but the output I need is {1, 2, 2}.

Here is the method I have created:

private static IEnumerable<int> IntersectWithRepetitons(List<int> a, List<int> b)
{
    if (!a.Any() || !b.Any()) return Enumerable.Empty<int>();
    if (a.Count() > b.Count()) return IntersectWithRepetitons(b, a);

    var idx = b.IndexOf(a.First());
    if (idx < 0) return IntersectWithRepetitons(b, a.Skip(1).ToList());

    var tmp1 = new List<int> { a.First() };
    var tmp2 = new List<int>(b);
    tmp2.RemoveAt(idx);
    return tmp1.Concat(IntersectWithRepetitons(tmp2, a.Skip(1).ToList()));
}

I'm sure this can be heavily optimized, but my main concern (efficiency-wise) is that, in order to keep the input lists intact, I have to copy the 'b' list whenever I remove a found item from it:

var tmp2 = new List<int>(b);
tmp2.RemoveAt(idx);

and that will happen on every recursive call. Any thoughts or ideas would be much appreciated. Thanks.

Map one of the sequences to a dictionary from each item to the number of times it appears. Then, for each item in the other sequence, if it is in the dictionary and its count is greater than zero, yield it and decrement the count:

public static IEnumerable<T> IntersectWithRepetitons<T>(this IEnumerable<T> first,
    IEnumerable<T> second)
{
    var lookup = second.GroupBy(x => x)
        .ToDictionary(group => group.Key, group => group.Count());
    foreach (var item in first)
        if (lookup.ContainsKey(item) && lookup[item] > 0)
        {
            yield return item;
            lookup[item]--;
        }
}

This ensures that an item is yielded once for each time it appears in both sequences, so the result contains it as many times as the smaller of its two counts.
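As a quick check of that behavior, here is a minimal sketch that runs the method on the input from the question. The class name IntersectDemo and the second test input are only illustrative; the method body is repeated unchanged so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectDemo
{
    // The extension method from the answer, repeated so this example is self-contained.
    public static IEnumerable<T> IntersectWithRepetitons<T>(this IEnumerable<T> first,
        IEnumerable<T> second)
    {
        var lookup = second.GroupBy(x => x)
            .ToDictionary(group => group.Key, group => group.Count());
        foreach (var item in first)
            if (lookup.ContainsKey(item) && lookup[item] > 0)
            {
                yield return item;
                lookup[item]--;
            }
    }

    public static void Main()
    {
        // The input from the question: both lists contain 2 twice.
        var a = new[] { 1, 2, 2, 3 };
        var b = new[] { 1, 2, 2 };
        Console.WriteLine(string.Join(", ", a.IntersectWithRepetitons(b))); // 1, 2, 2

        // 2 appears three times on the left but only once on the right, so it is yielded once.
        var c = new[] { 2, 2, 2, 5 };
        var d = new[] { 5, 2 };
        Console.WriteLine(string.Join(", ", c.IntersectWithRepetitons(d))); // 2, 5
    }
}
```

Note that the output follows the order of the first sequence, and neither input is modified, which addresses the copying concern in the question.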

You could use TryGetValue to remove a few dictionary lookups, but it sacrifices a lot of the method's elegance, so I just didn't have it in me to do that. If you care about performance, it's not a bad change to make.
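For reference, a rough sketch of what the TryGetValue variant might look like. The names IntersectExtensions and IntersectWithRepetitionsTryGet are hypothetical, chosen only to avoid clashing with the method above, and the performance benefit is an assumption that has not been measured here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectExtensions
{
    // Hypothetical name; same logic as the original method, with fewer dictionary lookups.
    public static IEnumerable<T> IntersectWithRepetitionsTryGet<T>(this IEnumerable<T> first,
        IEnumerable<T> second)
    {
        // Count how many times each item occurs in the second sequence.
        var lookup = second.GroupBy(x => x)
            .ToDictionary(group => group.Key, group => group.Count());
        foreach (var item in first)
        {
            // A single TryGetValue call replaces ContainsKey plus the indexer read.
            if (lookup.TryGetValue(item, out var remaining) && remaining > 0)
            {
                // Write the decremented count back before yielding the item.
                lookup[item] = remaining - 1;
                yield return item;
            }
        }
    }

    public static void Main()
    {
        var result = new[] { 1, 2, 2, 3 }.IntersectWithRepetitionsTryGet(new[] { 1, 2, 2 });
        Console.WriteLine(string.Join(", ", result)); // 1, 2, 2
    }
}
```

Each matched item now costs two dictionary operations (one read, one write) instead of the four in the ContainsKey version, at the price of an extra local variable.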

