
How to merge multiple sequences

I have several sequences as follows

[image: enter image description here]

I keep them in a list of lists of strings:

List<List<String>> Sequences;

I would like to merge them by removing sequences that are covered by other sequences. For example, the sequence V VPC VPS S is covered by the sequence V MV VPC VPC VPS VPA S, because the latter contains all elements of the former in the same order (this example is not in the lists above).

I think there should be a simple solution with LINQ, but I have not mastered it.

My approach is to iterate over the sequences and, for each one, look for another sequence whose intersection with it equals the sequence itself, in the same order; if one exists, remove the sequence. Something like:

foreach (var item in Sequences)
{
    if (Sequences.Any(x => x.Intersect(item).SequenceEqual(item)))
    {
        Sequences.Remove(item);
    }
}

If order does matter:

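// Returns true if every element of subseq appears in superseq in the same
// order (not necessarily contiguously).
bool IsSubsequence<T>(IEnumerable<T> subseq, IEnumerable<T> superseq)
    where T : IEquatable<T>
{
    using (var subit = subseq.GetEnumerator())
    {
        if (!subit.MoveNext()) return true; // Empty subseq -> true
        foreach (var superitem in superseq)
        {
            // Advance through subseq only when the current element matches.
            if (superitem.Equals(subit.Current))
            {
                if (!subit.MoveNext()) return true; // All of subseq matched
            }
        }
        return false; // superseq ended before subseq was fully matched
    }
}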

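// Keeps only the lists that are not a subsequence of some other list.
List<List<T>> PruneSequences<T>(List<List<T>> lists)
    where T : IEquatable<T>
{
    return lists
        .Where(sublist =>
            !lists.Any(superlist =>
                sublist != superlist && // reference comparison: skip the list itself
                IsSubsequence(sublist, superlist)))
        .ToList();
}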

Usage:

var Sequences = new List<List<string>> {
    new List<string> { "N", "MN", "MN", "S" },
    new List<string> { "PUNC" },
    new List<string> { "N" },
    new List<string> { "V", "VPC", "VPS", "S" },
    new List<string> { "N", "NPC" },
    new List<string> { "N", "MN" },
    new List<string> { "N", "NPA" },
    new List<string> { "ADJ" },
    new List<string> { "V", "MV", "VPC", "VPC", "VPSD", "VPA", "S" },
    new List<string> { "PREP", "PPC", "PPC" },
    new List<string> { "PRONC", "NPC" },
    new List<string> { "JONJ", "CPC", "CPC", "VPC", "VPSD", "CLR" },
    new List<string> { "CONJ" },
    new List<string> { "AUX" },
    new List<string> { "V", "MV", "VPC" },
    new List<string> { "N", "NPA", "NPC", "NPC" }
};
var PrunedSequences = PruneSequences(Sequences);

Result:

N MN MN S 
PUNC 
V VPC VPS S 
ADJ 
V MV VPC VPC VPSD VPA S 
PREP PPC PPC 
PRONC NPC 
JONJ CPC CPC VPC VPSD CLR 
CONJ 
AUX 
N NPA NPC NPC 
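
One caveat, as a hedged aside: `PruneSequences` tells lists apart by reference (`sublist != superlist`), so if the input contains two separate lists with the same elements, each is a subsequence of the other and both are dropped. Below is a minimal sketch of one way around that, assuming that keeping the first copy is the desired behaviour. The names `SequencePruner` and `PruneKeepingOneCopy` are hypothetical and do not come from the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequencePruner
{
    // Same in-order check as IsSubsequence above, written with an index
    // into the candidate list instead of an enumerator.
    public static bool IsSubsequence<T>(List<T> candidate, List<T> container)
    {
        var comparer = EqualityComparer<T>.Default;
        int matched = 0;
        foreach (var item in container)
        {
            if (matched == candidate.Count) break;
            if (comparer.Equals(item, candidate[matched])) matched++;
        }
        return matched == candidate.Count;
    }

    // Hypothetical variant: a list is dropped if a strictly longer list
    // covers it, or if a list with equal content appears earlier in the input.
    public static List<List<T>> PruneKeepingOneCopy<T>(List<List<T>> lists)
    {
        return lists
            .Where((candidate, index) =>
                !lists.Where((other, otherIndex) =>
                        otherIndex != index &&
                        (other.Count > candidate.Count || otherIndex < index) &&
                        IsSubsequence(candidate, other))
                    .Any())
            .ToList();
    }
}
```

The length test works because a list that is a subsequence of another list of the same length must be equal to it, so equal lists are only ever dropped in favour of an earlier copy.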

Sequences.Where(i => !Sequences.Any(x => ReferenceEquals(i, x) == false && x.Intersect(i).SequenceEqual(i)));

Your solution probably fails because it tests each item against itself?
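
A note of caution on the `Intersect`-based query: `Enumerable.Intersect` is a set operation and yields each distinct element only once, so a sequence that contains repeated elements (such as `PREP PPC PPC`) never equals its own intersection and is never detected as covered. The small sketch below illustrates this. The class name `IntersectCaveat`, the helper `CoveredByIntersect`, and the sample list `longer` are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectCaveat
{
    // The covering test used in the one-line query above.
    public static bool CoveredByIntersect(List<string> item, List<string> other) =>
        other.Intersect(item).SequenceEqual(item);

    public static void Demo()
    {
        // Hypothetical data: "longer" contains both shorter lists in order.
        var longer = new List<string> { "PREP", "PPC", "PPC", "NPC" };
        var noRepeats = new List<string> { "PREP", "NPC" };
        var withRepeats = new List<string> { "PREP", "PPC", "PPC" };

        // Intersect yields PREP, NPC, which equals noRepeats.
        Console.WriteLine(CoveredByIntersect(noRepeats, longer));   // True

        // Intersect yields PREP, PPC (PPC only once), which differs
        // from PREP, PPC, PPC.
        Console.WriteLine(CoveredByIntersect(withRepeats, longer)); // False
    }
}
```

If the sequences may contain repeated elements, as several in the sample data do, an explicit in-order check like `IsSubsequence` is probably the safer choice.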
