
Longest Common Subsequence for Multiple Sequences

I have done a bunch of research on finding the longest common subsequence of M = 2 sequences, but I am trying to figure out how to do it for M ≥ 2 sequences.

I am given N and M: there are M sequences, each containing N unique elements drawn from the set {1, ..., N}. I have thought about a dynamic programming approach, but I am still confused about how to actually apply it.

Example input

5 3
5 3 4 1 2
2 5 4 3 1
5 2 3 1 4

The longest common subsequence here can be seen to be

5 3 1

Expected output

Length = 3

A simple idea.

For each number i between 1 and N, calculate the length of the longest common subsequence whose last number is i. (Let's call it a[i].)

To do that, iterate over the numbers i in the first sequence from start to end. If a[i] > 1, then there is a number j that comes before i in every sequence.
So we can just check all possible values of j and, whenever that condition holds, set a[i] = max(a[i], a[j] + 1).

Finally, because j comes before i in the first sequence, a[j] has already been calculated by the time it is needed.

for each i in first_sequence
    // for the OP's example, 'i' would take values [5, 3, 4, 1, 2], in this order
    a[i] = 1;
    for each j in 1..N
        if j is before i in each sequence
            a[i] = max(a[i], a[j] + 1)
        end
    end
end
// the answer is the largest a[i] over all i

It's O(N^2*M) if you precompute a matrix of positions (the index of each number in each sequence) beforehand.
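The pseudocode above could be written in Python roughly as follows. This is only a sketch: the function name `lcs_length` and the `pos` matrix layout are my own choices, and it assumes every sequence is a permutation of 1..N and that at least one sequence is given.

```python
def lcs_length(sequences):
    """Length of the longest common subsequence of several permutations of 1..N."""
    n = len(sequences[0])

    # pos[s][v] = index of value v in sequence s (the precomputed position matrix).
    pos = []
    for seq in sequences:
        p = [0] * (n + 1)
        for idx, value in enumerate(seq):
            p[value] = idx
        pos.append(p)

    # a[i] = length of the longest common subsequence ending in value i.
    a = [0] * (n + 1)
    for i in sequences[0]:
        a[i] = 1
        for j in range(1, n + 1):
            # j must come before i in every sequence; a[j] is then already final,
            # because j also comes before i in the first sequence.
            if all(p[j] < p[i] for p in pos):
                a[i] = max(a[i], a[j] + 1)
    return max(a)


print(lcs_length([[5, 3, 4, 1, 2], [2, 5, 4, 3, 1], [5, 2, 3, 1, 4]]))  # 3
```

For the example input in the question this gives 3, matching the expected output.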

Since you have unique elements, @Nikita Rybak's answer is the one to go with, but since you mentioned dynamic programming, here's how you'd use DP when you have more than two sequences:

dp[i, j, k] = length of longest common subsequence considering the prefixes
              a[1..i], b[1..j], c[1..k].


dp[i, j, k] = 0 if i = 0 or j = 0 or k = 0
            = 1 + dp[i - 1, j - 1, k - 1] if a[i] = b[j] = c[k]
            = max(dp[i - 1, j, k], dp[i, j - 1, k], dp[i, j, k - 1]) otherwise

To get the actual subsequence back, use a recursive function that starts from dp[a.Length, b.Length, c.Length] and basically reverses the above formulas: if the three elements are equal, record the element and backtrack to dp[a.Length - 1, b.Length - 1, c.Length - 1]. If not, backtrack to whichever of the three neighbouring states holds the max of the above values. Note that the elements are visited from last to first, so output them in reverse order.
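Here is a minimal sketch of that three-sequence DP in Python, assuming 0-indexed input sequences (so a[i] in the formulas becomes `a[i - 1]`). The name `lcs3` is hypothetical, and the backtracking is done with a loop rather than recursion, which follows the same steps.

```python
def lcs3(a, b, c):
    """Return one longest common subsequence of a, b and c as a list."""
    la, lb, lc = len(a), len(b), len(c)

    # dp[i][j][k] = LCS length of the prefixes a[:i], b[:j], c[:k].
    # Row/column/layer 0 stays 0, which is the base case.
    dp = [[[0] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            for k in range(1, lc + 1):
                if a[i - 1] == b[j - 1] == c[k - 1]:
                    dp[i][j][k] = dp[i - 1][j - 1][k - 1] + 1
                else:
                    dp[i][j][k] = max(dp[i - 1][j][k],
                                      dp[i][j - 1][k],
                                      dp[i][j][k - 1])

    # Backtrack from the full prefixes, reversing the recurrence.
    result = []
    i, j, k = la, lb, lc
    while i > 0 and j > 0 and k > 0:
        if a[i - 1] == b[j - 1] == c[k - 1]:
            result.append(a[i - 1])
            i, j, k = i - 1, j - 1, k - 1
        elif dp[i - 1][j][k] == dp[i][j][k]:
            i -= 1
        elif dp[i][j - 1][k] == dp[i][j][k]:
            j -= 1
        else:
            k -= 1
    result.reverse()  # elements were collected last-to-first
    return result


print(lcs3([5, 3, 4, 1, 2], [2, 5, 4, 3, 1], [5, 2, 3, 1, 4]))  # [5, 3, 1]
```

The table takes O(|a| * |b| * |c|) time and memory, and in general an M-dimensional table is needed for M sequences, so this only stays practical for a small number of short sequences.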

You can look into the paper "Design of a new Deterministic Algorithm for finding Common DNA Subsequence". You can use its algorithm to construct the DAG (pg 8, figure 5). From the DAG, read off the largest common distinct subsequences. Then try a dynamic programming approach on those, using the value of M to decide how many DAGs you need to construct per sequence. Basically, use these subsequences as keys, store for each one the numbers of the sequences in which it is found, and then look for the largest subsequence (there can be more than one).
