Longest common subsequence optimized

I'm trying to find and print the longest common subsequence of two given strings. I use the most common algorithm, without recursion. It's a simple task if I keep the whole array, but I'm trying to optimize it a bit and use only two rows, as you can see in the code below. With this change, finding the length is still simple and works fine, but recovering the subsequence is no longer easy. I've tried to do it in a few ways, but none worked. Below you can see my last attempt. Although it works for some cases, there are also cases where it fails. After thinking about it for a long time, I'm starting to believe that there is no way to recover the subsequence using an array with only two rows. My research didn't give me an exact answer, so I'm asking: is there a way to achieve what I'm trying to do? Or am I stuck with keeping the whole array if I want to print the subsequence?

//finding length of longest common subsequence
//(row i%2 of tab is the current row, row (i-1)%2 the previous one)
for(int i=1; i<m; i++) {
    for(int j=1; j<n; j++) {
        if(sequence1[i-1] == sequence2[j-1]) {
            tab[i%2][j] = tab[(i-1)%2][j-1] + 1;
        } else {
            tab[i%2][j] = max(tab[i%2][j-1],tab[(i-1)%2][j]);
        }
    }
}

//trying to reconstruct longest common subsequence
//(only the last computed row is still available at this point)
int last_row = (m-1)%2;
for(int j=n-1; j>0; j--) {
    if(tab[last_row][j-1] < tab[last_row][j]) {
        if(last_row == 0) {
            common_part += sequence2[j];
        } else {
            common_part += sequence2[j-1];
        }
    }
}
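For reference, here is a minimal, self-contained sketch of the length computation above (the part that already works). It assumes m and n are the string lengths and the table has m+1 rows conceptually (the loops in the question appear to use length + 1 as their bound instead). The function name lcs_length_two_rows is hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// LCS length using only two DP rows, i.e. O(|b|) extra memory.
// tab[i % 2] holds dp[i][*] (current row), tab[(i - 1) % 2] holds dp[i - 1][*].
// Both rows start as zeros, which also covers column 0 (empty prefix of b).
int lcs_length_two_rows(const std::string& a, const std::string& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<std::vector<int>> tab(2, std::vector<int>(n + 1, 0));
    for (std::size_t i = 1; i <= m; ++i) {
        std::vector<int>& cur = tab[i % 2];
        const std::vector<int>& prev = tab[(i - 1) % 2];
        for (std::size_t j = 1; j <= n; ++j) {
            if (a[i - 1] == b[j - 1]) {
                cur[j] = prev[j - 1] + 1;               // extend the diagonal match
            } else {
                cur[j] = std::max(cur[j - 1], prev[j]); // drop a char from a or b
            }
        }
    }
    return tab[m % 2][n];  // row m is the last one written (row 0 if a is empty)
}
```

For example, lcs_length_two_rows("abcc", "acc") returns 3. Once the loop finishes, only the final row of the table is left, and that row is all the reconstruction attempt has to work with.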

It seems that there is no simple way to accomplish this, because if you keep only the last two columns (or, with the table transposed, the last two rows, as in your code), an essential part of the information is lost.

For example, consider two pairs of strings: (abcc, acc) and (abcc, bcc). The matrices for these cases (columns correspond to prefixes of abcc, rows to prefixes of acc and bcc respectively) are

1 1 1 1    and  0 1 1 1
1 1 2 2         0 1 2 2
1 1 2 3         0 1 2 3

You can see that the last two columns are identical in both cases, so you cannot distinguish these cases judging only by the last two columns. But you need to distinguish them, because the answers are different (acc and bcc). Of course, you still have the original strings and can use information from them, but I think (though I have not proved this) that this is more or less equivalent to finding an LCS for some prefixes of the original strings.
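The example can be checked mechanically. The sketch below (the helper names lcs_table, column and traceback are hypothetical) builds the full table with the second string along the rows and abcc along the columns, as in the matrices above, confirms that the last two columns coincide, and shows that the standard traceback over the full table still recovers the two different answers.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using Table = std::vector<std::vector<int>>;

// Full (|rows|+1) x (|cols|+1) table: dp[i][j] = LCS length of rows[0..i) and cols[0..j).
Table lcs_table(const std::string& rows, const std::string& cols) {
    Table dp(rows.size() + 1, std::vector<int>(cols.size() + 1, 0));
    for (std::size_t i = 1; i <= rows.size(); ++i) {
        for (std::size_t j = 1; j <= cols.size(); ++j) {
            dp[i][j] = (rows[i - 1] == cols[j - 1]) ? dp[i - 1][j - 1] + 1
                                                    : std::max(dp[i - 1][j], dp[i][j - 1]);
        }
    }
    return dp;
}

// Column j without the all-zero row 0, i.e. one column of the matrices printed above.
std::vector<int> column(const Table& dp, std::size_t j) {
    std::vector<int> col;
    for (std::size_t i = 1; i < dp.size(); ++i) col.push_back(dp[i][j]);
    return col;
}

// Standard traceback from dp[|rows|][|cols|]; the path it follows may need
// cells from any row and column of the table, not just the last two.
std::string traceback(const Table& dp, const std::string& rows, const std::string& cols) {
    std::string lcs;
    std::size_t i = rows.size(), j = cols.size();
    while (i > 0 && j > 0) {
        if (rows[i - 1] == cols[j - 1]) {
            lcs.push_back(rows[i - 1]);  // matching character belongs to the LCS
            --i;
            --j;
        } else if (dp[i - 1][j] >= dp[i][j - 1]) {
            --i;
        } else {
            --j;
        }
    }
    std::reverse(lcs.begin(), lcs.end());  // characters were collected back to front
    return lcs;
}
```

With t1 = lcs_table("acc", "abcc") and t2 = lcs_table("bcc", "abcc"), column(t1, 3) equals column(t2, 3) and column(t1, 4) equals column(t2, 4), yet the tracebacks give acc and bcc. The difference only shows up in column 1, which a two-column (or two-row) scheme has already discarded.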

At the same time, there is a more advanced algorithm (Hirschberg's algorithm) that works in quadratic time and linear space.
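A minimal sketch of Hirschberg's divide-and-conquer idea, assuming C++17 for std::string_view: split the first string in half, compute one forward and one backward LCS row with the same two-row technique used in the question, pick the split point of the second string where their sum is largest, and recurse on both halves. The function names are hypothetical, and the code favors readability over speed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Last row of the LCS table using two rows (O(|b|) memory):
// result[j] = LCS length of a and the prefix b[0..j).
std::vector<int> lcs_last_row(std::string_view a, std::string_view b) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (char ca : a) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            cur[j] = (ca == b[j - 1]) ? prev[j - 1] + 1 : std::max(prev[j], cur[j - 1]);
        }
        std::swap(prev, cur);  // the row just computed becomes the previous row
    }
    return prev;
}

// Returns k such that LCS(a_left, b[0..k)) + LCS(a_right, b[k..n)) is maximal.
// All temporaries here are freed before the caller recurses.
std::size_t best_split(std::string_view a_left, std::string_view a_right, std::string_view b) {
    const std::vector<int> fwd = lcs_last_row(a_left, b);
    // Scoring the reversed strings gives bwd[t] = LCS(a_right, last t characters of b).
    const std::string a_right_rev(a_right.rbegin(), a_right.rend());
    const std::string b_rev(b.rbegin(), b.rend());
    const std::vector<int> bwd = lcs_last_row(a_right_rev, b_rev);

    const std::size_t n = b.size();
    std::size_t best_k = 0;
    for (std::size_t k = 1; k <= n; ++k) {
        if (fwd[k] + bwd[n - k] > fwd[best_k] + bwd[n - best_k]) best_k = k;
    }
    return best_k;
}

// O(|a| * |b|) time; apart from the output and the O(log |a|) recursion stack,
// working memory is O(|a| + |b|).
std::string hirschberg(std::string_view a, std::string_view b) {
    if (a.empty() || b.empty()) return "";
    if (a.size() == 1) {
        // A single character is an LCS iff it occurs somewhere in b.
        return b.find(a[0]) != std::string_view::npos ? std::string(a) : std::string();
    }
    const std::size_t mid = a.size() / 2;
    const std::string_view a_left = a.substr(0, mid), a_right = a.substr(mid);
    const std::size_t k = best_split(a_left, a_right, b);
    return hirschberg(a_left, b.substr(0, k)) + hirschberg(a_right, b.substr(k));
}
```

For instance, hirschberg("acc", "abcc") gives acc and hirschberg("bcc", "abcc") gives bcc. Ties in best_split are broken toward the smallest k, so when several longest common subsequences exist, which one is returned depends on that choice.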
