Substring Matching and Longest Common Subsequence as Variations of the Edit Distance Problem -- Skiena

In The Algorithm Design Manual, edit distance is solved by the following algorithm:

#define MATCH     0       /* enumerated type symbol for match */
#define INSERT    1       /* enumerated type symbol for insert */
#define DELETE    2       /* enumerated type symbol for delete */

int string_compare(char *s, char *t, int i, int j)
{
        int k;                  /* counter */
        int opt[3];             /* cost of the three options */
        int lowest_cost;        /* lowest cost */

        if (i == 0) return(j * indel(' '));
        if (j == 0) return(i * indel(' '));

        opt[MATCH] = string_compare(s,t,i-1,j-1) + match(s[i],t[j]);
        opt[INSERT] = string_compare(s,t,i,j-1) + indel(t[j]);
        opt[DELETE] = string_compare(s,t,i-1,j) + indel(s[i]);

        lowest_cost = opt[MATCH];
        for (k=INSERT; k<=DELETE; k++)
                if (opt[k] < lowest_cost) lowest_cost = opt[k];

        return( lowest_cost );
}
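For reference, here is a minimal self-contained sketch of the routine above that compiles on its own. The bodies of `match` and `indel` are not shown in the excerpt, so the unit costs below are an assumption, and `edit_distance` is a hypothetical wrapper added only for illustration. Both strings are assumed to carry a leading pad character so that the real text starts at index 1.

```c
#include <assert.h>
#include <string.h>

#define MATCH  0    /* enumerated type symbol for match */
#define INSERT 1    /* enumerated type symbol for insert */
#define DELETE 2    /* enumerated type symbol for delete */

/* Assumed unit costs: a mismatch costs 1, an insertion or deletion costs 1. */
static int match(char c, char d) { return (c == d) ? 0 : 1; }
static int indel(char c) { (void)c; return 1; }

/* s and t are padded with one leading character; i and j are 1-based. */
static int string_compare(const char *s, const char *t, int i, int j)
{
    int k;              /* counter */
    int opt[3];         /* cost of the three options */
    int lowest_cost;    /* lowest cost */

    if (i == 0) return j * indel(' ');
    if (j == 0) return i * indel(' ');

    opt[MATCH]  = string_compare(s, t, i - 1, j - 1) + match(s[i], t[j]);
    opt[INSERT] = string_compare(s, t, i, j - 1) + indel(t[j]);
    opt[DELETE] = string_compare(s, t, i - 1, j) + indel(s[i]);

    lowest_cost = opt[MATCH];
    for (k = INSERT; k <= DELETE; k++)
        if (opt[k] < lowest_cost) lowest_cost = opt[k];

    return lowest_cost;
}

/* Hypothetical wrapper: edit distance between two padded strings. */
static int edit_distance(const char *s, const char *t)
{
    assert(strlen(s) > 0 && strlen(t) > 0);   /* the pad character must be present */
    return string_compare(s, t, (int)strlen(s) - 1, (int)strlen(t) - 1);
}
```

Calling `edit_distance(" cat", " cut")` under these assumptions counts a single substitution. The plain recursion takes exponential time, so it is only practical for short strings.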

I understand everything up to this point, but I am struggling with the next section, where substring matching and longest common subsequence are solved as variations of the edit distance problem. I think I grasp the intuition: minimizing the number of edits preserves the "sequence of interest". For substring matching, that is the substring; for longest common subsequence, it is the common subsequence. What I don't understand is exactly how each problem is solved.

For substring matching, the following changes are made:

void row_init(int i)
{
    m[0][i].cost = 0;       /* note change */
    m[0][i].parent = -1;    /* note change */
}

void goal_cell(char *s, char *t, int *i, int *j)
{
    int k;                  /* counter */

    *i = strlen(s) - 1;
    *j = 0;
    for (k=1; k<strlen(t); k++)
        if (m[*i][k].cost < m[*i][*j].cost) *j = k;
}
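To make the effect of these two changes concrete, here is a rough table-based sketch, not the book's full code. It assumes unit costs for `match` and `indel`, a plain `cost` array in place of the book's `m[][]` cells, and a hypothetical function name `substring_match`. The pattern `s` and the text `t` both carry a leading pad character.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 64   /* assumed upper bound on string length for this sketch */

/* Assumed unit costs, as in plain edit distance. */
static int match(char c, char d) { return (c == d) ? 0 : 1; }
static int indel(char c) { (void)c; return 1; }

static int cost[MAXLEN + 1][MAXLEN + 1];

/*
 * Hypothetical helper: cost of the cheapest approximate occurrence of
 * pattern s inside text t. Stores in *end the text index where that
 * occurrence ends.
 */
static int substring_match(const char *s, const char *t, int *end)
{
    int n = (int)strlen(s) - 1;     /* pattern length without the pad */
    int m = (int)strlen(t) - 1;     /* text length without the pad */
    int i, j;

    assert(n >= 0 && m >= 0 && n <= MAXLEN && m <= MAXLEN);

    /* row_init change: starting the match anywhere in the text is free */
    for (j = 0; j <= m; j++) cost[0][j] = 0;
    /* column init is unchanged: skipping pattern characters still costs */
    for (i = 1; i <= n; i++) cost[i][0] = i * indel(' ');

    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            int best = cost[i - 1][j - 1] + match(s[i], t[j]);
            int ins  = cost[i][j - 1] + indel(t[j]);
            int del  = cost[i - 1][j] + indel(s[i]);
            if (ins < best) best = ins;
            if (del < best) best = del;
            cost[i][j] = best;
        }
    }

    /* goal_cell change: the match may end anywhere, so scan the last row */
    *end = 0;
    for (j = 1; j <= m; j++)
        if (cost[n][j] < cost[n][*end]) *end = j;

    return cost[n][*end];
}
```

In this sketch, a zero top row means the text before the match is skipped for free. Taking the minimum over the last row means the text after the match is ignored as well.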

For longest common subsequence, the following change is made:

int match(char c, char d)
{
    if (c == d) return(0);
    else return(MAXLEN);
}
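One way to read this change: once a mismatch costs `MAXLEN`, a substitution is never cheaper than one deletion plus one insertion, so the optimal edit script uses only matches, insertions, and deletions. Every character outside the common subsequence then costs exactly 1, giving a distance of `|s| + |t| - 2 * LCS`. The sketch below assumes an `indel` cost of 1 and `MAXLEN` greater than 2. It uses a plain `cost` array and a hypothetical function name `lcs_length` to recover the length from that identity.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 64   /* assumed bound on string length; also the mismatch penalty */

/* Changed match cost: substitutions are priced out of any optimal script. */
static int match(char c, char d) { return (c == d) ? 0 : MAXLEN; }
static int indel(char c) { (void)c; return 1; }    /* assumed unit cost */

static int cost[MAXLEN + 1][MAXLEN + 1];

/* Hypothetical helper: LCS length of two strings padded with a leading character. */
static int lcs_length(const char *s, const char *t)
{
    int n = (int)strlen(s) - 1;
    int m = (int)strlen(t) - 1;
    int i, j;

    assert(n >= 0 && m >= 0 && n <= MAXLEN && m <= MAXLEN);

    /* Standard edit distance boundaries: pure insertions or pure deletions. */
    for (j = 0; j <= m; j++) cost[0][j] = j * indel(' ');
    for (i = 0; i <= n; i++) cost[i][0] = i * indel(' ');

    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            int best = cost[i - 1][j - 1] + match(s[i], t[j]);
            int ins  = cost[i][j - 1] + indel(t[j]);
            int del  = cost[i - 1][j] + indel(s[i]);
            if (ins < best) best = ins;
            if (del < best) best = del;
            cost[i][j] = best;
        }
    }

    /* distance = n + m - 2 * LCS, so solve for the LCS length */
    return (n + m - cost[n][m]) / 2;
}
```

The common subsequence itself would come from following the parent pointers and keeping only the match steps. This sketch tracks costs only.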

Would someone care to explain and help me understand this better?

There is an explanation of the substring matching problem in Section 21.4 of the book.

Explanation screenshot: https://i.stack.imgur.com/N56LP.png
