
What's the time complexity of this algorithm for Wildcard Matching?

Wildcard Matching
Implement wildcard pattern matching with support for '?' and '*'.

  • '?' Matches any single character.
  • '*' Matches any sequence of characters (including the empty sequence).

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:

  • isMatch("aa","a") → false
  • isMatch("aa","aa") → true
  • isMatch("aaa","aa") → false
  • isMatch("aa", "*") → true
  • isMatch("aa", "a*") → true
  • isMatch("ab", "?*") → true
  • isMatch("aab", "c*a*b") → false

Question:

  • What's the time complexity?
  • What's the space complexity?

Personally, I think

  • Time complexity depends heavily on the input, so it cannot be written out in the form T = O(?).
  • Space complexity = O(min(sLen, pLen)), because the max recursion depth = O(min(sLen, pLen)).

Have tried:
Write out the time complexity expression, then draw the recursion tree:

TC Expression => T(n) = T(n - 1) + O(1),            when pChar == '?' or pChar == sChar,
                      = T(n - 1) + T(n - 1) + O(1), when pChar == '*'.

I tried to draw the recursion tree, but cannot figure out how to draw it from this kind of time complexity expression.

Additional question:
More precisely, I hope to learn how to calculate the time complexity for this kind of recursion, which branches unpredictably depending on the input.

Note:

  • I know both the iterative solution and the recursive solution, but cannot figure out how to calculate the time complexity of the recursive solution.
  • This is not homework; the question is from "leetcode.com". I just want to learn the method for calculating the time complexity of this special kind of recursion.


Code: Java, Solution: Recursion.

public class Solution {
    public boolean isMatch(String s, String p) {
        // Input checking.
        if (s == null || p == null) return false;

        int sLen = s.length();
        int pLen = p.length();

        return helper(s, 0, sLen, p, 0, pLen);
    }

    private boolean helper(String s, int sIndex, int sLen,
                           String p, int pIndex, int pLen) {
        // Base case.
        if (sIndex >= sLen && pIndex >= pLen) return true;
        else if (sIndex >= sLen) {
            // Check whether the remaining part of p all "*".
            while (pIndex < pLen) {
                if (p.charAt(pIndex) != '*') return false;
                pIndex ++;
            }
            return true;

        } else if (pIndex >= pLen) {
            return false;
        }

        char sc = s.charAt(sIndex);
        char pc = p.charAt(pIndex);

        if (pc == '?' || pc == sc) {
            return helper(s, sIndex + 1, sLen, p, pIndex + 1, pLen);

        } else if (pc == '*') {
            return helper(s, sIndex, sLen, p, pIndex + 1, pLen) ||
                   helper(s, sIndex + 1, sLen, p, pIndex, pLen);

        } else return false;
    }
}

In order to get an upper bound (i.e., big-O) on the worst-case running time, you need to assume the very worst. The correct recurrence for an upper bound on the asymptotic running time of matching a string of length s with a pattern of length p is as follows.

T(s, p) | s == 0 || p == 0 = 1
        | s >  0 && p >  0 = 1 + max(T(s, p - 1) + T(s - 1, p),  // *
                                     T(s - 1, p - 1))            // ? or literal

Solving two-variable recurrences like this can be tricky. In this particular case, one can show fairly easily by induction that T is non-decreasing in both arguments, and so we can simplify the max.

T(s, p) | s == 0 || p == 0 = 1
        | s >  0 && p >  0 = 1 + T(s, p - 1) + T(s - 1, p)

With experience, one can recognize the strong resemblance to the recurrence for binomial coefficients and make the (admittedly slightly magical) substitutions s = n - k, p = k, and T(s, p) = 2 U(n, k) - 1.

2 U(n, k) - 1 | n == k || k == 0 = 1
              | n >  k && k >  0 = 1 + 2 U(n - 1, k - 1) - 1 + 2 U(n - 1, k) - 1

U(n, k) | n == k || k == 0 = 1
        | n >  k && k >  0 = U(n - 1, k - 1) + U(n - 1, k)
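The substitution can be checked numerically. The Java sketch below (class and method names are illustrative and not part of the original answer) tabulates T directly from the simplified recurrence and compares each entry with 2 * C(s + p, p) - 1, the value that the binomial recurrence for U implies. It is only a sanity check under those assumptions, not a proof.

```java
import java.math.BigInteger;

class RecurrenceCheck {
    // Tabulates T(s, p) = 1 + T(s, p - 1) + T(s - 1, p), with T = 1 when s == 0 or p == 0.
    static BigInteger[][] tabulate(int maxS, int maxP) {
        BigInteger[][] t = new BigInteger[maxS + 1][maxP + 1];
        for (int s = 0; s <= maxS; s++) {
            for (int p = 0; p <= maxP; p++) {
                if (s == 0 || p == 0) {
                    t[s][p] = BigInteger.ONE;
                } else {
                    t[s][p] = BigInteger.ONE.add(t[s][p - 1]).add(t[s - 1][p]);
                }
            }
        }
        return t;
    }

    // Binomial coefficient C(n, k) via the multiplicative formula; every division is exact.
    static BigInteger binomial(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    // Conjectured closed form: 2 * C(s + p, p) - 1.
    static BigInteger closedForm(int s, int p) {
        return binomial(s + p, p).shiftLeft(1).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        int limit = 12;
        BigInteger[][] t = tabulate(limit, limit);
        for (int s = 0; s <= limit; s++) {
            for (int p = 0; p <= limit; p++) {
                if (!t[s][p].equals(closedForm(s, p))) {
                    throw new AssertionError("mismatch at s=" + s + ", p=" + p);
                }
            }
        }
        System.out.println("T(10, 10) = " + t[10][10]); // prints T(10, 10) = 369511
    }
}
```

BigInteger is used so that the table can be extended to larger s and p without overflow; for the small limit shown, long would also do.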

The last recurrence is Pascal's rule with boundary value 1, so U(n, k) = n choose k. We conclude that T(s, p) = 2 U(s + p, p) - 1 = 2 ((s + p) choose p) - 1 = O(2^(s + p)/sqrt(s + p)) by Stirling's approximation (that's the best big-O bound possible in the single quantity s + p, but it's confusing if I write big-Theta).
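For readers who want the Stirling step spelled out, here is one way to obtain the bound, writing n = s + p (the symbol n is introduced here only for brevity). A binomial coefficient is largest in the middle of its row, and the middle coefficient has a standard asymptotic form:

```latex
\begin{aligned}
T(s, p) &= 2\binom{n}{p} - 1 \;\le\; 2\binom{n}{\lfloor n/2 \rfloor}, \qquad n = s + p, \\
\binom{n}{\lfloor n/2 \rfloor} &\sim \sqrt{\frac{2}{\pi n}}\; 2^{n}
\quad\Longrightarrow\quad
T(s, p) = O\!\left(\frac{2^{\,s+p}}{\sqrt{s+p}}\right).
\end{aligned}
```

The bound is attained up to a constant factor when s and p are equal, which is why it cannot be improved as a function of s + p alone.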

So far we have proved only that T(s, p) is an upper bound. Since * was the more troublesome case, an idea for the worst case presents itself: make the pattern all *s. We have to be a little bit careful, because if the match succeeds, then some short-circuiting is possible. However, it takes very little to prevent a match: consider the string 0000000000 and the pattern **********1 (adjust the number of 0s and *s as desired). This example shows that the quoted bound is tight to within a polynomial factor (negligible, since the running time is already exponential).
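To see the blow-up concretely, the following Java sketch re-implements the branching of the question's recursive helper with a call counter added (the class WildcardCallCounter and its methods are hypothetical names, not from the question or the answer). On a string of n zeros and a pattern of n stars followed by 1, no branch can succeed, so the || never short-circuits and the call count follows the recurrence for T exactly.

```java
import java.util.Arrays;

class WildcardCallCounter {
    private long calls = 0;

    // Same case analysis as the helper in the question, with a counter on entry.
    private boolean helper(String s, int si, String p, int pi) {
        calls++;
        if (si >= s.length()) {
            // String exhausted: whatever is left of the pattern must be all '*'.
            while (pi < p.length()) {
                if (p.charAt(pi) != '*') return false;
                pi++;
            }
            return true;
        }
        if (pi >= p.length()) return false;

        char sc = s.charAt(si);
        char pc = p.charAt(pi);
        if (pc == '?' || pc == sc) return helper(s, si + 1, p, pi + 1);
        if (pc == '*') return helper(s, si, p, pi + 1) || helper(s, si + 1, p, pi);
        return false;
    }

    // Number of helper invocations needed to decide whether p matches s.
    static long countCalls(String s, String p) {
        WildcardCallCounter counter = new WildcardCallCounter();
        counter.helper(s, 0, p, 0);
        return counter.calls;
    }

    // Builds a string consisting of n copies of c.
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 12; n += 2) {
            String s = repeat('0', n);
            String p = repeat('*', n) + "1";
            System.out.println("n = " + n + ", calls = " + countCalls(s, p));
        }
    }
}
```

For n = 10, the string and pattern from the answer, the count is 369511 = 2 * (20 choose 10) - 1, which agrees with the closed form for T(10, 10).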


For the purpose of getting just an upper bound, it's not necessary to work out these recurrences nearly so precisely. For example, I might guess that T(s, p) <= 3^(s + p) and proceed to verify that claim by induction.

T(s, p) | s = 0 || p = 0  = 1 <= 3^(s + p)                 // base case
        | s > 0 && p > 0  = 1 + T(s, p - 1) + T(s - 1, p)  // induction
                         <= 3^(s + p - 1) + 3^(s + p - 1) + 3^(s + p - 1)
                          = 3^(s + p)

Now, 3^(s + p) is a valid upper bound, though in light of the rest of this answer it's not tight. One can now look for waste in the bounds; 1 <= 3^(s + p - 1), for example, is a gross overestimate, and with some tricks we can get the exponential base down to 2.
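One possible way to remove that waste, offered as an illustration and not necessarily the trick the answer has in mind, is to strengthen the induction hypothesis so that the stray +1 is absorbed. Guessing T(s, p) <= 2^(s + p + 1) - 1 works:

```latex
\begin{aligned}
T(s, p) &= 1 \;\le\; 2^{\,s+p+1} - 1
  && \text{if } s = 0 \text{ or } p = 0, \\
T(s, p) &= 1 + T(s, p - 1) + T(s - 1, p) \\
        &\le 1 + \bigl(2^{\,s+p} - 1\bigr) + \bigl(2^{\,s+p} - 1\bigr)
         \;=\; 2^{\,s+p+1} - 1
  && \text{if } s > 0 \text{ and } p > 0.
\end{aligned}
```

Subtracting 1 in the hypothesis is what makes the induction close: the two subtracted ones pay for the constant term in the recurrence.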

The more important order of business, however, is to get an exponential lower bound. From drawing the recursion tree for the bad example above, I might conjecture that T(s, p) >= 2^min(s, p). This can be verified by induction.

T(s, p) | s = 0 || p = 0  = 1 >= 2^min(s, p) = 2^0 = 1             // base case
        | s > 0 && p > 0  = 1 +     T(s, p - 1) +     T(s - 1, p)  // induction
                         >=     2^min(s, p - 1) + 2^min(s - 1, p)
                         >= 2^(min(s, p) - 1) + 2^(min(s, p) - 1)
                          = 2^min(s, p)
