
Algorithm to find pattern-based string match

There are two input strings, r and s. The algorithm checks whether they match: every character in r maps to exactly one non-empty substring of s, and different characters in r map to different substrings of s. For example, if r = "ABA" and s = "hibiehi", there is a match with A = "hi" and B = "bie". But if r = "ABA" and s = "hibie", they don't match.
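To make the definition concrete, here is a minimal sketch that checks whether a proposed assignment is a valid match. The function name is_valid_match and the dict-based assignment argument are assumptions made for illustration; they are not part of the question.

```python
from typing import Dict


def is_valid_match(r: str, s: str, assignment: Dict[str, str]) -> bool:
    """Check a candidate assignment (hypothetical helper, for illustration).

    assignment maps each character of r to the substring it stands for.
    """
    # Every character of r needs a non-empty substring.
    if any(not assignment.get(ch) for ch in r):
        return False

    # Different characters of r must stand for different substrings.
    substrings = [assignment[ch] for ch in set(r)]
    if len(set(substrings)) != len(substrings):
        return False

    # Substituting each character by its substring must rebuild s exactly.
    return "".join(assignment[ch] for ch in r) == s
```

With the example from the question, is_valid_match("ABA", "hibiehi", {"A": "hi", "B": "bie"}) holds, while the same assignment fails for s = "hibie" because the substituted string is longer than s.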


You can do this with depth-first search, also called backtracking. We can greedily try to match each letter of the pattern to substrings of the word, and backtrack upon failure. Keep track of the previous matches (and inverse matches) using a hash map.

The running time of this approach is exponential (in the pattern length, at least), although I haven't analyzed it exactly. You can speed this up with early stopping or pruning of the search tree, but it would take a new idea to get a fully polynomial runtime.
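As one possible illustration of such pruning, the sketch below stops exploring a branch as soon as the rest of the word is too short for the rest of the pattern. The name word_pattern_pruned and the length-based bound are assumptions chosen for this example, not something the answer prescribes, and the worst case remains exponential.

```python
from typing import Dict, Set


def word_pattern_pruned(pattern: str, word: str) -> bool:
    """Backtracking match with a simple length-based pruning rule (sketch)."""
    assigned: Dict[str, str] = {}  # pattern char -> substring it stands for
    used: Set[str] = set()         # substrings already taken by some char

    def min_remaining(p_idx: int) -> int:
        # Lower bound on the word characters still needed: an assigned
        # char costs its known length, an unassigned char at least 1.
        return sum(len(assigned[c]) if c in assigned else 1
                   for c in pattern[p_idx:])

    def dfs(p_idx: int, w_idx: int) -> bool:
        if p_idx == len(pattern):
            return w_idx == len(word)
        # Prune: too few characters left in word to finish the pattern.
        if min_remaining(p_idx) > len(word) - w_idx:
            return False

        ch = pattern[p_idx]
        if ch in assigned:
            sub = assigned[ch]
            return (word.startswith(sub, w_idx)
                    and dfs(p_idx + 1, w_idx + len(sub)))

        # Try every non-empty candidate, shortest first.
        for end in range(w_idx + 1, len(word) + 1):
            sub = word[w_idx:end]
            if sub in used:
                continue
            assigned[ch] = sub
            used.add(sub)
            if dfs(p_idx + 1, end):
                return True
            # Undo the tentative choice before trying a longer one.
            del assigned[ch]
            used.discard(sub)
        return False

    return dfs(0, 0)
```

The bound is cheap to compute and never discards a valid match, since every pattern character must consume at least one character of the word.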

Python implementation:

from typing import Dict, Set


def word_pattern(pattern: str, word: str) -> bool:
    """Given a pattern string, check if 'word' can match pattern.
    'Match' here means a bijection between each character in pattern
    with a nonempty substring in word.
    """

    pattern_len = len(pattern)
    word_len = len(word)

    def dfs(biject_pat_to_word: Dict[str, str],
            already_used_strs: Set[str],
            pattern_index: int,
            word_index: int) -> bool:
        """Greedily try to match each pattern character to the shortest
        possible substring of word, consistent with current bijection.
        Return whether this was possible."""

        if pattern_index == pattern_len:  # Reached pattern end
            return word_index == word_len
        next_letter = pattern[pattern_index]

        # If we've already seen this pattern char, it must match again
        if next_letter in biject_pat_to_word:
            pat_match = biject_pat_to_word[next_letter]
            if word[word_index:word_index + len(pat_match)] != pat_match:
                return False

            word_index += len(pat_match)
            pattern_index += 1

            return dfs(biject_pat_to_word, already_used_strs,
                       pattern_index, word_index)

        curr_str_match = ''
        for amount_to_take in range(1, word_len - word_index + 1):
            curr_str_match += word[word_index + amount_to_take - 1]
            if curr_str_match in already_used_strs:
                continue

            biject_pat_to_word[next_letter] = curr_str_match
            already_used_strs.add(curr_str_match)

            # Try to use this pattern
            if dfs(biject_pat_to_word, already_used_strs,
                   pattern_index=pattern_index + 1,
                   word_index=word_index + amount_to_take):
                return True

            already_used_strs.discard(curr_str_match)

        # If we've set which string we match, unset it for future calls
        if next_letter in biject_pat_to_word:
            del biject_pat_to_word[next_letter]

        return False

    return dfs(biject_pat_to_word={},
               already_used_strs=set(),
               pattern_index=0, word_index=0)

