
Algorithm - String matching based on repetition factors

I have three strings as the input (A, B, C).

A = "SLOVO", B = "WORD", C = [image in the original post, not reproduced here]

I need to find an algorithm that decides whether the string C is a concatenation of infinite repetitions of the strings A and B. Example of repetition: A^2 = "SLOVOSLOVO", and the string C contains the first 8 letters, "SLOVOSLO", of "SLOVOSLOVO". String B is handled similarly.

My idea for the algorithm:

index_A = 0; // index of the current letter of string A
index_B = 0; // index of the current letter of string B

Go through the whole string C from 0 to size(C)
{
  Pick the current letter from C (C[i])
  if(C[i] == A[index_A] && C[i] != B[index_B])
  {
   index_A++;
   Go to next letter in C 
  }
  else if(C[i] == B[index_B] && C[i] != A[index_A])
  {
   index_B++;
   Go to next letter in C 
  }
  else if(C[i] == B[index_B] && C[i] == A[index_A])
  {
   Now we cannot decide which way to go, so we should test both options (maybe recursion)
  }
  else
  {
   return false;
  }
}

It's only a quick description of the algorithm, but I hope you understand the main idea of what it should do. Is this a good way of solving the problem? Do you have a better solution? Or some tips?

Basically you've got the problem that every regular expression matcher has. Yes, you would need to test both options, and if one doesn't work you will have to backtrack to the other. Expressing your loop over the string recursively can help here.
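The recursive, backtracking version could be sketched roughly like this in Python. The function name `matches_backtracking` is made up for illustration, and the wrap-around with `%` is my assumption about how the indices should behave once the end of A or B is reached. Note that this can take exponential time when A and B share many letters, and it may hit Python's recursion limit for a long C.

```python
def matches_backtracking(a: str, b: str, c: str) -> bool:
    """Try to consume every letter of c from the repeated a or the repeated b."""
    if not a or not b:
        raise ValueError("a and b must be non-empty")

    def go(i: int, index_a: int, index_b: int) -> bool:
        # All letters of c were assigned to one of the two strings.
        if i == len(c):
            return True
        letter = c[i]
        # Option 1: take the letter from a (index wraps around at the end of a).
        if a[index_a] == letter and go(i + 1, (index_a + 1) % len(a), index_b):
            return True
        # Option 2: reached only if option 1 failed, i.e. we backtrack and
        # take the letter from b instead.
        if b[index_b] == letter and go(i + 1, index_a, (index_b + 1) % len(b)):
            return True
        return False

    return go(0, 0, 0)
```

For example, `matches_backtracking("SLOVO", "WORD", "SLOVOWORD")` succeeds by taking the first five letters from A and the rest from B.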

However, there is also a way to try both options at the same time. See the popular article "Regular Expression Matching Can Be Simple And Fast" for the idea: you basically keep track of all possible positions in the two strings while iterating over C. The required lookup structure has a size of at most len(A)*len(B), because you can store each string position modulo the string's length instead of storing the position in the infinite, repeated string.

// some (pythonic) pseudocode for this:

isIntermixedRepetition(a, b, c)
    alen = length(a)
    blen = length(b)
    pos = new Set() // to store tuples
                    // could be implemented as bool array of dimension alen*blen
    pos.add( [0,0] ) // init start pos
    for ci of c
        totest = pos.getContents() // copy and
        pos.clear()                // empty the set
        for [indexA, indexB] of totest
            if a[indexA] == ci
                pos.add( [(indexA + 1) % alen, indexB] )
            // no else
            if b[indexB] == ci
                pos.add( [indexA, (indexB + 1) % blen] )
        if pos.isEmpty
            break
    return !pos.isEmpty
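
If you want something you can actually run, here is a minimal Python sketch of the same idea. The name `is_intermixed_repetition` and the choice to reject empty A or B with an exception are my own assumptions, not part of the question. The set never holds more than len(A)*len(B) pairs, so the running time is bounded by len(C)*len(A)*len(B).

```python
def is_intermixed_repetition(a: str, b: str, c: str) -> bool:
    """Track every reachable (index in a, index in b) pair while scanning c."""
    if not a or not b:
        raise ValueError("a and b must be non-empty")
    alen, blen = len(a), len(b)
    positions = {(0, 0)}  # start at the beginning of both strings
    for letter in c:
        next_positions = set()
        for index_a, index_b in positions:
            if a[index_a] == letter:
                # Consume the letter from a; the index wraps at the end of a.
                next_positions.add(((index_a + 1) % alen, index_b))
            # Deliberately not an else: both options are followed in parallel.
            if b[index_b] == letter:
                next_positions.add((index_a, (index_b + 1) % blen))
        if not next_positions:
            return False  # no way to account for this letter
        positions = next_positions
    return True
```

Because both branches are followed at once, no backtracking is needed, and an input such as A = B = "AB" with a long C stays cheap.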
