
Find the shortest prefix T of string S, such that S is a prefix of T^n

Our general idea was a greedy algorithm that checked the remainder of the string and made comparisons.

This idea didn't work. The right approach probably uses some sort of suffix tree or the KMP algorithm, but everything I try fails.

Can anyone help please?

PS: T^n is the prefix T repeated n times, where n is the length of the string S, and the letters of the string are indexed S[1..n].

I would use a rolling hash, just as in the Rabin-Karp algorithm. First double S, so that you are sure that T^n is a prefix of S*S.

Next, iterate over the possible lengths of T. For each length you can compute the hash code of T^n in logarithmic time (much like binary exponentiation). Also, after a linear precomputation over S*S, you can find the hash code of any of its substrings in constant time (you need one array holding the hashes of all its prefixes and another holding the powers of the prime you use for hashing). So for each length you can check whether T^n == SUBSTRING(S^2, n * LENGTH_OF(T)) in O(log(n)) (here you should think a bit about how to make the time to compute the hash of T constant for each iteration). The overall complexity of the proposed method is therefore O(LENGTH(S) * log(LENGTH(S))).
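A minimal C++ sketch of the prefix-hash idea, under stated assumptions. It does not build the hash of T^n or double S. Instead it uses an equivalent check: for a candidate length k, S is a prefix of T^n exactly when S[k..n) equals S[0..n-k), and both substring hashes come from one prefix-hash array in constant time. The name `shortestPeriodByHash`, the base 131 and the modulus 1000000007 are illustrative choices, not from the answer above. Because hashes can collide, a hash match is confirmed with a direct comparison before returning.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Returns the length of the shortest prefix T of s such that s is a
// prefix of T repeated. Hypothetical helper name; BASE and MOD are
// arbitrary illustrative constants.
std::size_t shortestPeriodByHash(const std::string& s) {
    const std::size_t n = s.size();
    const std::uint64_t MOD = 1000000007ULL;
    const std::uint64_t BASE = 131ULL;

    // h[i] is the hash of s[0, i); pw[i] is BASE^i mod MOD.
    std::vector<std::uint64_t> h(n + 1, 0), pw(n + 1, 1);
    for (std::size_t i = 0; i < n; ++i) {
        h[i + 1] = (h[i] * BASE + static_cast<unsigned char>(s[i]) + 1) % MOD;
        pw[i + 1] = pw[i] * BASE % MOD;
    }

    // Hash of the half-open range s[l, r), computed in O(1).
    auto sub = [&](std::size_t l, std::size_t r) {
        return (h[r] + MOD - h[l] * pw[r - l] % MOD) % MOD;
    };

    for (std::size_t k = 1; k < n; ++k) {
        // Candidate T = s[0, k): s must equal itself shifted by k.
        if (sub(k, n) == sub(0, n - k) &&
            std::equal(s.begin() + k, s.end(), s.begin())) {
            return k;  // hash match confirmed character by character
        }
    }
    return n;  // no shorter prefix works, so T is s itself
}
```

Calling `shortestPeriodByHash("abcdababcdababcdababcdababc")` returns 6, the length of "abcdab". With this shifted-comparison check each candidate length costs constant time apart from the confirming comparison, so no logarithmic factor is needed here.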

Hope this helps.

EDIT: I believe I have found a linear solution to the problem. It is based on KMP just as you state. After computing the failure function for your string observe its values. For instance for the case:

string s = "abcdababcdababcdababcdababc";

The values are as follows:

   a     b     c    d    a    b    a    b    c    d    a    b    a    b    c    d    a    b    a    b    c    d    a    b    a    b    c  
 -001  -001  -001  -001  000  001  000  001  002  003  004  005  006  007  008  009  010  011  012  013  014  015  016  017  018  019  020

Take a look at the value at the final index. I believe that if you subtract it from the length of S and then subtract one more, you get the length of the shortest repeated substring. In this example that is 27 - 20 - 1 = 6. This is easiest to see in the case shown above, where the failure function ends with the run of values from 0 to 20. But even if the failure function takes other values before ending at 20, the final value 20 still means that the first 21 characters of S match its last 21 characters, so the same subtraction applies. Hope this makes sense. This algorithm is linear.
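The KMP observation can be sketched in C++ as follows. This is a minimal sketch, and the function names `prefixFunction` and `shortestPeriod` are illustrative rather than taken from the answer. Note the indexing convention: this version stores the length of the longest border, which is one more than the values in the table above, so the answer is simply the length of S minus the last value.

```cpp
#include <string>
#include <vector>

// KMP failure (prefix) function in the "length" convention:
// pi[i] is the length of the longest proper prefix of s[0..i]
// that is also a suffix of s[0..i]. The table above stores pi[i] - 1.
std::vector<int> prefixFunction(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> pi(n, 0);
    for (int i = 1; i < n; ++i) {
        int k = pi[i - 1];
        // Fall back to shorter borders until one can be extended.
        while (k > 0 && s[i] != s[k]) k = pi[k - 1];
        if (s[i] == s[k]) ++k;
        pi[i] = k;
    }
    return pi;
}

// Length of the shortest prefix T of s such that s is a prefix of
// T repeated: the length of s minus its longest border.
std::size_t shortestPeriod(const std::string& s) {
    if (s.empty()) return 0;
    return s.size() - static_cast<std::size_t>(prefixFunction(s).back());
}
```

For the example string, the last prefix-function value is 21 in this convention (20 in the table's convention), so `shortestPeriod` returns 27 - 21 = 6.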
