简体   繁体   中英

Given string s, find the shortest string t, such that, t^m=s

Given string s, find the shortest string t, such that, t^m=s.

Examples:

s="aabbb" => t="aabbb"
s="abab"  => t = "ab"

How fast can it be done?

Of course naively, for every m divides |s|, I can try if substring(s,0,|s|/m)^m = s.

One can figure out the solution in O(d(|s|)n) time, where d(x) is the number of divisors of s. Can it be done more efficiently?

This is the problem of computing the period of a string. Knuth, Morris and Pratt's sequential string matching algorithm is a good place to get started. This is in a paper entitled "Fast Pattern Matching in Strings" from 1977.

If you want to get fancy with it, then check out the paper "Finding All Periods and Initial Palindromes of a String in Parallel" by Breslauer and Galil in 1991. From their abstract:

An optimal O(log log n) time CRCW-PRAM algorithm for computing all periods of a string is presented. Previous parallel algorithms compute the period only if it is shorter than half of the length of the string. This algorithm can be used to find all initial palindromes of a string in the same time and processor bounds. Both algorithms are the fastest possible over a general alphabet. We derive a lower bound for finding palindromes by a modification of a previously known lower bound for finding the period of a string [3]. When p processors are available the bounds become \\Theta(dnpe + log log d1+p=ne 2p).

Yes you can do it in O(|s|) time.

You can search for a "target" string of length n in a "source" string of length m in O(n+m) time. Build a solution based on that.

Let both source and target be s . An additional constraint is that 1 and any positions in the source that do not divide |s| are not valid starting positions for the match. Of course the search per se will always fail. But if there's a partial match and you have reached the end of the sourse string, then you have a solution to the original problem.

I really like this thing called the z-algorithm: http://www.utdallas.edu/~besp/demo/John2010/z-algorithm.htm

For every position it calculates the longest substring starting from there, that is also a prefix of the whole string. (in linear time of course).

a   a   b   c   a   a   b   x   a   a   a   z
    1   0   0   3   1   0   0   2   2   1   0

Given this "z-table" it is easy to find all strings that can be exponentiated to the whole thing. Just check for all positions if pos+z[pos] = n .

In our case:

a   b   a   b
    0   2   0

Here pos = 2 gives you 2+z[2] = 4 = n hence the shortest string you can use is the one of length 2.

This will even allow you to find cases where only a prefix of the exponentiated string matches, like:

a   b   c   a
    0   0   1

Here (abc)^2 can be cut down to your original string. But of course, if you don't want matches like this, just go over the divisors only.

a modification to Boyer-Moore could possibly handle this in O(n) where n is length of s

http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM