Repeating non-overlapping substrings

This is code for finding the longest repeating non-overlapping substring (source: GeeksforGeeks):

def longestRepeatedSubstring(str): 

    n = len(str) 
    LCSRe = [[0 for x in range(n + 1)] 
                for y in range(n + 1)] 

    res = "" # To store result 
    res_length = 0 # To store length of result 

    # building table in bottom-up manner 
    index = 0
    for i in range(1, n + 1): 
        for j in range(i + 1, n + 1): 

            # (j-i) > LCSRe[i-1][j-1] to remove 
            # overlapping 
            if (str[i - 1] == str[j - 1] and
                LCSRe[i - 1][j - 1] < (j - i)): 
                LCSRe[i][j] = LCSRe[i - 1][j - 1] + 1

                # updating maximum length of the 
                # substring and updating the finishing 
                # index of the suffix 
                if (LCSRe[i][j] > res_length): 
                    res_length = LCSRe[i][j] 
                    index = max(i, index) 

            else: 
                LCSRe[i][j] = 0

    # If we have non-empty result, then insert 
    # all characters from first character to 
    # last character of string 
    if (res_length > 0): 
        for i in range(index - res_length + 1, 
                                    index + 1): 
            res = res + str[i - 1] 

    return res 

# Driver Code 
if __name__ == "__main__": 

    str = "geeksforgeeks"
    print(longestRepeatedSubstring(str)) 

# This code is contributed by ita_c 

How can it be modified to also obtain the shorter repeating non-overlapping substrings, starting with substrings of length x and ending with the longest substring? (I tried various changes but never got the correct result.)

Assuming it's some kind of programming exercise, I don't want to provide the code itself. I'll provide hints.

Please let me know if I understand you correctly...

You want to get all non-overlapping longest substrings, right?

First problem: non-overlapping means that the longest substring can cut through other long substrings, and vice versa. So search from the longest down to x, instead of from x up to the longest.

Second problem: what do we care about more, the length of the substrings or the number of them?


If we care only about the longest at the current moment, you can:

  1. find the longest match
  2. if its length is at least the wanted x, save it (otherwise quit)
  3. remove all occurrences of the longest string from the string (all of them? what if the string has e.g. 3 repetitions of the longest, instead of 2?); keep some delimiter in place of each one (because e.g. abbcacbb has the longest substring bb, and just removing it will yield acac, giving ac, which is wrong). The delimiter must not be able to match anything itself, otherwise it shows up as a repeat.
  4. repeat

That's basically pseudocode that you need to "translate" into real code. Use a while loop. ;)

You don't need to modify the given function: as you can see, step 1 uses it as is. You just need to collect the results from multiple calls to that function. ;)
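
For readers who want to check their own attempt, here is a minimal sketch of how the four steps above might be translated. It is only one possible reading of the hints, not the only way to do it. The names repeated_substrings and longest_repeated_substring are made up for this example; the latter just restates the question's function so the sketch runs on its own. The sketch assumes x >= 1 and that the input contains no characters from the Unicode Private Use Area, which are used here as delimiters. Each removed occurrence gets a different delimiter so the delimiters can never be reported as a repeat themselves.

```python
from typing import List


def longest_repeated_substring(s: str) -> str:
    """Same DP as the function in the question, restated so this runs alone."""
    n = len(s)
    table = [[0] * (n + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            # table[i-1][j-1] < j - i keeps the two copies from overlapping
            if s[i - 1] == s[j - 1] and table[i - 1][j - 1] < j - i:
                table[i][j] = table[i - 1][j - 1] + 1
                if table[i][j] > best_len:
                    best_len, best_end = table[i][j], i
    return s[best_end - best_len:best_end]


def repeated_substrings(s: str, x: int) -> List[str]:
    """Collect repeats from the longest down to length x (hypothetical helper)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    found = []
    # Assumption: s has no Private Use Area characters, so these are safe
    # delimiters (for very long inputs the supply of such characters runs out).
    next_delim = 0xE000
    while True:
        match = longest_repeated_substring(s)   # step 1
        if len(match) < x:                      # step 2: too short, quit
            break
        found.append(match)
        # Step 3: swap every occurrence for a fresh, unique delimiter so the
        # neighbouring pieces cannot merge into a false match.
        pieces = s.split(match)
        rebuilt = [pieces[0]]
        for piece in pieces[1:]:
            rebuilt.append(chr(next_delim))
            next_delim += 1
            rebuilt.append(piece)
        s = "".join(rebuilt)                    # step 4: repeat on what is left
    return found


if __name__ == "__main__":
    print(repeated_substrings("abbcacbb", 2))   # ['bb']
    print(repeated_substrings("abbcacbb", 1))   # ['bb', 'a', 'c']
```

Note that with x = 1 the second call reports a and c but never ac, which is the false match that plain removal would have produced.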
