Maximum repeating sequence instead of longest repeating sequence

I am trying to get the most repeated sequence of characters in a string. For example:

Input:

s = "abccbaabccba"

Output:

 2

I have used dynamic programming to find the repeating sequence, but this returns the longest repeating character sequence. For example:

Input:

s = "abcabcabcabc"

Output:

2

That is, 2 (abcabc, abcabc) instead of 4 (abc, abc, abc, abc).

Here is the part of the code where I fill the DP table and extract the repeating sequence. Can anyone suggest how I can get the most repeated sequence?

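    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class RepeatCounter {

        static int countRepeats(String s) {
            int length = s.length();
            // table[i][j] = length of the common substring ending at chars[i-1] and
            // chars[j-1]; reset to 0 when extending it would make the two copies overlap.
            int[][] table = new int[length + 1][length + 1];
            int max_length_sub = 0;
            int array_index = 0; // exclusive end index of the first copy

            //Run through the string and fill the DP table.
            char[] chars = s.toCharArray();
            for (int i = 1; i <= length; i++) {
                for (int j = 1; j <= length; j++) {
                    if (chars[i-1] == chars[j-1] && Math.abs(i-j) > table[i-1][j-1]) {
                        table[i][j] = table[i-1][j-1] + 1;
                        if (table[i][j] > max_length_sub) {
                            max_length_sub = table[i][j];
                            array_index = Math.min(i, j);
                        }
                    } else {
                        table[i][j] = 0;
                    }
                }
            }
            //Check if there was a repeating sequence and return the number of times it occurred.
            if (max_length_sub > 0) {
                // The first copy spans [array_index - max_length_sub, array_index).
                String subSeq = s.substring(array_index - max_length_sub, array_index);
                System.out.println(subSeq);
                // Quote subSeq so regex metacharacters in it are matched literally.
                Pattern pattern = Pattern.compile(Pattern.quote(subSeq));
                Matcher matcher = pattern.matcher(s);
                int count = 0;
                while (matcher.find())
                    count++;

                // Leftover check: splitting on subSeq yields an empty array only
                // if s consists entirely of copies of subSeq.
                String[] splits = pattern.split(s);
                if (splits.length == 0) {
                    return count;
                }
            }
            return 0;
        }
    }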

Simple and dumb: the smallest sequence to be considered is a pair of characters (*):

  • loop over the whole String and get every consecutive pair of characters, e.g. using a for loop and substring();
  • count the occurrences of that pair in the String: create a method countOccurrences() using indexOf(String, int) or regular expressions; and
  • store the greatest count: use one variable maxCount outside the loop and an if to check whether the current count is greater (or use Math.max()).

(*) If "abc" occurs 5 times, then "ab" (and "bc") will occur at least 5 times too, so it is enough to search just for "ab" and "bc"; there is no need to check "abc".
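A minimal Java sketch of the three steps above, under a few assumptions: the class name PairCounter and the method maxPairCount() are hypothetical, countOccurrences() is the helper the answer suggests, and occurrences are counted without overlap (the answer does not say which).

```java
// Hypothetical helper class illustrating the "count every consecutive pair" idea.
class PairCounter {

    // Counts non-overlapping occurrences of needle in haystack using indexOf(String, int).
    static int countOccurrences(String haystack, String needle) {
        int count = 0;
        int from = haystack.indexOf(needle, 0);
        while (from != -1) {
            count++;
            // Continue searching after this match, so occurrences do not overlap.
            from = haystack.indexOf(needle, from + needle.length());
        }
        return count;
    }

    // Returns the highest occurrence count among all consecutive character pairs.
    static int maxPairCount(String s) {
        int maxCount = 0;
        for (int i = 0; i + 2 <= s.length(); i++) {
            String pair = s.substring(i, i + 2);
            maxCount = Math.max(maxCount, countOccurrences(s, pair));
        }
        return maxCount;
    }

    public static void main(String[] args) {
        System.out.println(maxPairCount("abcabcabcabc")); // prints 4
        System.out.println(maxPairCount("abccbaabccba")); // prints 2
    }
}
```

For both inputs from the question this gives the expected counts. Note that it reports only the count, not which sequence produced it.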

Edit, for the case without leftovers (see comments), in summary:

  • check if the first character is repeated over the whole string; if not,

  • check if the 2 initial characters are repeated all over; if not,

  • check if the 3 ...

At least 2 counters/loops are needed: one for the number of characters to test, and a second for the position being tested. Some arithmetic can improve performance: the length of the string must be divisible by the length of the repeated sequence, without remainder.
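A possible Java sketch of this summary, assuming the string must consist entirely of copies of one block. The class and method names (WholeStringRepeats, maxRepeats()) are hypothetical, and returning 1 when no shorter block works (the whole string occurs once) is an assumption; the code in the question returns 0 in that case.

```java
// Hypothetical helper illustrating the "no leftovers" approach from the edit.
class WholeStringRepeats {

    // Returns how many times the shortest repeating block tiles the whole string,
    // 1 if no shorter block does (assumption), and 0 for the empty string.
    static int maxRepeats(String s) {
        int n = s.length();
        // Counter 1: number of initial characters to test as the repeating block.
        for (int k = 1; k <= n / 2; k++) {
            if (n % k != 0) {
                continue; // the block length must divide the string length exactly
            }
            boolean repeats = true;
            // Counter 2: position being tested against the first k characters.
            for (int pos = k; pos < n; pos++) {
                if (s.charAt(pos) != s.charAt(pos % k)) {
                    repeats = false;
                    break;
                }
            }
            if (repeats) {
                return n / k;
            }
        }
        return n == 0 ? 0 : 1;
    }

    public static void main(String[] args) {
        System.out.println(maxRepeats("abcabcabcabc")); // prints 4
        System.out.println(maxRepeats("abccbaabccba")); // prints 2
    }
}
```

Because block lengths are tried from shortest to longest, the first one that fits gives the largest count n / k, so the search can stop there.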
