The longest repeating sequence of bytes

I have code that should look for the longest repeating sequence. But for this sequence

7888885466662716666

it outputs the first occurrence at indices 1-5 and the second at 2-6, i.e. the element 8. But 6 should be output, since the 6s are the ones that are duplicated. I thought of running the algorithm over my sequence along these lines:

  • check if the first character is repeated over the whole string, if not

  • check if the 2 initial characters are repeated all over, if not

  • check if the 3 ...

But I do not know how to work this into my code. Can you tell me how?

    private int element;
    private int lastElement;
    private int length;

    private byte[] readByteFromFile(File name) throws IOException {
        return Files.readAllBytes(name.toPath());
    }

    private void searchByte(byte[] byteMass) throws InterruptedException {
        for (int i = 0; i < byteMass.length; i++) {
                int count = 0;
                for (int j = i + 1; j < byteMass.length; j++) {
                    if (byteMass[i + count] == byteMass[j]) {
                        if (count >= length) {
                            length = count + 1;
                            element = i;
                            lastElement = j - count;
                        }
                        count++;
                    } else {
                        count = 0;
                    }
                }
        }
    }
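One way to avoid reporting the overlapping 8888 matches is to compare every pair of end positions and only extend a match while the two copies stay apart. The following is a minimal sketch of that idea, not a drop-in fix for the code above: the class name LongestRepeat and the method longestRepeated are made up for illustration, and it assumes "repeating" means that the same byte sequence occurs at least twice without overlapping.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class LongestRepeat {

    // Hypothetical helper: returns the longest byte sequence that occurs at
    // least twice without overlapping, or an empty array if no byte repeats.
    static byte[] longestRepeated(byte[] data) {
        int n = data.length;
        // prev[j] is the length of the match ending just before positions
        // i-1 and j-1 (exclusive ends), for the previous value of i.
        int[] prev = new int[n + 1];
        int bestLen = 0;
        int bestEnd = 0; // exclusive end index of the first copy
        for (int i = 1; i <= n; i++) {
            int[] curr = new int[n + 1];
            for (int j = i + 1; j <= n; j++) {
                // Extend only if the bytes match and the longer match would
                // still fit into the gap (j - i) between the two copies.
                if (data[i - 1] == data[j - 1] && prev[j - 1] < j - i) {
                    curr[j] = prev[j - 1] + 1;
                    if (curr[j] > bestLen) {
                        bestLen = curr[j];
                        bestEnd = i;
                    }
                }
            }
            prev = curr;
        }
        return Arrays.copyOfRange(data, bestEnd - bestLen, bestEnd);
    }

    public static void main(String[] args) {
        byte[] input = "7888885466662716666".getBytes(StandardCharsets.US_ASCII);
        byte[] result = longestRepeated(input);
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // prints 6666
    }
}
```

This takes time quadratic in the input length, so it is only reasonable for fairly small files.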

I will be completely honest, I'm not too proud of this solution. In some other programming languages I'm fairly skilled at, I would be able to get the solution pretty easily (here is a possible implementation in 05AB1E, for example), but in Java it's very hard imho.

I have been able to find a solution by converting the input byte[] to a String and checking its substrings. Performance-wise it's crap, however, so I would advise you to keep searching for an alternative way to do this.

Regardless, my code is working, so I'll post it anyway in case parts of it are useful or inspirational:

class Main{
  public static void main(String[] args){
    Main m = new Main();
    m.test("7888885466662716666".getBytes());
  }

  private void test(byte[] input){
    String result = findLongestRepeatedSubsequence(input);
    System.out.println("The longest repeating subsequence in " + new String(input) + " is: " + result);
  }

  private String findLongestRepeatedSubsequence(byte[] byteMass){
    // Convert the bytes to a String:
    String bytesAsString = new String(byteMass);
    // Loop as long as this String has at least 1 character left:
    while(bytesAsString.length() > 0){
      // Split the String into characters, where each character is a loose String of length 1
      String[] charsAsStringArray = bytesAsString.split("");
      int length = charsAsStringArray.length;
      int maxCount = 0;
      int startingIndex = 0;
      // Loop `i` in the range [0, length_of_String_array)
      for(int i = 0; i < length; i++){
        // Take the substring where the first `i` characters are removed
        String subString = bytesAsString.substring(i);
        String currentChar = charsAsStringArray[i];
        // Count the amount of subsequent times the current character occurs at the start of the substring
        int count = subString.length() - subString.replaceFirst(currentChar+"*", "").length();
        // If this count is larger than our current maxCount:
        if(count > maxCount){
          // Replace the maxCount with this count
          maxCount = count;
          // And set the index where we've found this longest subsequence (`i`) as well
          startingIndex = i;
        }
      }
      // After we've checked all substrings, get the longest subsequence we've found
      String longestSub = bytesAsString.substring(startingIndex, startingIndex + maxCount);
      // Split the entire String with this longest subsequence to get its occurrence-count
      int occurrenceCounter = bytesAsString.split(longestSub, -1).length - 1;
      // If we've found a subsequence that occurs at least twice:
      if(occurrenceCounter > 1){
        // Return it as result
        return longestSub;
      }
      // If this longest subsequence only occurs once:
      else{
        // Remove the first character of this found subsequence from the String
        bytesAsString = bytesAsString.substring(0, startingIndex) +
                        (startingIndex < length-1 ? 
                           bytesAsString.substring(startingIndex + 1)
                         :
                           "");
      }
    }
    // Mandatory return if the input is empty
    return null;
  }
}

Try it online. (Contains some additional print lines compared to the code above.)
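As a possible alternative that stays on the byte[] and avoids the repeated String operations, the runs of identical bytes can be scanned once while remembering the two longest runs per byte value. This is only a sketch under an assumption that is narrower than the code above: a run counts as repeated only if it shows up in two separate runs of the input, so a single 8888 is not treated as two copies of 88. The names RepeatedRuns and longestRepeatedRun are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RepeatedRuns {

    // Hypothetical helper: returns the longest run of one byte value that is
    // available in at least two separate runs, or an empty array if none is.
    static byte[] longestRepeatedRun(byte[] data) {
        // For each of the 256 possible byte values, keep its two longest runs.
        int[] longest = new int[256];
        int[] secondLongest = new int[256];
        int i = 0;
        while (i < data.length) {
            int j = i;
            while (j < data.length && data[j] == data[i]) {
                j++; // advance to the end of the current run
            }
            int value = data[i] & 0xFF; // map the signed byte to 0..255
            int runLength = j - i;
            if (runLength > longest[value]) {
                secondLongest[value] = longest[value];
                longest[value] = runLength;
            } else if (runLength > secondLongest[value]) {
                secondLongest[value] = runLength;
            }
            i = j;
        }
        // The second-longest run of a value is the longest length that two
        // separate runs of that value can both supply.
        int bestValue = 0;
        int bestLength = 0;
        for (int v = 0; v < 256; v++) {
            if (secondLongest[v] > bestLength) {
                bestLength = secondLongest[v];
                bestValue = v;
            }
        }
        byte[] result = new byte[bestLength];
        Arrays.fill(result, (byte) bestValue);
        return result;
    }

    public static void main(String[] args) {
        byte[] input = "7888885466662716666".getBytes(StandardCharsets.US_ASCII);
        byte[] result = longestRepeatedRun(input);
        System.out.println(new String(result, StandardCharsets.US_ASCII)); // prints 6666
    }
}
```

The scan visits every byte once, so the cost grows linearly with the input size.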

Here's a hacked-together solution I wrote yesterday...

Basically, it checks whether input.charAt(i) == input.charAt(i + 1). If so, it runs a second loop until the characters no longer match, appending to a String all the while, and then adds that String to a List. Then it repeats.

Then check the List for the entry with the highest number of occurrences (shamelessly stolen from here):

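// Imports needed by the two methods below:
// import java.util.ArrayList;
// import java.util.Collections;
// import java.util.HashSet;
// import java.util.List;
// import java.util.Set;

public static void addToList(String input) {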
    String temp;
    List<String> l = new ArrayList<>();
    for (int i = 0; i < input.length() - 1; i++) {
        if (input.charAt(i) == input.charAt(i + 1)) {
            temp = String.valueOf(input.charAt(i));
            for (int j = i; j < input.length() - 1; j++) {
                if (input.charAt(j) == input.charAt(j + 1)) {
                    temp += String.valueOf(input.charAt(j + 1));
                    if (j == input.length() - 2) {
                        i = j;
                        if (!temp.isEmpty()) {
                            l.add(temp);
                        }
                        break;
                    }
                } else {
                    i = j - 1;
                    if (!temp.isEmpty()) {
                        l.add(temp);
                    }
                    break;
                }
            }
        }
    }
    System.out.println(getHighestOccurences(l));
}

public static String getHighestOccurences(List<String> list) {
    int max = 0;
    int curr;
    String currKey = null;
    Set<String> unique = new HashSet<>(list);
    for (String key : unique) {
        curr = Collections.frequency(list, key);
        if (max < curr) {
            max = curr;
            currKey = key;
        }
    }
    return currKey;
}

With your input being String input = "7888885466662716666"; calling addToList(input); gives an output of:

6666

Online Demo
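The same idea (collect every run of a repeated character, then pick the run that occurs most often) can also be written with a regular expression. This is just a sketch of a variation, not the code from the demo: the class RunFrequency and the method mostFrequentRun are hypothetical names, and ties are resolved in favour of the run seen first, which is an assumption the HashSet-based version above does not make.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunFrequency {

    // Matches any character followed by one or more copies of itself.
    private static final Pattern RUN = Pattern.compile("(.)\\1+", Pattern.DOTALL);

    // Hypothetical helper: returns the run of repeated characters that occurs
    // most often in the input, or null if the input contains no such run.
    static String mostFrequentRun(String input) {
        // LinkedHashMap keeps insertion order, so ties go to the earliest run.
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = RUN.matcher(input);
        while (matcher.find()) {
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mostFrequentRun("7888885466662716666")); // prints 6666
    }
}
```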
