Length of the Longest Common Substring without repeating characters

Given "abcabcbb", the answer is "abc", which has a length of 3.

Given "bbbbb", the answer is "b", with a length of 1.

Given "pwwkew", the answer is "wke", with a length of 3. Note that the answer must be a substring; "pwke" is a subsequence and not a substring.
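As a cross-check for the three examples above, here is a minimal brute-force sketch. The class and method names (BruteForceCheck, longestUniqueSubstring) are made up for illustration and are not part of the original post; it simply tries every starting index and is meant as a reference, not as an efficient solution.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of the original post.
class BruteForceCheck {

    // For every start index, extend the substring until a character repeats.
    static int longestUniqueSubstring(String s) {
        int best = 0;
        for (int start = 0; start < s.length(); start++) {
            Set<Character> seen = new HashSet<>();
            int end = start;
            // Set.add returns false when the character is already in the set.
            while (end < s.length() && seen.add(s.charAt(end))) {
                end++;
            }
            // Characters start..end-1 are all distinct, so the length is end - start.
            best = Math.max(best, end - start);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestUniqueSubstring("abcabcbb")); // 3
        System.out.println(longestUniqueSubstring("bbbbb"));    // 1
        System.out.println(longestUniqueSubstring("pwwkew"));   // 3
    }
}
```

This runs in quadratic time in the worst case, which is fine for checking small inputs against a faster solution.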

I came up with a solution that worked on some inputs but failed several test cases. I then found a better solution and rewrote it to try to understand it. The solution below works flawlessly, but after about 2 hours of battling with this thing, I still cannot understand why one particular line of code works.

import java.util.HashMap;

public class Solution {

    public int lengthOfLongestSubstring(String str) {

        if (str.length() == 0)
            return 0;

        HashMap<Character, Integer> map = new HashMap<>();
        int startingIndexOfLongestSubstring = 0;
        int max = 0;

        for (int i = 0; i < str.length(); i++) {
            char currentChar = str.charAt(i);
            if (map.containsKey(currentChar))
                startingIndexOfLongestSubstring = Math.max(startingIndexOfLongestSubstring, map.get(currentChar) + 1);

            map.put(currentChar, i);
            max = Math.max(max, i - startingIndexOfLongestSubstring + 1);

        }//End of loop

        return max;

    }
}

The line in question is:

max = Math.max(max, i - startingIndexOfLongestSubstring + 1);

I don't understand why this works. We're taking the max of our previous max and the difference between the current index and the starting index of the current substring, plus 1. I know the code is computing the difference between the current index and startingIndexOfLongestSubstring, but I can't conceptualize WHY that gives us the intended result. Can someone please explain this step to me, particularly WHY it works?

I'm usually bad at explaining, but let me give it a shot by considering an example.

The string is "wcabcdeghi".

Forget the code for a minute and assume we're trying to come up with the logic ourselves.

  1. We start from w and keep going until we reach c -> a -> b -> c.
    We need to stop at this point because "c" is repeating. So we need a map to record whether a character has already been seen, and where. (In code: map.put(currentChar, i);)
  2. Now that we know whether a character is repeated, we need to know the max length so far. (In code: max)
  3. Now we know there is no point in keeping track of the first two characters w -> c. This is because the substring that includes them has already been counted in max. So from the next iteration onwards we only need to measure the length from a -> b -> and so on.
    Let's have a variable (in code: startingIndexOfLongestSubstring) to keep track of this. (This should've been named startingIndexOfNonRepetativeCharacter, then again I'm bad with naming as well.)
  4. Now we keep going, but wait, we still haven't decided how to keep track of the substring that we're currently parsing (i.e., from abcd...).
    Come to think of it, all I need is the position where "a" was (which is startingIndexOfNonRepetativeCharacter), so to know the length of the current substring all I need to do is (in code) i - startingIndexOfLongestSubstring + 1: the current character position minus the starting position of the current substring, plus 1 because subtraction alone does not count both endpoints. Let's call this currentLength.
  5. But wait, what are we going to do with this count? Every time we reach a new character we need to check whether this currentLength beats our max. So (in code): max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
  6. Now we've covered most of the statements that we need, and according to our logic, every time we encounter a character that was already present, all we need is startingIndexOfLongestSubstring = map.get(currentChar) + 1. So why are we doing a Math.max?
    Consider a scenario where the string is "wcabcdewghi". After the repeated "c", our new count runs a -> b -> c -> d -> e -> w. At this point our logic checks whether this character was seen previously. Since it was (at index 0), without the max the count would restart from index 1, which is before the current starting point (index 2) and pulls the repeated "c" back in. That totally messes up the whole count. So we need to make sure the index we take from the map never moves the starting point backwards (i.e., use the position from the map only if the previous occurrence is at or after startingIndexOfLongestSubstring).
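To make step 6 concrete, here is a small sketch that runs the same loop twice on "wcabcdewghi", once with the Math.max guard and once without it. The class name WindowStartDemo and the clampStart flag are assumptions introduced only for this comparison; they do not appear in the original code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: same loop as the solution, with the guard made optional.
class WindowStartDemo {

    // clampStart = true keeps the Math.max guard; false drops it to show the bug.
    static int longest(String str, boolean clampStart) {
        Map<Character, Integer> lastSeen = new HashMap<>();
        int start = 0;
        int max = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (lastSeen.containsKey(c)) {
                int candidate = lastSeen.get(c) + 1;
                // Without the guard, an old occurrence can drag start backwards.
                start = clampStart ? Math.max(start, candidate) : candidate;
            }
            lastSeen.put(c, i);
            max = Math.max(max, i - start + 1);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(longest("wcabcdewghi", true));  // 9  ("abcdewghi")
        System.out.println(longest("wcabcdewghi", false)); // 10 (wrong: "cabcdewghi" repeats c)
    }
}
```

Without the guard, the second "w" moves the start back to index 1, so the window silently includes both occurrences of "c" and the reported length is too large.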

I hope I've covered all the lines in the code and, more importantly, that the explanation was understandable.

Because

i - startingIndexOfLongestSubstring + 1

is the number of characters between the indexes startingIndexOfLongestSubstring and i, inclusive. For example, how many characters are there between positions 2 and 3? 3 - 2 = 1, but we have 2 characters: the one at position 2 and the one at position 3.
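The inclusive count can be checked directly against String.substring, whose end index is exclusive. This is only an illustrative sketch; the class InclusiveLengthDemo and the method windowLength are hypothetical names.

```java
// Hypothetical demo class showing why the "+ 1" is needed.
class InclusiveLengthDemo {

    // Number of characters from index start to index end, both included.
    static int windowLength(int start, int end) {
        return end - start + 1;
    }

    public static void main(String[] args) {
        String s = "pwwkew";
        int start = 2; // index of the second 'w'
        int end = 4;   // index of 'e'
        // substring takes an exclusive end index, hence end + 1.
        String window = s.substring(start, end + 1);
        System.out.println(window);                   // wke
        System.out.println(window.length());          // 3
        System.out.println(windowLength(start, end)); // 3
    }
}
```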

I've described every action in the code:

public class Solution {

    public int lengthOfLongestSubstring(String str) {

        if(str.length() == 0)
            return 0;

        HashMap<Character,Integer> map = new HashMap<>();
        int startingIndexOfLongestSubstring = 0;
        int max = 0;

        // loop over all characters in the string
        for(int i = 0; i < str.length(); i++){
            // get character at position i
            char currentChar = str.charAt(i);
            // if we have already seen this character
            if(map.containsKey(currentChar))
                // then take the maximum of the previous 'startingIndexOfLongestSubstring' and
                // map.get(currentChar) + 1 (the last occurrence of the current character, plus 1)
                // "plus 1" because the substring must start at the character after that occurrence,
                // since the current character is the same
                startingIndexOfLongestSubstring = Math.max(startingIndexOfLongestSubstring, map.get(currentChar) + 1);

            // save the position of the current character in the map. If the map already has a value for this
            // character, it is overwritten (we don't need the character's earlier positions)
            map.put(currentChar, i);
            // take the maximum of 'max' (candidate for the return value) and the length of the substring ending at i
            max = Math.max(max, i - startingIndexOfLongestSubstring + 1);

        }//End of loop

        return max;

    }
}
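To watch the commented steps in action, the following sketch records the current repeat-free substring after each character instead of only its length. WindowTrace and trace are hypothetical names added for illustration; the loop body mirrors the solution above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracing class, not part of the original answer.
class WindowTrace {

    // Returns the repeat-free substring that ends at each index i.
    static List<String> trace(String str) {
        Map<Character, Integer> map = new HashMap<>();
        List<String> windows = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (map.containsKey(c)) {
                // Same update as the solution: never move the start backwards.
                start = Math.max(start, map.get(c) + 1);
            }
            map.put(c, i);
            // This substring has exactly i - start + 1 characters.
            windows.add(str.substring(start, i + 1));
        }
        return windows;
    }

    public static void main(String[] args) {
        System.out.println(trace("pwwkew")); // [p, pw, w, wk, wke, kew]
    }
}
```

The longest entry in the list has length 3, which is the value the solution keeps in max.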
