
Longest Substring with Non-Repeating Character

I am trying to solve the problem of finding the longest substring without a repeating character in a given string.

class Solution(object):
    def lengthOfLongestSubstring(self, s):
        """
        :type s: str
        :rtype: int
        """
        start = 0
        mlen = -1
        cur_ = {}
        cur = 0

        while(start<len(s) and cur<len(s)):
            if s[cur] in cur_ and cur_[s[cur]] >= start:
                if cur - start > mlen:
                    mlen = cur - start
                start = cur_[s[cur]] + 1    
                cur_[s[cur]] = cur            
            else:
                cur_[s[cur]] = cur
                if cur - start > mlen:
                    mlen = cur - start
            cur = cur + 1            
        return mlen

x = Solution()
print(x.lengthOfLongestSubstring("abababcdef"))

I think I am solving it correctly:

Start a new substring when you encounter a repeated character.

But I am not getting the right answers.

In the above example the output is 5, whereas the correct answer is 6.

But for this case:

print(x.lengthOfLongestSubstring("ababa"))

the output is correct, i.e. 2.

I am not sure why I am failing that case. Thanks.

I've changed your function a bit to return the longest substring of unique characters instead of just its length. If you want the length, you can always get it from the string.

def get_longest_substring(input):
    """
    :type input: str
    :rtype: str
    """

    current = []
    all_substrings = []
    for c in input:
        if c in current:
            all_substrings.append(''.join(current))
            cut_off = current.index(c) + 1
            current = current[cut_off:]
        current += c
    all_substrings.append(''.join(current))

    longest = max(all_substrings, key=len)
    return longest

longest = get_longest_substring("abababcdefc")
print(longest, len(longest))

The code goes through each character, building a list of characters.

If it finds a character that is already in the list, it saves a copy of the list as a string, cuts off the beginning of the list up to and including the earlier occurrence of that character, and keeps building.

At the end it picks the longest substring it found and returns it.
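To see the cut-off step at work, the following sketch records the list after each character. The helper name `trace_windows` is hypothetical and not part of the answer above; it only mirrors the same logic on a short input.

```python
def trace_windows(text):
    """Return the state of the working list after each character is processed."""
    current = []
    snapshots = []
    for c in text:
        if c in current:
            # Drop everything up to and including the earlier occurrence of c.
            current = current[current.index(c) + 1:]
        current.append(c)
        snapshots.append(''.join(current))
    return snapshots


print(trace_windows("abcbd"))  # ['a', 'ab', 'abc', 'cb', 'cbd']
```

When the second "b" arrives, "ab" is cut away and the list continues from "c".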

You are updating mlen incorrectly in the else branch: you forgot to count the current character. Also, you don't need to update mlen when you meet a repetition:

if s[cur] in cur_ and cur_[s[cur]] >= start:
    start = cur_[s[cur]] + 1    
else:
    mlen = max(mlen, cur - start + 1)

cur_[s[cur]] = cur
cur = cur + 1             
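Folded back into a complete function, the corrected loop might read as follows. This is a sketch rather than the poster's final code: the function name `length_of_longest_substring` and the variable `last_seen` are illustrative, and starting `mlen` at 0 instead of -1 is an assumption made so that an empty string yields 0.

```python
def length_of_longest_substring(s):
    """Length of the longest run of characters in s with no repeats."""
    start = 0        # left edge of the current window
    mlen = 0         # assumption: 0 rather than -1, so "" returns 0
    last_seen = {}   # character -> index of its most recent occurrence

    for cur, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            # Repeat inside the window: move the left edge past the earlier copy.
            start = last_seen[ch] + 1
        else:
            # Window s[start..cur] is repeat-free; its length includes s[cur].
            mlen = max(mlen, cur - start + 1)
        last_seen[ch] = cur
    return mlen


print(length_of_longest_substring("abababcdef"))  # 6
print(length_of_longest_substring("ababa"))       # 2
```

No update is needed in the repeat branch because the shrunken window is never longer than the window that preceded it.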

I can suggest this simple algorithm:

1. Initialize all variables to empty.

2. For each letter ch in the string:

2.1. Check whether ch is already in the dict of the current substring.

If it is: check whether the current substring is longer than the max (maxSubStr, initialized to ""); if so, save the current substring as the max. Then drop the current substring up to and including the earlier occurrence of ch, append ch, and rebuild the dict from what remains. (Restarting from ch alone would discard characters that are still valid; for "dvdf" it would return 2 instead of 3.)

If it is not: add ch to the dict of the current substring and append ch to the current substring.

3. Return the larger of the length of the current substring and the length of the max.

class Solution(object):

    def lengthOfLongestSubstring(self, s):
        """
        :type s: str
        :rtype: int
        """

        curSubStr = ""
        curSubStrDict = {}
        maxSubStr = ""
        for ch in s:
            if ch in curSubStrDict:
                if len(maxSubStr) < len(curSubStr):
                    maxSubStr = curSubStr
                # Keep the part after the earlier occurrence of ch instead of
                # starting over from ch alone, then rebuild the lookup dict.
                curSubStr = curSubStr[curSubStr.index(ch) + 1:] + ch
                curSubStrDict = {c: c for c in curSubStr}

            else:
                curSubStrDict[ch] = ch
                curSubStr += ch

        return max(len(curSubStr), len(maxSubStr))

x = Solution()
print(x.lengthOfLongestSubstring("abcaabccdefgfgh")) # 5 = |cdefg|
print(x.lengthOfLongestSubstring("abababcdef")) # 6 = |abcdef|

Just as when finding the maximum element of an array, we go through the substrings without repeating characters one after another (without enumerating them explicitly) and save the longest.

We move on to the next substring whenever we detect a character that is already contained in the current one.
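The analogy with finding the maximum of an array can be made literal: generate the candidate substrings, then hand them to `max`. The generator below is a sketch with a hypothetical name, `candidate_substrings`; on a repeat it keeps the characters after the earlier occurrence rather than starting from scratch.

```python
def candidate_substrings(s):
    """Yield each repeat-free window at the moment it can no longer grow."""
    current = ""
    for ch in s:
        if ch in current:
            yield current
            # Keep only what follows the earlier occurrence of ch.
            current = current[current.index(ch) + 1:]
        current += ch
    if current:
        yield current  # the final window never hit a repeat


longest = max(candidate_substrings("abababcdef"), key=len, default="")
print(longest, len(longest))  # abcdef 6
```

Passing `default=""` lets the same call handle an empty input string without raising an error.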

While you can keep track of your position in the string by incrementing markers at each iteration, it would be simpler to use nested for-loops:

s = "abababcdef"
# b ranges up to len(s) inclusive so that a slice can include the last character
long_run = max({s[i:b] for i in range(len(s)) for b in range(i + 1, len(s) + 1) if len(set(s[i:b])) == len(s[i:b])}, key=len)

Using a set comprehension, the code first generates all possible substrings with a nested for loop. The outer loop produces every start index and the inner loop every end index after it, which removes the need to mutate upper and lower bounds by hand. The comprehension filters out all substrings that contain at least one repeated character. To do this, the code uses the set function to remove duplicates and then checks that the length of the substring cast to a set equals the length of the original substring. Lastly, the max function is applied to the set with a key of len to find the longest substring. Because every substring is examined, this approach is only practical for short strings.

Output:

'abcdef'
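A brute-force search like this is handy as a reference when testing a faster implementation. The sketch below compares the two on a few inputs; `brute_force_length` and `sliding_window_length` are hypothetical names introduced for the comparison, and the test strings are arbitrary.

```python
def brute_force_length(s):
    """Check every substring; cubic time, so only suitable for short inputs."""
    best = 0
    for i in range(len(s)):
        for b in range(i + 1, len(s) + 1):
            piece = s[i:b]
            if len(set(piece)) == len(piece):
                best = max(best, len(piece))
    return best


def sliding_window_length(s):
    """Single pass that remembers where each character was last seen."""
    start = best = 0
    last_seen = {}
    for cur, ch in enumerate(s):
        if last_seen.get(ch, -1) >= start:
            # Repeat inside the window: skip past the earlier occurrence.
            start = last_seen[ch] + 1
        last_seen[ch] = cur
        best = max(best, cur - start + 1)
    return best


for sample in ["", "a", "ababa", "dvdf", "abababcdef", "abcaabccdefgfgh"]:
    assert brute_force_length(sample) == sliding_window_length(sample)
print("both implementations agree")
```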
