
Largest substring of non-repeating letters of a string

From the beginning I want to point out that I am using Python. In this question I start with a string, for example 'abcagfhtgba', and I need to find the length of the largest substring of non-repeating letters. In the case above it is 'agfht' (5), because at the fourth position (index 3) the 'a' repeats, so the count starts over. My idea for this question is to create a dictionary that stores letters as keys and the number of their appearances as values. Whenever any key reaches the value 2, we append the length of the dictionary to a list named result and replace the dictionary with an empty one. For some tests this approach holds, for some it does not. I will provide the code that I used, with brief comments of explanation.
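As a reference for what "largest substring of non-repeating letters" means, here is a brute-force sketch (the name `longest_distinct_brute_force` is illustrative, not from the post): for every start index it extends to the right until a letter repeats. Under this reading any contiguous run with no repeated letter counts, so for 'abcagfhtgba' the answer is 'bcagfht' (7) rather than 'agfht' (5): a new run does not have to begin at the repeated letter itself.

```python
def longest_distinct_brute_force(s: str) -> str:
    """Return the first longest substring of s whose characters are all different.

    Tries every start index and extends to the right while the next character
    is new, so it is O(n^2); meant as a reference answer, not for speed.
    """
    best = ""
    for start in range(len(s)):
        seen = set()
        end = start
        # grow s[start:end] while the next character has not appeared in it
        while end < len(s) and s[end] not in seen:
            seen.add(s[end])
            end += 1
        if end - start > len(best):
            best = s[start:end]
    return best


print(longest_distinct_brute_force('abcagfhtgba'))       # bcagfht
print(len(longest_distinct_brute_force('abcagfhtgba')))  # 7
```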

Here I store the input in the form of a list:

this = list(map(str, input()))
def function(list):
    dict = {}
    count = 0
    result = [1]

Here I start the loop: for every element, if it is not among the keys, I create a key with value 1. If the element is already in the dictionary, I replace the dict with an empty one, not forgetting to store the repeated element in the new dictionary. Another important point is to append the count once more after the loop, because the tail of the string (if it holds the largest non-repeating sequence of letters) has to be considered.

    for i in range(len(list)):
        if list[i] not in dict:
            dict[list[i]] = 1
            count += 1
        elif list[i] in dict:
            dict = {}
            dict[list[i]] = 1
            result.append(count)
            count = 1
    result.append(count)
    print(result)
    return max(result)
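A minimal sketch of how the dictionary idea could be kept while fixing the reset, assuming the goal is any repeat-free run (the name `longest_run_length` is hypothetical): map each letter to its index instead of a count, and when a letter repeats, delete only the letters up to and including its earlier copy rather than emptying the whole dictionary. Clearing everything throws away the letters between the two copies, which can still belong to the next run.

```python
def longest_run_length(chars):
    """Length of the longest run of letters with no repeats.

    `chars` can be a string or a list of one-letter strings.
    """
    window = {}   # letter -> index, for the letters of the current run
    start = 0     # index where the current run begins
    best = 0
    for i, ch in enumerate(chars):
        if ch in window:
            prev = window[ch]
            # drop the letters from the start of the run up to the earlier copy
            for j in range(start, prev + 1):
                del window[chars[j]]
            start = prev + 1
        window[ch] = i
        best = max(best, len(window))
    return best


print(longest_run_length('abcagfhtgba'))  # 7
print(longest_run_length(list('adabc')))  # 4
```

Each letter is added and deleted at most once, so the loop stays linear, and it also accepts the list built from `input()` in the question.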

Here I make the program choose the larger of the results for the string and for its reverse, to deal with cases like 'adabc', where the largest substring is at the end.

if len(this) != 0:
    print(max(function(this), function(this[::-1])))
else:
    print('')
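To check the reversal idea, the sketch below reruns the question's counting rule (copied from `function` without the debug `print`, under the hypothetical name `restart_count`) on both examples. Reading the string backwards does recover 'dabc' (4) in 'adabc', but for 'abcagfhtgba' both directions stop short of 'bcagfht' (7), because each pass can only begin a new run at the repeated letter.

```python
def restart_count(chars):
    """The question's counting rule: on a repeat, start a new count at that letter."""
    seen = {}
    count = 0
    result = [1]
    for ch in chars:
        if ch not in seen:
            seen[ch] = 1
            count += 1
        else:
            seen = {ch: 1}
            result.append(count)
            count = 1
    result.append(count)
    return max(result)


for s in ['adabc', 'abcagfhtgba']:
    forward, backward = restart_count(s), restart_count(s[::-1])
    print(s, forward, backward, max(forward, backward))
# adabc 3 4 4
# abcagfhtgba 5 6 6
```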

I need help from people to tell me where my approach to the problem goes wrong, and to edit my code.

Hopefully you will find this a little easier. The idea is to keep track of the letters seen so far in a set, for faster lookup; if the current letter is already contained, build the set anew and append the substring seen up to that point. As you mention, you have to check whether the last values have been added or not, hence the final if:

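s = 'abcagfhtgba'

seen = set()        # letters in the current piece
out = []            # finished pieces
current_out = []    # letters of the current piece
for ch in s:
    if ch not in seen:
        current_out.append(ch)
        seen.add(ch)
    else:
        # ch repeats: close the current piece and start a new one at ch
        seen = {ch}
        out.append(''.join(current_out))
        current_out = [ch]
# the last piece is never closed inside the loop
if current_out:
    out.append(''.join(current_out))

print(max(out, key=len))
# agfht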

Some key differences:

  • Iterate over the string itself, not over a range
  • Use sets rather than counts and dictionaries
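Following the same two suggestions (iterate over the string, use a set), a sliding-window variant avoids starting the set over at a repeat: it removes letters from the left of the window until the repeated letter is gone. This is a sketch with an illustrative name, `longest_unique_substring`; note that it returns 'bcagfht' for the example, whereas the set-based code above, like the question's, restarts at the repeated letter and returns 'agfht'.

```python
def longest_unique_substring(s: str) -> str:
    """Longest substring of s with no repeated character, using a set as the window."""
    window = set()   # characters currently in s[left:right + 1]
    left = 0
    best_start, best_len = 0, 0
    for right, ch in enumerate(s):
        # shrink from the left until ch is no longer inside the window
        while ch in window:
            window.remove(s[left])
            left += 1
        window.add(ch)
        if right - left + 1 > best_len:
            best_start, best_len = left, right - left + 1
    return s[best_start:best_start + best_len]


print(longest_unique_substring('abcagfhtgba'))  # bcagfht
```

Every letter enters and leaves the set at most once, so this is still a single linear pass.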

Remember where each letter was last seen by maintaining a map from letter to index. If the current letter is already in the map, it is a duplicate, so the start index of the current run has to be reset: it moves to just after the previous occurrence of that letter, unless the run already starts after that point, in which case it stays where it is.

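s = 'abcagfhtgba'

seen = {}       # letter -> index where it was last seen
longest = ""
start = 0       # index where the current run of distinct letters begins
for i, c in enumerate(s):
    if c in seen:
        # s[start:i] is the run that ends just before this letter
        if len(longest) < i - start:
            longest = s[start:i]
        # restart just past the earlier copy of c, but never move backwards:
        # that copy may lie before the current run
        start = max(start, seen[c] + 1)
    seen[c] = i
# the run reaching the end of the string is not closed by a duplicate
if len(longest) < len(s) - start:
    longest = s[start:]
print(longest)
# bcagfht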
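One way to gain confidence in an index-map solution is to compare it with a brute-force answer on many short random strings. The sketch below assumes a compact variant of the letter-to-index idea (the names `longest_by_last_index` and `longest_brute_force` are hypothetical); the small alphabet 'abcd' is chosen so repeats are frequent. It also includes a fixed case, 'abcbdefga', where the second 'a' repeats a letter that has already left the window, so the start must not move.

```python
import random


def longest_by_last_index(s: str) -> str:
    """Longest repeat-free substring using a letter -> last-index map."""
    last = {}      # letter -> index of its most recent occurrence
    start = 0      # start of the current repeat-free window
    best = ""
    for i, c in enumerate(s):
        # only a copy inside the window [start, i) forces the start forward
        if c in last and last[c] >= start:
            start = last[c] + 1
        last[c] = i
        if i - start + 1 > len(best):
            best = s[start:i + 1]
    return best


def longest_brute_force(s: str) -> int:
    """Length of the longest repeat-free substring, trying every substring."""
    return max((j - i for i in range(len(s)) for j in range(i + 1, len(s) + 1)
                if len(set(s[i:j])) == j - i), default=0)


# A fixed case where the repeated 'a' lies before the window.
assert longest_by_last_index('abcbdefga') == 'cbdefga'

# Random short strings over a small alphabet produce many repeats.
random.seed(0)
for _ in range(2000):
    s = ''.join(random.choice('abcd') for _ in range(random.randint(0, 10)))
    assert len(longest_by_last_index(s)) == longest_brute_force(s), s
print('all checks passed')
```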

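Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please credit this site's URL or the original address. For any questions, please contact: yoyou2525@163.com.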

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM