
Largest substring of non-repeating letters of a string

To begin with, I am using Python. I start with a string, for example 'abcagfhtgba', and need to find the length of the largest substring of non-repeating letters. In the example above it is 'agfht' (5), because the 'a' repeats at the fourth character, so we start the count again from there. My idea is to build a dictionary that stores letters as keys and the number of their appearances as values. Whenever any key would reach the value 2 (that is, a letter repeats), I append the length of the dictionary to a list named result and replace the dictionary with an empty one. This approach works for some tests but not for others. Below is the code I used, with brief explanatory comments.

Here I store the input in the form of a list:

this = list(map(str, input()))
def function(list):
    dict = {}
    count = 0
    result = [1]

Here I start the loop. For every element that is not among the keys, I create a key with value 1. If the element is already in the dictionary, I replace the dictionary with an empty one, making sure to store the repeated element in the new dictionary. Another important point is to append the count once more after the loop, because the tail of the string has to be considered in case it holds the largest non-repeating sequence of letters.

    for i in range(len(list)):
        if list[i] not in dict:
            dict[list[i]] = 1
            count += 1
        elif list[i] in dict:
            dict = {}
            dict[list[i]] = 1
            result.append(count)
            count = 1
    result.append(count)
    print(result)
    return max(result)

Here I make the function choose the larger result between the string and its reverse, to deal with cases like 'adabc', where the largest substring is at the end.

if len(this) != 0:
    print(max(function(this), function(this[::-1])))
else:
    print('')

Could someone tell me where my approach to the problem goes wrong and correct my code?

Hopefully you might find this a little easier. The idea is to keep track of the letters seen up to a given point in a set for faster lookup. If the current letter is already in the set, build the set anew and append the substring seen up to that point. As you mention, you have to check whether the last values have been added or not, hence the final if:

s = 'abcagfhtgba'

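seen = set()
out = []
current_out = []
for i in s:
    if i not in seen:
        current_out.append(i)
        seen.add(i)
    else:
        # repeated letter: store the finished substring and start over from i
        seen = {i}
        out.append(''.join(current_out))
        current_out = [i]
# the last substring is never stored inside the loop
if current_out:
    out.append(''.join(current_out))

print(max(out, key=len))
# agfht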

So some key differences:

  • Iterate over the string itself, not a range
  • Use sets rather than counts and dictionaries
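
As a minimal sketch, the same set-based loop can be wrapped in a function so it is easy to try on other inputs. The name longest_run is hypothetical and not part of the answer above. Like the loop, it starts over at every repeated letter, so for 'adabc' it reports 'abc' rather than 'dabc'.

```python
def longest_run(s):
    """Return the longest run of distinct letters, restarting at each repeat.

    Hypothetical helper name; it mirrors the set-based loop shown above.
    """
    seen = set()
    best = ""
    current = []
    for ch in s:
        if ch in seen:
            # Repeat found: close the current run and start a new one at ch.
            best = max(best, "".join(current), key=len)
            seen = {ch}
            current = [ch]
        else:
            seen.add(ch)
            current.append(ch)
    # The last run is never closed inside the loop, so compare it here.
    return max(best, "".join(current), key=len)


print(longest_run("abcagfhtgba"))  # agfht
print(longest_run("adabc"))        # abc
```

Keeping only the best run seen so far, instead of a list of all runs, also makes the empty string safe: longest_run('') returns '' instead of raising an error.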

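Keep a map from each letter to the index of its last occurrence, and remember where the current run of distinct letters starts. If the current letter has already been seen, it is a duplicate, so the start index has to be reset: it either stays where it is (when the earlier occurrence lies before the current start) or moves to just after that earlier occurrence.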

s = 'abcagfhtgba'

seen = {}      # letter -> index of its most recent occurrence
longest = ""
start = 0      # start index of the current run of distinct letters
for i, c in enumerate(s):
    # c is a duplicate only if its last occurrence lies inside the current run
    if c in seen and seen[c] >= start:
        if len(longest) < i - start:
            longest = s[start:i]
        start = seen[c] + 1
    seen[c] = i
# the run that reaches the end of the string is checked after the loop
if len(longest) < len(s) - start:
    longest = s[start:]
print(longest)
# bcagfht
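
For comparison, here is a sketch of the index-map idea written as a Python 3 function. The name longest_unique_substring is hypothetical, and this is only one way to write it. It assumes the usual reading of the problem, where the window shrinks to just after the earlier occurrence of the repeated letter instead of starting from scratch. Under that reading the sample string gives 'bcagfht' (7 letters), which is longer than the 'agfht' quoted in the question.

```python
def longest_unique_substring(s):
    """Sliding-window sketch; the function name is hypothetical."""
    last_index = {}   # letter -> index of its most recent occurrence
    start = 0         # left edge of the current window of distinct letters
    best_start, best_len = 0, 0
    for i, ch in enumerate(s):
        # ch is a duplicate only if its previous occurrence is inside the window.
        if ch in last_index and last_index[ch] >= start:
            start = last_index[ch] + 1
        last_index[ch] = i
        # Record the window whenever it becomes strictly longer than the best.
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]


print(longest_unique_substring("abcagfhtgba"))  # bcagfht
print(longest_unique_substring("adabc"))        # dabc
```

Because the window never restarts from scratch, there is no need to run the function on the reversed string as well: 'adabc' already yields 'dabc' in a single pass.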
