Why won't my for loop work? (Python)

Question

Yes, this is homework. I'm just trying to understand why this doesn't seem to work.

I'm trying to find the longest substring in a string that's in alphabetical order. I make a list of random letters, and say the length is 19. When I run my code, it prints out indices 0 through 17. (I know this happens because I subtract 1 from the range.) However, when I leave off that -1, it tells me the "string index is out of range." Why does that happen?

s = 'cntniymrmbhfinjttbiuqhib'
sub = ''
longest = []

for i in range(len(s) - 1):
    if s[i] <= s[i+1]:
        sub += s[i]
        longest.append(sub)
    elif s[i-1] <= s[i]:
        sub += s[i]
        longest.append(sub)
        sub = ' '
    else:
        sub = ' '
print(longest)
print ('Longest substring in alphabetical order is: ' + max(longest, key=len))

I've also tried a few other methods

If I just say:

for i in s:

it throws an error, saying "string indices must be integers, not str." This seems like a much simpler way to iterate through the string, but how would I compare individual letters this way?

This is Python 2.7 by the way.

Edit: I'm sure my if/elif statements could be improved but that's the first thing I could think of. I can come back to that later if need be.

Answer 1

The issue is the line if s[i] <= s[i+1]:. If i=18 (the final iteration of your loop without the -1 in it), then i+1=19 is out of bounds.

Note that the line elif s[i-1] <= s[i]: is also probably not doing what you want it to. When i=0 we have i-1 = -1. Python allows negative indices to mean counting from the back of the indexed object, so s[-1] is the last character in the string (s[-2] would be the second to last, etc.).
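A minimal sketch of both indexing behaviours, using a short made-up string rather than the one from the question. It is written so that it also runs on Python 3; the variable names are only for illustration.

```python
s = 'abc'  # hypothetical short string, for illustration only

# Valid indices run from 0 to len(s) - 1, so s[len(s)] raises IndexError.
try:
    s[len(s)]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# A negative index does not fail: it wraps around to the end of the string.
i = 0
wrapped = s[i - 1]  # same as s[-1], the last character

print(error_message)
print(wrapped)
```

So the first comparison fails loudly when the index runs past the end, while the second one quietly compares against the wrong character.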

A simpler way to get the previous and next character is to use zip whilst slicing the string to count from the first and second characters respectively.

zip works like this if you haven't seen it before:

>>> for char, x in zip(['a','b','c'], [1,2,3,4]):
...     print char, x
...
a 1
b 2
c 3

So you can just do:

for previous_char, char, next_char in zip(string, string[1:], string[2:]):

This iterates over all the triples of characters without messing up at the ends.
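As a rough illustration of what those slices produce, here is a small sketch on a made-up five-letter string (the string and variable names are assumptions for the example; list() is only needed on Python 3, where zip is lazy):

```python
s = 'abcde'  # hypothetical input, for illustration only

# Pair each character with its successor. zip stops at the shorter argument,
# so there is no s[i+1] lookup that could run past the end.
pairs = list(zip(s, s[1:]))

# Triples of (previous, current, next) work the same way.
triples = list(zip(s, s[1:], s[2:]))

print(pairs)
print(triples)
```

Note that with triples the first and last characters never appear in the middle position, so they need separate handling if you want to treat them as the "current" character.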

However, there is a much simpler way to do this. Instead of comparing the current character in the string to other characters in the string, you should compare it with the last character in the current run of alphabetised characters, for example:

s = "abcdabcdefa"
longest = [s[0]]
current = [s[0]]
for char in s[1:]:
    if char >= current[-1]: # current[-1] == current[len(current)-1]
        current.append(char)
    else:            
        current=[char]
    if len(longest) < len(current):
        longest = current
print longest

This avoids having to do any fancy indexing.

Answer 2

"I'm sure my if/elif statements could be improved but that's the first thing I could think of. I can come back to that later if need be."

@or1426's solution creates a list of the currently longest sorted sequence and copies it over to longest whenever a longer sequence is found. This creates a new list every time a longer sequence is found, and appends to a list for every character. This is actually very fast in Python, but see below.

@Deej's solution keeps the currently longest sorted sequence in a string variable, and every time a longer substring is found (even if it's a continuation of the current sequence) the substring is saved to a list. The list ends up having all sorted substrings of the original string, and the longest is found by a call to max.

Here is a faster solution that only keeps track of the indices of the currently largest sequence, and only makes changes to longest when it finds a character that is not in sorted order:

def bjorn4(s):
    # we start out with s[0] being the longest sorted substring (LSS)
    longest = (0, 1)    # the slice-indices of the longest sorted substring
    longlen = 1         # the length of longest
    cur_start = 0       # the slice-indices of the *current* LSS
    cur_stop = 1

    for ch in s[1:]:       # skip the first ch since we handled it above
        end = cur_stop-1   # cur_stop is a slice index, subtract one to get the last ch in the LSS
        if ch >= s[end]:   # if ch >= then we're still in sorted order..
            cur_stop += 1  # just extend the current LSS by one
        else:
            # we found a ch that is not in sorted order
            if longlen < (cur_stop-cur_start):
                # if the current LSS is longer than longest, then..
                longest = (cur_start, cur_stop)    # store current in longest
                longlen = longest[1] - longest[0]  # precompute longlen

            # since we can't add ch to the current LSS we must create a new current around ch
            cur_start, cur_stop = cur_stop, cur_stop+1

    # if the LSS is at the end, then we'll not enter the else part above, so
    # check for it after the for loop
    if longlen < (cur_stop - cur_start):
        longest = (cur_start, cur_stop)

    return s[longest[0]:longest[1]]

How much faster? In the timings below it's roughly 1.6 times as fast as or1426 and about three times as fast as deej. As always, that depends on your input. The more chunks of sorted substrings that exist, the faster the above algorithm will be compared to the others. E.g., on an input string of length 100000 containing alternating runs of 100 random chars and 100 in-order chars, I get:

bjorn4: 2.4350001812
or1426: 3.84699988365
deej  : 7.13800001144

If I change it to alternating 1000 random chars and 1000 sorted chars, then I get:

bjorn4: 23.129999876
or1426: 38.8380000591
deej  : MemoryError
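The input for these timings is only described in prose. One way such a string could be built is sketched below; the function name alternating_input and its parameters are hypothetical, and this is not necessarily the code that produced the numbers above.

```python
import random
import string

def alternating_input(total, chunk):
    """Build a string of `total` characters that alternates between
    `chunk` random letters and `chunk` letters in sorted order.

    Hypothetical helper: the name and signature are assumptions.
    """
    parts = []
    length = 0
    use_sorted = False  # start with a random chunk, then alternate
    while length < total:
        letters = [random.choice(string.ascii_lowercase) for _ in range(chunk)]
        if use_sorted:
            letters.sort()
        parts.append(''.join(letters))
        length += chunk
        use_sorted = not use_sorted
    return ''.join(parts)[:total]

sample = alternating_input(1000, 100)
```

Calling alternating_input(100000, 100) would give something like the first test case, and alternating_input(100000, 1000) something like the second.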

Update: Here is a further optimized version of my algorithm, with the comparison code:

import random, string
from itertools import izip_longest
import timeit

def _randstr(n):
    ls = []
    for i in range(n):
        ls.append(random.choice(string.lowercase))
    return ''.join(ls)

def _sortstr(n):
    return ''.join(sorted(_randstr(n)))

def badstr(nish):
    res = ""
    for i in range(nish):
        res += _sortstr(i)
        if len(res) >= nish:
            break
    return res

def achampion(s):
    start = end = longest = 0
    best = ""
    for c1, c2 in izip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        if (end-start) > longest:
            longest = end - start
            best = s[start:end]
        start = end
    return best

def bjorn(s):
    cur_start = 0
    cur_stop = 1
    long_start = cur_start
    long_end = cur_stop

    for ch in s[1:]:      
        if ch < s[cur_stop-1]:
            if (long_end-long_start) < (cur_stop-cur_start):
                long_start = cur_start
                long_end = cur_stop
            cur_start = cur_stop
        cur_stop += 1

    if (long_end-long_start) < (cur_stop-cur_start):
        return s[cur_start:cur_stop]
    return s[long_start:long_end]


def or1426(s):
    longest = [s[0]]
    current = [s[0]]
    for char in s[1:]:
        if char >= current[-1]: # current[-1] == current[len(current)-1]
            current.append(char)
        else:            
            current=[char]
        if len(longest) < len(current):
            longest = current
    return ''.join(longest)

if __name__ == "__main__":
    print 'achampion:', round(min(timeit.Timer(
        "achampion(rstr)",
        setup="gc.enable();from __main__ import achampion, badstr; rstr=badstr(30000)"
    ).repeat(15, 50)), 3)

    print 'bjorn:', round(min(timeit.Timer(
        "bjorn(rstr)",
        setup="gc.enable();from __main__ import bjorn, badstr; rstr=badstr(30000)"
    ).repeat(15, 50)), 3)

    print 'or1426:', round(min(timeit.Timer(
        "or1426(rstr)",
        setup="gc.enable();from __main__ import or1426, badstr; rstr=badstr(30000)"
    ).repeat(15, 50)), 3)

With output:

achampion: 0.274
bjorn: 0.253
or1426: 0.486

Changing the data to be random:

achampion: 0.350
bjorn: 0.337
or1426: 0.565

and sorted:

achampion: 0.262
bjorn: 0.245
or1426: 0.503

"no, no, it's not dead, it's resting"

Answer 3

Now that Deej has an answer, I feel more comfortable posting answers to homework.
Just reordering @Deej's logic a little you can simplify to:

sub = ''
longest = []
for i in range(len(s)-1):  # -1 simplifies the if condition
    sub += s[i]
    if s[i] <= s[i+1]:
        continue           # Keep adding to sub until condition fails
    longest.append(sub)    # Only add to longest when condition fails
    sub = ''
if s:
    longest.append(sub + s[-1])  # The loop stops one short, so the last character closes the final run

max(longest, key=len)

But as mentioned by @thebjorn this has the issue of keeping every ascending partition in a list (in memory). You could fix this by using a generator, and I only put the rest here for instructional purposes:

def alpha_partition(s):
    sub = ''
    for i in range(len(s)-1):
        sub += s[i]
        if s[i] <= s[i+1]:
            continue
        yield sub
        sub = ''
    if s:
        yield sub + s[-1]  # The loop stops one short, so the last character closes the final run

max(alpha_partition(s), key=len)

This certainly won't be the fastest solution (string construction and indexing), but it's quite simple to change. Use zip to avoid indexing into the string, and start/end indexes to avoid string construction and addition:

from itertools import izip_longest   # For py3.X use zip_longest
def alpha_partition(s):
    start = end = 0
    for c1, c2 in izip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        yield s[start:end]
        start = end

max(alpha_partition(s), key=len)

This should operate pretty efficiently and be only slightly slower than the iterative indexing approach from @thebjorn, due to the generator overhead.

Using s*100
alpha_partition() : 1000 loops, best of 3: 448 µs per loop
@thebjorn: 1000 loops, best of 3: 389 µs per loop

For reference turning the generator into an iterative function:

from itertools import izip_longest   # For py3.X use zip_longest
def best_alpha_partition(s):
    start = end = longest = 0
    best = ""
    for c1, c2 in izip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        if (end-start) > longest:
            longest = end - start
            best = s[start:end]
        start = end
    return best
best_alpha_partition(s)

best_alpha_partition() : 1000 loops, best of 3: 306 µs per loop

I personally prefer the generator form because you can use exactly the same generator for finding the minimum, the top 5, etc. It is very reusable, whereas the iterative function only does one thing.
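To make the reuse point concrete, here is a small sketch that feeds the same generator to max, min and heapq.nlargest. It uses the Python 3 import; the sample string is borrowed from an earlier answer, and the variable names are just for illustration.

```python
import heapq
from itertools import zip_longest  # izip_longest on Python 2

def alpha_partition(s):
    """Yield each maximal run of non-decreasing characters in s."""
    start = end = 0
    for c1, c2 in zip_longest(s, s[1:]):
        end += 1
        if c2 and c1 <= c2:
            continue
        yield s[start:end]
        start = end

s = 'abcdabcdefa'  # sample string, for illustration only

longest = max(alpha_partition(s), key=len)
shortest = min(alpha_partition(s), key=len)
top_two = heapq.nlargest(2, alpha_partition(s), key=len)

print(longest)
print(shortest)
print(top_two)
```

Each call creates a fresh generator, so the partitions are never all held in memory unless the consumer (such as nlargest with a large n) chooses to keep them.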

Answer 4

OK, so after reading your responses and trying all kinds of different things, I finally came up with a solution that gets exactly what I need. It's not the prettiest code, but it works. I'm sure the solutions mentioned would work as well; however, I couldn't figure them out. Here's what I did:

s = 'inaciaebganawfiaefc'
sub = ''
longest = []
for i in range(len(s)):
    if (i+1) < len(s) and s[i] <= s[i+1]:
        sub += s[i]
        longest.append(sub)
    elif i >= 0 and s[i-1] <= s[i]:
        sub += s[i]
        longest.append(sub)
        sub = ''
    else:
        sub = ''
print ('Longest substring in alphabetical order is: ' + max(longest, key=len))


solution1
2 2015-09-03 14:11:53

solution2
1 ACCEPTED 2015-09-03 21:56:47

solution3
1 2015-09-04 03:24:32

solution4
0 2015-09-03 16:16:38
