Why does this piece of code loop infinitely?

Question

I am writing a search engine for the experience and the knowledge it brings. Right now I am building the crawler and its accompanying utilities, one of which is a URL normalizer. I am stuck on a method that takes a URL and capitalizes the letters that follow a '%' sign. My code so far:

def escape_sequence_capitalization(url):
    ''' The method that capitalizes letters in escape sequences.
    All letters within a percent-encoding triplet (e.g. '%2C') are case
    insensitive and should be capitalized.

    '''
    next_encounter = None
    url_list = []
    while True:
        next_encounter = url.find('%')
        if next_encounter == -1:
            break

        for letter in url[:next_encounter]:
            url_list.append(letter)

        new_character = url[next_encounter + 1].upper()
        url_list.append(new_character)
        url = url[next_encounter:]

    for letter in url:
        url_list.append(letter)

    return ''.join(url_list)

Can someone please guide me to where my error is? I would be grateful. Thank you.

EDIT: this is what I am trying to achieve:

http://www.example.com/a%c2%b1b → http://www.example.com/a%C2%B1b

Answer 1

By static analysis, it loops forever because your while True never breaks. So where can it break? Only at the break statement, and only if next_encounter becomes equal to -1; so you can deduce that it never does.

Why doesn't it? Try printing next_encounter right after the url.find call. You'll quickly see that

url = url[next_encounter:]

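does almost what you hope it will, only it gives you one character more than you hoped: the slice starts at the '%' itself, so the next url.find('%') returns 0 and the loop never advances.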

Why did I present it this way? Mostly because the value of print is often underrated by people learning the language.
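
To make the diagnosis concrete, here is a minimal sketch of the loop with the slice moved past the whole triplet. It is one possible repair, not the only one. It keeps the asker's function name, and it assumes that whatever two characters follow a '%' should be uppercased (it does not check that they are hex digits).

```python
def escape_sequence_capitalization(url):
    """Uppercase the two characters that follow each '%' in url."""
    pieces = []
    while True:
        next_encounter = url.find('%')
        if next_encounter == -1:
            break
        # Keep everything before the '%' unchanged.
        pieces.append(url[:next_encounter])
        # Keep the '%' itself and uppercase the two characters after it.
        pieces.append(url[next_encounter:next_encounter + 3].upper())
        # Continue after the triplet, so the same '%' is never found again.
        url = url[next_encounter + 3:]
    # Whatever is left contains no '%'.
    pieces.append(url)
    return ''.join(pieces)


print(escape_sequence_capitalization('http://www.example.com/a%c2%b1b'))
# http://www.example.com/a%C2%B1b
```

Appending whole slices rather than one letter at a time is only a simplification; the part that stops the infinite loop is that the new url begins after the '%' that was just handled.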

Answer 2

@msw nailed it and gave sound advice.

My $.02 is that you never should have tried this loop.

How about:

>>> import re
>>> re.sub('%..', lambda m: m.group(0).upper(), 'http://www.example.com/a%c2%b1b')
'http://www.example.com/a%C2%B1b'

Answer 3

This is why:

>>> 'asd'.find('s')
1
>>> 'asd'[1:]
'sd'

Also, consider using the second argument to str.find() instead of slicing.
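
A rough sketch of what the str.find() suggestion could look like: the string is never re-sliced for searching, and a position index is passed as the second argument instead. The function name capitalize_escapes and the variable position are made up for this illustration, and the sketch assumes any two characters after a '%' should be uppercased.

```python
def capitalize_escapes(url):
    """Uppercase each '%xx' triplet, scanning with str.find(sub, start)."""
    pieces = []
    position = 0  # index from which the next search starts
    while True:
        next_encounter = url.find('%', position)
        if next_encounter == -1:
            break
        # Text between the previous triplet and this '%', unchanged.
        pieces.append(url[position:next_encounter])
        # The '%' plus the two characters after it, uppercased.
        pieces.append(url[next_encounter:next_encounter + 3].upper())
        # Resume the search after the triplet.
        position = next_encounter + 3
    pieces.append(url[position:])
    return ''.join(pieces)


print(capitalize_escapes('http://www.example.com/a%c2%b1b'))
# http://www.example.com/a%C2%B1b
```

Because position always moves past the '%' that was just found, the search cannot land on the same character twice.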

Answer 4

I'm coming a bit late to the party, but you might want to consider using a regular expression instead of such a complicated function:

>>> import re
>>> url = "http://www.example.com/a%c2%b1b"
>>> result = re.sub("(?i)%[0-9A-F]{2}", lambda x: x.group(0).upper(), url)
>>> result
'http://www.example.com/a%C2%B1b'

Explanation:

(?i)          # Make regex case-insensitive
%             # Match a %
[0-9A-F]{2}   # Match two hex digits

re.sub() finds all these occurrences in the string and passes each match object to the lambda, which uppercases the matched text (the match object's group(0)); the original is then replaced with the uppercased version.
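
For use inside a normalizer, the same idea can be wrapped in a small function. This is only a sketch: the names PERCENT_TRIPLET and normalize_escapes are invented here, and the character class spells out both cases instead of using (?i), which should match the same strings.

```python
import re

# Compiled once; matches '%' followed by exactly two hex digits, any case.
PERCENT_TRIPLET = re.compile(r'%[0-9a-fA-F]{2}')


def normalize_escapes(url):
    """Uppercase valid percent-encoding triplets and leave the rest alone."""
    return PERCENT_TRIPLET.sub(lambda match: match.group(0).upper(), url)


print(normalize_escapes('http://www.example.com/a%c2%b1b'))
# http://www.example.com/a%C2%B1b
print(normalize_escapes('100%zz'))  # '%zz' is not a hex triplet
# 100%zz
```

Restricting the match to hex digits means a stray '%' followed by other characters is left untouched, whereas a looser pattern such as '%..' would uppercase it as well.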

4 answers

solution1
10 ACCEPTED 2012-07-14 18:22:27

solution2
4 2012-07-14 18:34:31

solution3
3 2012-07-14 18:21:34

solution4
1 2012-07-15 07:27:17
