Python 2.7 - min built-in function not working as expected

I'm working through the Google Python exercises and don't understand the behaviour of the built-in min() function, which does not seem to produce the expected result. The exercise is "babynames", and I'm testing the code with the 'baby1990.html' file (https://developers.google.com/edu/python/exercises/baby-names).

import re

def extract_names(filename):
    f = open(filename, 'r').read()
    res = []
    d = {}
    match = re.search(r'<h3(.*?)in (\d+)</h3>', f)
    if match:
            res.append(match.group(2))

    vals = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', f)
    for n, m, f in vals:
            if m=='Adrian' or f=='Adrian':
                    if m not in d:
                            d[m] = n
                    else:
                            d[m] = min(n, d[m])

                    if f not in d:       
                            d[f] = n
                    else:
                            print "min( "+str(n)+", "+str(d[f])+") = "+str( min(n, d[f]) ) 
                            d[f] = min( [n, d[f]] )

    for name, rank in sorted(d.items()):
            res.append(name+" "+str(rank))

    return res

vals is a list of tuples (rank, male_name, female_name), and I want to store each name (male and female) in the dictionary 'd' with the name as key and the rank as value. If there's a duplicate, I want to keep the lower rank value.

I noticed that the name 'Adrian' appears twice in the collection, the first time as a male name with rank 94 and the second time as a female name with rank 603, and I want the smaller of the two values.

So, the first time 'Adrian' is matched, it's stored in the dictionary with rank 94 (correctly). When it's matched the second time, the execution flow correctly enters the else branch of the second if, but the result becomes 603, even though min(94, 603) = 94. So the result is:

min( 603, 94) = 603
1990
Adrian 603
Anton 603
Ariel 94

I don't understand where the bug is. In the interpreter, min(94, 603) = 94, as expected. What am I missing?

Thanks for the help.

PS: I also tried min(n, d[f]), which is the same call without the list, but the result is still 603.

You are comparing strings, not numbers:

>>> min('603', '94')
'603'

Lexicographically, '6' sorts before '9', so '603' is the smaller of the two strings. Regular expressions work on strings, and the matches they return are strings even when only digits are matched. Use int() to turn your strings into integers:

vals = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', f)
for n, m, f in vals:
    n = int(n)
    # ...
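
To put the conversion in context, here is a minimal sketch of the rank-merging step with int() applied. The helper name lowest_ranks is hypothetical, and the two sample rows are reconstructed from the output shown in the question rather than read from baby1990.html:

```python
def lowest_ranks(vals):
    """Map each name to its lowest rank.

    vals: iterable of (rank, male_name, female_name) string tuples,
    shaped like the re.findall() result above.
    """
    ranks = {}
    for rank, male, female in vals:
        rank = int(rank)  # regex groups are strings; compare as integers
        for name in (male, female):
            # Keep the smaller rank when a name appears more than once.
            ranks[name] = min(rank, ranks.get(name, rank))
    return ranks


# Hypothetical sample rows standing in for the parsed HTML table.
sample = [('94', 'Adrian', 'Ariel'), ('603', 'Anton', 'Adrian')]
result = lowest_ranks(sample)
```

With these rows, 'Adrian' keeps rank 94 because the comparison is now numeric. Handling both names in one inner loop also removes the duplicated if/else branches from the question.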

When debugging Python code, use repr() instead of str() to detect type problems; had you used repr(), you would have seen '94' printed instead of 94 (the quotes denote a string).
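
As a small illustration of the difference, the sketch below builds the same debug message twice. The variable names are made up for this example; one value is a string as returned by a regex match and the other is a real integer:

```python
n = '603'   # a string, as a regex group would return it
best = 94   # an integer, for contrast

# str() hides the type: both values look like plain numbers.
str_view = "min(" + str(n) + ", " + str(best) + ")"

# repr() keeps the quotes around the string, exposing the mismatch.
repr_view = "min(" + repr(n) + ", " + repr(best) + ")"
```

Here str_view is "min(603, 94)" while repr_view is "min('603', 94)", so the second form shows at a glance that one argument is a string.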
