
Why does this function work, where the for loop apparently fills the dict with values from the argument to allow a comparison?

Here is a practice interview question, followed by a correct solution to it. The problem is that I don't see how this function works; I explain my confusion below the code.

Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there is more than one duplicated number, return the number whose second occurrence has a smaller index than the second occurrence of any other duplicated number. If there are no such elements, return -1.
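To make the "minimal index of the second occurrence" rule concrete, here is a brute-force sketch that follows the statement literally. The name first_duplicate_reference is hypothetical, and the approach is O(n^2), so it is only meant to pin down the expected answers, not to be submitted.

```python
def first_duplicate_reference(a):
    # Hypothetical helper: a literal, quadratic reading of the problem statement.
    best_value = -1
    best_second = len(a)  # sentinel: larger than any valid index
    for value in set(a):
        # All indices where this value occurs, in increasing order.
        positions = [i for i, x in enumerate(a) if x == value]
        # positions[1] is the index of the value's second occurrence.
        if len(positions) >= 2 and positions[1] < best_second:
            best_second = positions[1]
            best_value = value
    return best_value


# 2 repeats at index 5 and 3 repeats at index 4, so 3 wins.
print(first_duplicate_reference([2, 1, 3, 5, 3, 2]))  # 3
print(first_duplicate_reference([2, 4, 3, 5, 1]))     # -1
```

Two different values can never share the same second-occurrence index, so there is no tie to break.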

Answer:

def firstDuplicate(a):
    oldies={}
    notfound=True
    for i in range(len(a)):
        try:
            if oldies[a[i]]==a[i]:
                notfound=False
                return a[i]     
        except:
            oldies[a[i]]=a[i]
    if notfound:
        return -1

So the function creates an empty dict, oldies. But then within the for loop, I don't understand how this line works:

if oldies[a[i]]==a[i]:

To me it appears that the == operator compares the indexed values of an empty dict, "oldies," to the argument that would be a list like this, for example:

a = [2, 4, 1, 3, 4, 5, 1, 5, 7, 8, 2, 4,]

But obviously "oldies" is not empty when this comparison is done. What's going on here?

I'm going to break down that answer and explain a number of the issues with it.

Explanation

Firstly, the function creates an empty dict and a boolean. Then it iterates over range(len(a)), i.e. the indices 0 to n - 1, where n is the length of the input list. On each pass it tries to look up a[i] as a key in the dictionary. While that key is missing (which it always is for the first element, because the dictionary starts out empty), the lookup raises a KeyError before the == comparison is ever evaluated. The except block catches that error and adds the value to the dictionary, using it as both key and value. So if the first value in the input list is 'a', the dict will now look like {'a': 'a'}. Because every value is stored under itself, the comparison oldies[a[i]] == a[i] is always True once the lookup succeeds; what actually detects a duplicate is the lookup not raising. Assuming 'a' appears again later in the list (and no other value repeats before that), the function will eventually reach it and return it. If it finds no duplicates, it returns -1.
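To watch oldies fill up, here is the question's logic with print calls added, run on the example list from the question. The function name is hypothetical; the bare except is narrowed to except KeyError and the notfound flag is dropped, neither of which changes the result for this input.

```python
def first_duplicate_traced(a):
    # Hypothetical copy of the question's logic, instrumented with print().
    oldies = {}
    for i in range(len(a)):
        try:
            # A missing key raises KeyError here, before == is evaluated.
            if oldies[a[i]] == a[i]:
                print(f"i={i}: {a[i]} is already a key, duplicate found")
                return a[i]
        except KeyError:
            # First time this value is seen: store it under itself.
            oldies[a[i]] = a[i]
            print(f"i={i}: KeyError for {a[i]}, oldies is now {oldies}")
    return -1


print(first_duplicate_traced([2, 4, 1, 3, 4, 5, 1, 5, 7, 8, 2, 4]))
# i=0: KeyError for 2, oldies is now {2: 2}
# i=1: KeyError for 4, oldies is now {2: 2, 4: 4}
# i=2: KeyError for 1, oldies is now {2: 2, 4: 4, 1: 1}
# i=3: KeyError for 3, oldies is now {2: 2, 4: 4, 1: 1, 3: 3}
# i=4: 4 is already a key, duplicate found
# 4
```

So the dictionary is only empty on the very first lookup; every failed lookup adds one entry, and the first successful lookup is the answer.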

Issues

  1. It is not necessary to iterate over a range to iterate over a list. Iterating over the list directly avoids computing its length, creating a range object, and indexing into the list on every pass, so it is simpler to read and will likely be slightly faster.
  2. Creating the boolean at the beginning and checking it at the end is redundant, because a return statement exits the function immediately. If a duplicate is found, the return in the if block ends the function and nothing after the loop runs; if the loop finishes instead, notfound is still True, so the final if always succeeds.
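  3. Using a dictionary is a bad choice of structure here. It stores both keys and values, so it uses more memory than needed when each value is just a copy of its key. A set, which stores only the keys and still has fast membership tests, is a better fit. A plain list would also work, but checking membership in a list scans it element by element, so it gets slow for long inputs.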
  4. Assuming we change oldies to a set and iterate over the list directly, the whole try/except block can be reduced to a simple in membership test. Combined with the proper use of return described above, this also eliminates the final conditional.
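  5. Even though I'm advising not to use it in this case, the except block should catch specific exceptions instead of acting as a general catch-all. Something like except KeyError: should have been used; a bare except also silently swallows unrelated errors, such as a NameError caused by a typo inside the try block.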
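As an intermediate step, here is a sketch that keeps the original dictionary and try/except idea but applies issues 1, 2 and 5: it iterates the list directly, drops the flag, and catches only KeyError. The function name is hypothetical; the try/except/else layout is one common way to write this lookup-or-insert pattern.

```python
def first_duplicate_dict(a):
    # Hypothetical name: the original approach with issues 1, 2 and 5 fixed.
    oldies = {}
    for value in a:  # iterate the list directly (issue 1)
        try:
            oldies[value]  # raises KeyError if value has not been seen yet
        except KeyError:  # catch only the expected exception (issue 5)
            oldies[value] = value
        else:
            return value  # lookup succeeded: first repeated value
    return -1  # reaching here means no duplicate, so no flag is needed (issue 2)


print(first_duplicate_dict([2, 1, 3, 5, 3, 2]))  # 3
```

This still carries the dictionary overhead from issue 3, which is what switching to a set removes.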

The end result would look something like:

def firstDuplicate(a):
    oldies = set()

    for i in a:
        if i in oldies:
            return i
        else:
            oldies.add(i)
    return -1


print(firstDuplicate(['a', 'b', 'c', 'b', 'a', 'c']))

Result: b

There may be some even better solutions out there using itertools or some other package.
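One alternative that needs no extra package relies on the problem's guarantee that every value lies in the range 1 to a.length: a fixed-size list of booleans can then serve as the "seen" table, indexed by value. This is a sketch under that assumption (the name first_duplicate_flags is hypothetical), and unlike the set version it will fail on inputs such as the string example above.

```python
def first_duplicate_flags(a):
    # Hypothetical variant; ASSUMES every element is an int in 1..len(a).
    seen = [False] * (len(a) + 1)  # index 0 unused; seen[v] is True once v appears
    for value in a:
        if seen[value]:
            return value  # first value whose second occurrence we reach
        seen[value] = True
    return -1


print(first_duplicate_flags([2, 1, 3, 5, 3, 2]))  # 3
```

Each lookup is a direct index rather than a hash, which sidesteps the linear-scan problem a plain list has with in.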
