Avoid inserting duplicates into Python list with comprehension

I have a dictionary:

XY_dict = {1: [(12, 55),(13, 55)],
2: [(14, 55),(15, 57)],
3: [(14, 55),(15, 58)],
4: [(14, 55),(16, 55)]}

I want to find out which keys have values whose tuples are all unique, meaning they do not appear in any other key's value. In the sample dictionary, key 1 is unique because neither (12, 55) nor (13, 55) appears in the value of any other key. By getting the list of keys with shared values, I can invert the result later on and get the keys that are unique.

I am using a list comprehension to get the keys with shared values:

keys_shared_values = [k1 for k1,v1 in XY_dict.iteritems()
                       for k,v in XY_dict.iteritems()
                       for XY_pair in v
                       if XY_pair in v1 and k != k1 and k1 not in keys_shared_values]

As a result, I am getting [2, 2, 3, 3, 4, 4], yet I expected duplicates not to be inserted (since I am checking whether the key is already in the result list). I can fix that by running list(set(keys_shared_values)), but I would like to understand what is wrong with my code.

Others have already explained what the problem with your list comprehension is. Here's an alternative approach: use a Counter to count how often each xy pair occurs, then use those counts to filter the unique entries from the dictionary.

>>> from collections import Counter
>>> c = Counter(xy for v in XY_dict.values() for xy in v)
>>> {k: v for k, v in XY_dict.iteritems() if all(c[xy] == 1 for xy in v)}
{1: [(12, 55), (13, 55)]}

Or to get the keys with shared values:

>>> [k for k, v in XY_dict.iteritems() if any(c[xy] > 1 for xy in v)]
[2, 3, 4]

Note that this is also more efficient: your approach compares every pair of entries in the dictionary, which gives quadratic complexity in the number of keys, while this approach makes a fixed number of passes over the data and is linear in the total number of tuples.
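As a variation on the same counting idea, here is a minimal sketch that indexes each tuple by the set of keys that contain it. The names owners, shared_keys and unique_keys are made up for this illustration. It uses .items() so that it runs on both Python 2 and Python 3. One difference from the Counter version: a tuple that is repeated inside a single key's list still has only one owner here, so it is not treated as shared.

```python
from collections import defaultdict

XY_dict = {1: [(12, 55), (13, 55)],
           2: [(14, 55), (15, 57)],
           3: [(14, 55), (15, 58)],
           4: [(14, 55), (16, 55)]}

# First pass: record which keys own each tuple.
owners = defaultdict(set)
for key, pairs in XY_dict.items():
    for xy in pairs:
        owners[xy].add(key)

# Second pass over the index: a tuple with more than one owner
# marks all of its owners as having a shared value.
shared_keys = set()
for keys in owners.values():
    if len(keys) > 1:
        shared_keys.update(keys)

# Invert the result to get the keys whose tuples appear nowhere else.
unique_keys = sorted(set(XY_dict) - shared_keys)
```

Each tuple is visited once while building the index, so the work grows with the total number of tuples rather than with the number of key pairs.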

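The problem is that keys_shared_values is not updated while the comprehension runs: until the comprehension completes, the name still refers to whatever it was bound to before (here, presumably an empty list), so your k1 not in keys_shared_values check is always True. A comprehension cannot refer to the list it is currently building. Your best bet is to convert to a set, as you already suggested.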
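One way to apply the set suggestion directly is a set comprehension, which drops duplicates on its own and so needs no reference to the partial result. This is only a sketch; unique_keys is a name introduced here for the inverted result, and .items() is used so that it runs on both Python 2.7 and Python 3.

```python
XY_dict = {1: [(12, 55), (13, 55)],
           2: [(14, 55), (15, 57)],
           3: [(14, 55), (15, 58)],
           4: [(14, 55), (16, 55)]}

# Set comprehension: a key that qualifies several times is stored once,
# so the "not in keys_shared_values" check is no longer needed.
keys_shared_values = {k1
                      for k1, v1 in XY_dict.items()
                      for k, v in XY_dict.items()
                      if k != k1 and any(xy in v1 for xy in v)}

# Invert the result to get the keys whose tuples appear nowhere else.
unique_keys = sorted(set(XY_dict) - keys_shared_values)
```

This is still quadratic in the number of keys, like the original comprehension; it only removes the duplicates.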

You should change your code to a loop if you want that functionality:

keys_shared_values = []
for k, v in XY_dict.iteritems():
    for k1, v1 in XY_dict.iteritems():
        for XY_pair in v:
            if XY_pair in v1 and k != k1 and k1 not in keys_shared_values:
                keys_shared_values.append(k1)
print keys_shared_values

Result (the order follows the dictionary's iteration order):

[3, 4, 2]

Your code cannot work as written because keys_shared_values is not defined when the comprehension runs. If you try your example in a fresh interpreter session, you will get NameError: name 'keys_shared_values' is not defined.

This is because keys_shared_values is not bound until the whole comprehension has been evaluated, so you cannot reference it from inside the comprehension: it does not exist yet.

If you predefine it, for example as keys_shared_values = [], it still does not work, because every reference to it inside the comprehension sees the original empty list. While the comprehension executes, it does not update keys_shared_values; it builds a new list in memory and only then assigns that list to keys_shared_values.
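The following sketch makes that behaviour visible. The helper record and the list observed_lengths are hypothetical names added only for this demonstration; record notes how long keys_shared_values is each time an element is about to be added.

```python
keys_shared_values = []   # predefined, as in the scenario described above
observed_lengths = []

def record(k):
    # Hypothetical helper: store the current length of keys_shared_values,
    # then pass the key through unchanged.
    observed_lengths.append(len(keys_shared_values))
    return k

# The name on the left is rebound only after the whole list has been built,
# so the "not in" test on the right keeps seeing the old, empty list.
keys_shared_values = [record(k) for k in [2, 2, 3, 3]
                      if k not in keys_shared_values]

# observed_lengths is now [0, 0, 0, 0]
# keys_shared_values is now [2, 2, 3, 3]
```

Every recorded length is 0, which shows that the duplicate check never sees the items collected so far.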
