
Avoid or eliminate duplicated/reverse results in a dictionary in Python


I am looking for pairs across two columns of a CSV file. If both AB and BA are present, I want only AB to be included in the dictionary. I wrote the following script:

dataset = list(zip(col1, col2))
for a, b in dataset:
    if (a, b) and (b, a) in dataset:
        dic[a] = b

But I only need one of each pair, and the output is:

{'A': 'B', 'C': 'B', 'B': 'A', 'D': 'C', 'F': 'C', 'H': 'C', 'J': 'X', 'X': 'J'}

As you can see, some pairs are duplicated in reverse (for example, A:B and B:A). D:C is correct, but it appears only once and I don't know why.

How can I avoid those duplicates, or remove the "reverse forms" from the dictionary?

Try a simple addition:

if (a, b) in dataset and (b, a) in dataset \
    and a < b:

This keeps only one of the two: the pair whose first element sorts lower. It also assumes (as in your example) that there is no row containing the same value twice.
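A minimal sketch of that approach as a complete loop, using the sample columns from this thread. One detail worth knowing: `(a, b) and (b, a) in dataset` is parsed as `(a, b) and ((b, a) in dataset)`, and a non-empty tuple is always truthy, so only the reversed pair is actually tested. Because `(a, b)` is drawn from `dataset` anyway, the result is the same. The `pairs` set is an addition for faster lookups, not part of the original suggestion.

```python
# Sample columns (taken from the example data in this thread).
col1 = ['A', 'C', 'B', 'D', 'X', 'F', 'H', 'J']
col2 = ['B', 'B', 'A', 'C', 'J', 'C', 'C', 'X']

# A set gives average O(1) membership tests instead of scanning a list.
pairs = set(zip(col1, col2))

dic = {}
for a, b in zip(col1, col2):
    # Keep the pair only if its reverse also exists, and only in the
    # orientation where a sorts before b, so AB is kept and BA is dropped.
    if (b, a) in pairs and a < b:
        dic[a] = b

print(dic)  # {'A': 'B', 'J': 'X'}
```

Note that the `a < b` test decides which orientation survives by sort order, not by which row came first in the file.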

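You don't have to check both pairs in each step. Instead, you only need to check whether b is already a key in the dictionary and, if so, whether dic[b] != a. The pair is skipped only when dic[b] == a, which means its reverse has already been added.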

The reason for this is that we are always adding from col1 as the key, so we only need to see if the reversed value has already been added.

col1 = ['A', 'C', 'B', 'D', 'X', 'F', 'H', 'J']
col2 = ['B', 'B', 'A', 'C', 'J', 'C', 'C', 'X']

dic = {}
for a, b in zip(col1, col2):
    if (b not in dic) or (dic[b] != a):
        dic[a] = b

# {'A': 'B', 'C': 'B', 'D': 'C', 'X': 'J', 'F': 'C', 'H': 'C'}

However, if you only want to keep one copy of each pair where both versions exist, you need a slightly different approach.

First create a dictionary from all of the pairs. Then iterate as before, and only add a pair if both versions exist. Use the same logic as above to avoid duplicates.

d = dict(zip(col1, col2))
dic = {}
for a, b in d.items():
    if (a in d) and (b in d) and (d[a] == b) and (d[b] == a):
        if (b not in dic) or (dic[b] != a):
            dic[a] = b

# {'A': 'B', 'X': 'J'}  (on Python 3.7+, where dicts preserve insertion order)
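One caveat with `dict(zip(col1, col2))`: a dictionary holds a single value per key, so if a value occurs more than once in col1, only its last partner survives. The sketch below is an alternative rather than the method above. It keeps the rows as a set of tuples and returns a list of pairs instead of a dictionary. The helper name `mutual_pairs` is hypothetical, and skipping rows where both values are equal is an assumption made here.

```python
def mutual_pairs(col1, col2):
    """Return each pair that occurs in both directions, once, in first-seen order."""
    rows = list(zip(col1, col2))
    present = set(rows)   # every (a, b) row, for fast reverse lookups
    seen = set()          # unordered pairs that were already emitted
    result = []
    for a, b in rows:
        key = frozenset((a, b))
        # a != b skips rows such as ('A', 'A'), which are their own reverse.
        if a != b and (b, a) in present and key not in seen:
            seen.add(key)
            result.append((a, b))
    return result


col1 = ['A', 'C', 'B', 'D', 'X', 'F', 'H', 'J']
col2 = ['B', 'B', 'A', 'C', 'J', 'C', 'C', 'X']
print(mutual_pairs(col1, col2))  # [('A', 'B'), ('X', 'J')]
```

With repeated values in the first column, for example `mutual_pairs(['A', 'A', 'B'], ['B', 'C', 'A'])`, this still finds `('A', 'B')`, whereas the dictionary built by `dict(zip(...))` would have kept only `'A': 'C'`.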


 