Avoid or eliminate duplicated/reverse results in a dictionary in Python

I am looking for pairs across two columns of a CSV file. If both A,B and B,A are found, only A:B should go into the dictionary. I wrote the following script:

dataset = list(zip(col1, col2))
dic = {}
for a, b in dataset:
    if (a, b) and (b, a) in dataset:
        dic[a] = b

But obviously I only need one of each pair, and the output is:

{'A': 'B', 'C': 'B', 'B': 'A', 'D': 'C', 'F': 'C', 'H': 'C', 'J': 'X', 'X': 'J'}

As you can see, some pairs are duplicated in reverse, but not all of them (for example, D:C is correct and appears only once, and I don't know why).

How can I avoid those duplicates, or eliminate the "reverse forms" from the dictionary?

Try a simple addition:

# (a, b) comes from dataset itself, so only the reverse needs checking
if (b, a) in dataset and a < b:
    dic[a] = b

This keeps only one of the two orientations. It also assumes (as in your example) that no row contains the same value twice, since such a row would fail the a < b test.
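For context, the condition in the question, (a,b) and (b,a) in dataset, is parsed by Python as (a,b) and ((b,a) in dataset). A non-empty tuple is always truthy, so in effect only the reverse pair was being tested. Below is a minimal runnable sketch of the a < b fix. The sample columns are borrowed from the other answer on this page, and the name seen is introduced here purely for illustration.

```python
# Sample data borrowed from the other answer; any two equal-length columns work.
col1 = ['A', 'C', 'B', 'D', 'X', 'F', 'H', 'J']
col2 = ['B', 'B', 'A', 'C', 'J', 'C', 'C', 'X']

dataset = list(zip(col1, col2))
seen = set(dataset)  # set lookups are fast; testing against the list also works

dic = {}
for a, b in dataset:
    # (a, b) is in the data by construction, so only the reverse is checked.
    # a < b keeps exactly one orientation of each mutual pair.
    if (b, a) in seen and a < b:
        dic[a] = b

print(dic)  # {'A': 'B', 'J': 'X'}
```

Note that this variant always keeps the orientation whose first element sorts lower, regardless of which row came first in the file.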

You don't have to check both pairs at each step. Instead, you only need to check whether b is already in the dictionary and, if so, whether dic[b] != a.

The reason is that keys always come from col1, so it is enough to check whether the reversed pair has already been added.

col1 = ['A', 'C', 'B', 'D', 'X', 'F', 'H', 'J']
col2 = ['B', 'B', 'A', 'C', 'J', 'C', 'C', 'X']

dic = {}
for a, b in zip(col1, col2):
    if (b not in dic) or (dic[b] != a):
        dic[a] = b

# {'A': 'B', 'C': 'B', 'D': 'C', 'X': 'J', 'F': 'C', 'H': 'C'}

However, if you only want to keep pairs for which both orientations exist (one copy of each), you need a slightly different approach.

First build a dictionary from all of the pairs. Then iterate as before, adding a pair only if both orientations exist. Use the same check as above to avoid storing the pair twice.

d = dict(zip(col1, col2))
dic = {}
for a, b in d.items():
    if (a in d) and (b in d) and (d[a] == b) and (d[b] == a):
        if (b not in dic) or (dic[b] != a):
            dic[a] = b

# {'A': 'B', 'X': 'J'}
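To tie this back to the CSV in the question, here is a rough end-to-end sketch. It assumes a two-column file with no header row. The function name mutual_pairs and the inline csv_text are hypothetical stand-ins, not part of the original post. It uses a set of tuples rather than dict(zip(...)), because a dictionary built that way keeps only the last value for a key that is repeated in the first column.

```python
import csv
import io

# Hypothetical stand-in for the CSV file: two columns, no header row.
csv_text = "A,B\nC,B\nB,A\nD,C\nX,J\nF,C\nH,C\nJ,X\n"


def mutual_pairs(rows):
    """Return one orientation of each pair that occurs in both directions."""
    pairs = [(row[0], row[1]) for row in rows if len(row) >= 2]
    seen = set(pairs)
    result = {}
    for a, b in pairs:
        # Keep (a, b) only if its reverse exists in the data
        # and the reverse has not already been stored.
        if (b, a) in seen and result.get(b) != a:
            result[a] = b
    return result


print(mutual_pairs(csv.reader(io.StringIO(csv_text))))  # {'A': 'B', 'X': 'J'}
```

With a real file, the same function could be fed from csv.reader(f) inside a with open(path, newline="") as f: block, where path is whatever your file is called.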
