Removing Duplicates From Dictionary

I have the following Python 2.7 dictionary data structure (I do not control the source data; it comes from another system as is):

{112762853378: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112762853385: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112760496444: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4']
   },
 112760496502: 
   {'dst': ['10.122.195.34'], 
    'src': ['4.3.2.1']
   },
 112765083670: ...
}

The dictionary keys will always be unique. The dst, src, and alias values can be duplicated across records. Every record will always have a dst and a src, but not every record will necessarily have an alias, as seen in the third record.

In the sample data, either of the first two records would be removed (it doesn't matter to me which one). The third record would be considered unique: although its dst and src are the same, it is missing the alias.

My goal is to remove all records where the dst, src, and alias have all been duplicated, regardless of the key.

How does this rookie accomplish this?

Also, my limited understanding of Python interprets the data structure as a dictionary with the values stored in dictionaries... a dict of dicts. Is this correct?

You could go through each of the items (the key-value pairs) in the dictionary and add them to a result dictionary if the value is not already in the result dictionary.

input_raw = {112762853378: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112762853385: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112760496444: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4']
   },
 112760496502: 
   {'dst': ['10.122.195.34'], 
    'src': ['4.3.2.1']
   }
}

result = {}

for key,value in input_raw.items():
    if value not in result.values():
        result[key] = value

print result
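
Because `value not in result.values()` rescans every kept record for each new one, the loop above is quadratic in the number of records. A minimal sketch of the same idea with a set of hashable fingerprints follows. The names `fingerprint` and `dedupe` are made up for illustration, and the sketch assumes every inner value is a list of hashable items, as in the sample data. It is written to run on both Python 2.7 and Python 3.

```python
def fingerprint(record):
    """Turn an inner dict such as {'dst': [...], 'src': [...]} into a hashable value.

    Assumes each inner value is a list of hashable items (strings here).
    """
    return frozenset((field, tuple(values)) for field, values in record.items())


def dedupe(records):
    """Return a new dict that keeps one key for each distinct inner dict."""
    seen = set()
    result = {}
    for key, record in records.items():
        mark = fingerprint(record)
        if mark not in seen:  # set membership is O(1) on average
            seen.add(mark)
            result[key] = record
    return result


input_raw = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

result = dedupe(input_raw)
print(len(result))  # 3: one of the first two records is dropped
```

Which of the two duplicate keys survives depends on the dictionary's iteration order, which is arbitrary in Python 2.7.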

One simple approach would be to create a reverse dictionary, using the concatenation of the string data in each inner dictionary as the key. Say you have the above data in a dictionary, d:

>>> import collections
>>> reverse_d = collections.defaultdict(list)
>>> for key, inner_d in d.iteritems():
...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
...     reverse_d[key_str].append(key)
... 
>>> duplicates = [keys for key_str, keys in reverse_d.iteritems() if len(keys) > 1]
>>> duplicates
[[112762853385, 112762853378]]
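
One caveat with joining the strings directly: without a separator, different records can produce the same key (for example 'ab' + 'c' and 'a' + 'bc'). A hedged variant that uses a tuple as the key avoids this. `record_key` is an illustrative name, representing a missing field as `None` is an assumption, and the tiny dictionary `d` below is invented purely to show the collision.

```python
import collections


def record_key(inner_d):
    """Build a tuple key from dst, src and alias; a missing field becomes None."""
    return tuple(
        tuple(inner_d[k]) if k in inner_d else None
        for k in ('dst', 'src', 'alias')
    )


d = {
    1: {'dst': ['ab'], 'src': ['c']},
    2: {'dst': ['a'], 'src': ['bc']},   # would collide with key 1 under ''.join
    3: {'dst': ['ab'], 'src': ['c']},   # a genuine duplicate of key 1
}

reverse_d = collections.defaultdict(list)
for key, inner_d in d.items():
    reverse_d[record_key(inner_d)].append(key)

duplicates = [sorted(keys) for keys in reverse_d.values() if len(keys) > 1]
print(duplicates)  # [[1, 3]]
```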

If you don't want a list of duplicates or anything like that, but just want to create a duplicate-free dict, you could use a regular dictionary instead of a defaultdict and re-reverse it like so:

>>> reverse_d = {}
>>> for key, inner_d in d.iteritems():
...     key_str = ''.join(inner_d[k][0] for k in ['dst', 'src', 'alias'] if k in inner_d)
...     reverse_d[key_str] = key
... 
>>> new_d = dict((val, d[val]) for val in reverse_d.itervalues())

input_raw = {112762853378:  {'dst': ['10.121.4.136'],
                             'src': ['1.2.3.4'],
                             'alias': ['www.example.com']    },
             112762853385:  {'dst': ['10.121.4.136'],
                             'src': ['1.2.3.4'],
                             'alias': ['www.example.com']    },
             112760496444:  {'dst': ['10.121.4.299'],
                             'src': ['1.2.3.4']    },
             112760496502:  {'dst': ['10.122.195.34'],
                             'src': ['4.3.2.1']    },
             112758601487:  {'src': ['1.2.3.4'],
                             'alias': ['www.example.com'],
                             'dst': ['10.121.4.136']},
             112757412898:  {'dst': ['10.122.195.34'],
                             'src': ['4.3.2.1']    },
             112757354733:  {'dst': ['124.12.13.14'],
                             'src': ['8.5.6.0']},             
             }

for x in input_raw.iteritems():
    print x
print '\n---------------------------\n'

seen = []

for k,val in input_raw.items():
    if val in seen:
        del input_raw[k]
    else:
        seen.append(val)


for x in input_raw.iteritems():
    print x

Result:

(112762853385L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
(112757354733L, {'src': ['8.5.6.0'], 'dst': ['124.12.13.14']})
(112758601487L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
(112757412898L, {'src': ['4.3.2.1'], 'dst': ['10.122.195.34']})
(112760496502L, {'src': ['4.3.2.1'], 'dst': ['10.122.195.34']})
(112760496444L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.299']})
(112762853378L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})

---------------------------

(112762853385L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.136'], 'alias': ['www.example.com']})
(112757354733L, {'src': ['8.5.6.0'], 'dst': ['124.12.13.14']})
(112757412898L, {'src': ['4.3.2.1'], 'dst': ['10.122.195.34']})
(112760496444L, {'src': ['1.2.3.4'], 'dst': ['10.121.4.299']})

This solution has two drawbacks: it first creates a list with input_raw.items() (as in Andrew Cox's answer), and it requires a growing list, seen.
The first cannot be avoided (using iteritems() doesn't work, because the dictionary is modified during the loop), and the second is cheaper than re-creating the list result.values() from the growing dictionary result on every turn of the loop.
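
For readers on Python 3, where `items()` returns a view rather than a list, deleting from the dictionary inside the loop raises a RuntimeError. A minimal sketch that sidesteps the problem by building a new dictionary is given below. `drop_duplicates` is a hypothetical name, and the `seen` list is kept from the code above so that the inner dicts are still compared directly.

```python
def drop_duplicates(records):
    """Return a new dict holding the first key seen for each distinct value."""
    seen = []      # inner dicts are unhashable, so a list is used as above
    kept = {}
    for key, val in records.items():
        if val not in seen:
            seen.append(val)
            kept[key] = val
    return kept


input_raw = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.299'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
    112757412898: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

cleaned = drop_duplicates(input_raw)
print(len(cleaned))  # 3
```

The original dictionary is left untouched, which may or may not be what you want; assign the result back to the same name if in-place behaviour is not needed.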

Another reverse dict variation:

>>> import pprint
>>> 
>>> data = {
...   112762853378: 
...    {'dst': ['10.121.4.136'], 
...     'src': ['1.2.3.4'], 
...     'alias': ['www.example.com']
...    },
...  112762853385: 
...    {'dst': ['10.121.4.136'], 
...     'src': ['1.2.3.4'], 
...     'alias': ['www.example.com']
...    },
...  112760496444: 
...    {'dst': ['10.121.4.136'], 
...     'src': ['1.2.3.4']
...    },
...  112760496502: 
...    {'dst': ['10.122.195.34'], 
...     'src': ['4.3.2.1']
...    },
... }
>>> 
>>> keep = set({repr(sorted(value.items())):key
...             for key,value in data.iteritems()}.values())
>>> 
>>> for key in data.keys():
...     if key not in keep:
...         del data[key]
... 
>>> 
>>> pprint.pprint(data)
{112760496444L: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
 112760496502L: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
 112762853378L: {'alias': ['www.example.com'],
                 'dst': ['10.121.4.136'],
                 'src': ['1.2.3.4']}}

Since a dictionary is exactly the tool for enforcing uniqueness, with the value that must be unique used as the key, the way to go is to create a reversed dict whose keys are built from your values, then recreate a "de-reversed" dictionary from that intermediate result.

dct = {112762853378: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112762853385: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112760496444: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4']
   },
 112760496502: 
   {'dst': ['10.122.195.34'], 
    'src': ['4.3.2.1']
   },
   }

def remove_dups (dct):
    reversed_dct = {}
    for key, val in dct.items():
        new_key = tuple(val["dst"]) + tuple(val["src"]) + (tuple(val["alias"]) if "alias" in val else (None,) ) 
        reversed_dct[new_key] = key
    result_dct = {}
    for key, val in reversed_dct.items():
        result_dct[val] = dct[val]
    return result_dct

result = remove_dups(dct)

dups = {}

# Group the keys of records that share the same dst, src and alias.
for key, val in dct.iteritems():
    if val.get('alias') is not None:
        ref = "%s%s%s" % (val['dst'], val['src'], val['alias'])  # a simple hash
        dups.setdefault(ref, []).append(key)

# Keep the first key of each group and delete the rest.
for k, v in dups.iteritems():
    if len(v) > 1:
        for key in v[1:]:
            del dct[key]

I solved it using a dictionary comprehension:

dic = {112762853378: 
    {'dst': ['10.121.4.136'], 
     'src': ['1.2.3.4'], 
     'alias': ['www.example.com']
    },
112762853385: 
    {'dst': ['10.121.4.136'], 
     'src': ['1.2.3.4'], 
     'alias': ['www.example.com']
    },
112760496444: 
    {'dst': ['10.121.4.136'], 
     'src': ['1.2.3.4']
    },
112760496502: 
    {'dst': ['10.122.195.34'], 
     'src': ['4.3.2.1']
    }
}

result = {k:v for k,v in dic.items() if list(dic.values()).count(v)==1}
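
Note that `count(v) == 1` keeps only records whose value occurs exactly once, so every copy of a duplicated record disappears. If one copy should survive, as the question asks, a sketch along the following lines could be used instead. It assumes the inner values are lists of strings, and `one_key_per_record` is an illustrative name for a throwaway reversed dictionary.

```python
dic = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4'], 'alias': ['www.example.com']},
    112760496444: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112760496502: {'dst': ['10.122.195.34'], 'src': ['4.3.2.1']},
}

# Map each distinct record (as a hashable tuple of sorted items) to one of its keys.
one_key_per_record = {
    tuple(sorted((field, tuple(vals)) for field, vals in v.items())): k
    for k, v in dic.items()
}

# Rebuild the dictionary from the surviving keys.
result = {k: dic[k] for k in one_key_per_record.values()}
print(len(result))  # 3
```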

from collections import defaultdict

dups = defaultdict(lambda : defaultdict(list))

for key, entry in data.iteritems():
    dups[tuple(entry.keys())][tuple([v[0] for v in entry.values()])].append(key)

for dup_indexes in dups.values():
    for keys in dup_indexes.values():
        for key in keys[1:]:
            if key in data:
                del data[key]

I would just make a set of the list of keys, then iterate over them into a new dict:

input_raw = {112762853378: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112762853385: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4'], 
    'alias': ['www.example.com']
   },
 112760496444: 
   {'dst': ['10.121.4.136'], 
    'src': ['1.2.3.4']
   },
 112760496502: 
   {'dst': ['10.122.195.34'], 
    'src': ['4.3.2.1']
   }
}

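# Note: dict keys are already unique, so set() removes nothing here and
# records with identical values are all copied across.
unique_keys = list(set(input_raw.keys()))

fixedlist = {}

for i in unique_keys:
    fixedlist[i] = input_raw[i]  # look each key up in the source dict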

You can use

set(dictionary) 

to solve your problem.
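
For reference, calling set() on a dictionary collects its keys, not its values, so on its own it does not detect duplicate records. A small check, using a shortened, made-up version of the sample data (the name `sample` is illustrative):

```python
sample = {
    112762853378: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
    112762853385: {'dst': ['10.121.4.136'], 'src': ['1.2.3.4']},
}

keys_only = set(sample)                 # iterating a dict yields its keys
print(keys_only == set(sample.keys()))  # True
print(len(keys_only))                   # 2, even though both records are identical
```

The inner dicts themselves cannot be put in a set directly because dicts are unhashable; they would first have to be converted to something hashable, such as a tuple of sorted items.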

example = {
    'id1': {'name': 'jay', 'age': 22},
    'id2': {'name': 'salman', 'age': 52},
    'id3': {'name': 'Ranveer', 'age': 26},
    'id4': {'name': 'jay', 'age': 22},
}
# Iterate over snapshots of the keys so entries can be deleted safely.
for item in list(example):
    if item not in example:  # already removed as a duplicate
        continue
    for value in list(example):
        if item != value and example[item] == example[value]:
            del example[value]
print "example", example
