How to get the total number of repeated objects and their respective keys from a Python dictionary having multiple objects?
I have a Python dictionary which consists of many nested dictionaries, i.e. it looks like this:
result = {
    123: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    456: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    789: {
        'route1': 'abc2',
        'route2': 'abc3'
    },
    101: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    102: {
        'route1': 'ab4',
        'route2': 'abc5'
    }
}
Here we can see that 123, 456 and 101 have the same values. What I am trying to do is find the repeated object, which in this case is:
{
    'route1': 'abc',
    'route2': 'abc1'
}
and the keys which have this repeated object, i.e. 123, 456 and 101. How can we do this?
Along with the repeated-object info, I also want to know which objects do not repeat, i.e. 789 and its respective object, and 102 and its respective object.
PS: Please note that I don't know beforehand which objects are repeated, as this structure will be generated inside code. So it's possible that there is no repeated object at all, or that there is more than one. Also, I cannot use pandas or numpy etc. due to some restrictions.
Use collections.defaultdict:
from collections import defaultdict

d = defaultdict(list)
for k, v in result.items():
    d[tuple(v.items())].append(k)
desired = {
'route1': 'abc',
'route2': 'abc1'
}
d[tuple(desired.items())]
Output:
[456, 123, 101]
For non-repeated items, use a list comprehension:
[v for v in d.values() if len(v) == 1]
Output:
[[102], [789]]
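Since the question notes that the repeated objects aren't known in advance, here is a small sketch extending the defaultdict idea above (assuming Python 3) that derives both the repeated and the unique groups directly, without specifying a desired dict first:

```python
from collections import defaultdict

result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route1': 'abc', 'route2': 'abc1'},
    789: {'route1': 'abc2', 'route2': 'abc3'},
    101: {'route1': 'abc', 'route2': 'abc1'},
    102: {'route1': 'ab4', 'route2': 'abc5'},
}

d = defaultdict(list)
for k, v in result.items():
    # sort the items so insertion order inside a subdict doesn't matter
    d[tuple(sorted(v.items()))].append(k)

# groups seen more than once are the repeated objects
repeated = {tuple(keys): dict(grp) for grp, keys in d.items() if len(keys) > 1}
# groups seen exactly once are the non-repeated objects
unique = {keys[0]: dict(grp) for grp, keys in d.items() if len(keys) == 1}

print(repeated)  # {(123, 456, 101): {'route1': 'abc', 'route2': 'abc1'}}
print(unique)    # {789: {'route1': 'abc2', 'route2': 'abc3'}, 102: {'route1': 'ab4', 'route2': 'abc5'}}
```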
You can use the drop_duplicates() function of pandas:
First, transform your dict into a DataFrame:
import pandas as pd

df = pd.DataFrame(result).T
Output:
route1 route2
123 abc abc1
456 abc abc1
789 abc2 abc3
101 abc abc1
102 ab4 abc5
Then use the drop_duplicates function and transform the result back into a dict:
df2 = df.drop_duplicates(subset=['route1', 'route2']).T.to_dict()
Output:
{
123: {
'route1': 'abc',
'route2': 'abc1'
},
789: {
'route1': 'abc2',
'route2': 'abc3'
},
102: {
'route1': 'ab4',
'route2': 'abc5'
}
}
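drop_duplicates only keeps one representative per group, so it doesn't tell you which keys share a repeated object. One way to recover those keys (a sketch using pandas' duplicated with keep=False on the same DataFrame as above):

```python
import pandas as pd

result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route1': 'abc', 'route2': 'abc1'},
    789: {'route1': 'abc2', 'route2': 'abc3'},
    101: {'route1': 'abc', 'route2': 'abc1'},
    102: {'route1': 'ab4', 'route2': 'abc5'},
}

df = pd.DataFrame(result).T
# keep=False marks every row of a duplicated group, not just the later ones
mask = df.duplicated(subset=['route1', 'route2'], keep=False)
repeated_keys = list(df[mask].index)
print(repeated_keys)  # [123, 456, 101]
```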
You can do this by creating a dictionary holding all the matching keys for each distinct value in your result dict (where the values are themselves dicts). This is a fairly common pattern in Python: iterating through one container and aggregating values into a dict. Then, once you've created the aggregation dict, you can split it into duplicate and single values.
To build the aggregation dict, you need to use each subdict from result as a key and append the matching keys from the original dict to a list associated with that subdict. The challenge is that you can't use the subdicts directly as dictionary keys, because they are not hashable. But you can solve that by converting them to tuples. The tuples should also be sorted, to avoid missing duplicates whose items happen to come out in a different order.
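A quick illustration of why the sorting matters (hypothetical values; relies on dicts preserving insertion order, as in Python 3.7+):

```python
a = {'route1': 'abc', 'route2': 'abc1'}
b = {'route2': 'abc1', 'route1': 'abc'}  # same content, different insertion order

# unsorted item tuples differ, so a and b would land in different buckets
print(tuple(a.items()) == tuple(b.items()))                  # False
# sorting normalizes the order, so equal dicts map to the same key
print(tuple(sorted(a.items())) == tuple(sorted(b.items())))  # True
```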
It may be easier to understand just by looking at some example code:
result = {
123: {'route1': 'abc', 'route2': 'abc1'},
456: {'route1': 'abc', 'route2': 'abc1'},
789: {'route1': 'abc2', 'route2': 'abc3'},
101: {'route1': 'abc', 'route2': 'abc1'},
102: {'route1': 'ab4', 'route2': 'abc5'}
}
# make a dict showing all the keys that match each subdict
cross_refs = dict()
for key, subdict in result.items():
    # make hashable version of subdict (can't use dict as lookup key)
    subdict_tuple = tuple(sorted(subdict.items()))
    # create an empty list of keys that match this val
    # (if needed), or retrieve existing list
    matching_keys = cross_refs.setdefault(subdict_tuple, [])
    # add this item to the list
    matching_keys.append(key)

# make lists of duplicates and non-duplicates
dups = {}
singles = {}
for subdict_tuple, keys in cross_refs.items():
    # convert hashed value back to a dict
    subdict = dict(subdict_tuple)
    if len(keys) > 1:
        # convert the list of matching keys to a tuple and use as the key
        dups[tuple(keys)] = subdict
    else:
        # there's only one matching key, so use that as the key
        singles[keys[0]] = subdict
print(dups)
# {
# (456, 123, 101): {'route2': 'abc1', 'route1': 'abc'}
# }
print(singles)
# {
# 789: {'route2': 'abc3', 'route1': 'abc2'},
# 102: {'route2': 'abc5', 'route1': 'ab4'}
# }