How to get the total number of repeated objects and their respective keys from a Python dictionary having multiple objects?
I have a Python dictionary which consists of many nested dictionaries, i.e. it looks like this:
result = {
    123: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    456: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    789: {
        'route1': 'abc2',
        'route2': 'abc3'
    },
    101: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    102: {
        'route1': 'ab4',
        'route2': 'abc5'
    }
}
Here we can see that 123, 456 and 101 have the same values. What I am trying to do is find the repeated object, which in this case is:
{
    'route1': 'abc',
    'route2': 'abc1'
}
and the keys which have this repeated object, i.e. 123, 456 and 101. How can we do this?
Along with the repeated-object info, I also want to know which objects do not repeat, i.e. 789 and its respective object, and 102 and its respective object.
PS: Please note that I don't know beforehand which objects are repeated, as this structure will be generated inside code. So it's possible that there is no repeated object at all, or that there is more than one. Also, I cannot use pandas or numpy etc. due to some restrictions.
Use collections.defaultdict:
from collections import defaultdict

d = defaultdict(list)
for k, v in result.items():
    d[tuple(v.items())].append(k)
desired = {
'route1': 'abc',
'route2': 'abc1'
}
d[tuple(desired.items())]
Output:
[456, 123, 101]
For non-repeated items, use a list comprehension:
[v for v in d.values() if len(v) == 1]
Output:
[[102], [789]]
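Since the question notes that the repeated objects aren't known in advance, here is a small sketch extending the defaultdict idea above (assuming Python 3) that derives both the repeated and the unique groups directly, without specifying a desired dict first:

```python
from collections import defaultdict

result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route1': 'abc', 'route2': 'abc1'},
    789: {'route1': 'abc2', 'route2': 'abc3'},
    101: {'route1': 'abc', 'route2': 'abc1'},
    102: {'route1': 'ab4', 'route2': 'abc5'},
}

d = defaultdict(list)
for k, v in result.items():
    # sort the items so insertion order inside a subdict doesn't matter
    d[tuple(sorted(v.items()))].append(k)

# groups seen more than once are the repeated objects
repeated = {tuple(keys): dict(grp) for grp, keys in d.items() if len(keys) > 1}
# groups seen exactly once are the non-repeated objects
unique = {keys[0]: dict(grp) for grp, keys in d.items() if len(keys) == 1}

print(repeated)  # {(123, 456, 101): {'route1': 'abc', 'route2': 'abc1'}}
print(unique)    # {789: {'route1': 'abc2', 'route2': 'abc3'}, 102: {'route1': 'ab4', 'route2': 'abc5'}}
```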
You can use the drop_duplicates() function of pandas:
First, transform your dict into a DataFrame:
import pandas as pd

df = pd.DataFrame(result).T
Output:
route1 route2
123 abc abc1
456 abc abc1
789 abc2 abc3
101 abc abc1
102 ab4 abc5
Then use the drop_duplicates function and transform the result back into a dict:
df2 = df.drop_duplicates(subset=['route1', 'route2']).T.to_dict()
Output:
{
123: {
'route1': 'abc',
'route2': 'abc1'
},
789: {
'route1': 'abc2',
'route2': 'abc3'
},
102: {
'route1': 'ab4',
'route2': 'abc5'
}
}
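drop_duplicates only keeps one representative per group, so it doesn't tell you which keys share a repeated object. One way to recover those keys (a sketch using pandas' duplicated with keep=False on the same DataFrame as above):

```python
import pandas as pd

result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route1': 'abc', 'route2': 'abc1'},
    789: {'route1': 'abc2', 'route2': 'abc3'},
    101: {'route1': 'abc', 'route2': 'abc1'},
    102: {'route1': 'ab4', 'route2': 'abc5'},
}

df = pd.DataFrame(result).T
# keep=False marks every row of a duplicated group, not just the later ones
mask = df.duplicated(subset=['route1', 'route2'], keep=False)
repeated_keys = list(df[mask].index)
print(repeated_keys)  # [123, 456, 101]
```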
You can do this by creating a dictionary holding all the matching keys for each distinct value in your result dict (where the values are themselves dicts). This is a fairly common pattern in Python: iterating through one container and aggregating values into a dict. Then, once you've created the aggregation dict, you can split it into duplicate and single values.
To build the aggregation dict, you need to use each subdict from result as a key and append the matching keys from the original dict to a list associated with that subdict. The challenge is that you can't use the subdicts directly as dictionary keys, because they are not hashable. But you can solve that by converting them to tuples. The tuples should also be sorted, to avoid missing duplicates whose items happen to come out in a different order.
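A quick illustration of why the sorting matters (hypothetical values; relies on dicts preserving insertion order, as in Python 3.7+):

```python
a = {'route1': 'abc', 'route2': 'abc1'}
b = {'route2': 'abc1', 'route1': 'abc'}  # same content, different insertion order

# unsorted item tuples differ, so a and b would land in different buckets
print(tuple(a.items()) == tuple(b.items()))                  # False
# sorting normalizes the order, so equal dicts map to the same key
print(tuple(sorted(a.items())) == tuple(sorted(b.items())))  # True
```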
It may be easier to understand just by looking at some example code:
result = {
123: {'route1': 'abc', 'route2': 'abc1'},
456: {'route1': 'abc', 'route2': 'abc1'},
789: {'route1': 'abc2', 'route2': 'abc3'},
101: {'route1': 'abc', 'route2': 'abc1'},
102: {'route1': 'ab4', 'route2': 'abc5'}
}
# make a dict showing all the keys that match each subdict
cross_refs = dict()
for key, subdict in result.items():
    # make hashable version of subdict (can't use dict as lookup key)
    subdict_tuple = tuple(sorted(subdict.items()))
    # create an empty list of keys that match this val
    # (if needed), or retrieve existing list
    matching_keys = cross_refs.setdefault(subdict_tuple, [])
    # add this item to the list
    matching_keys.append(key)

# make lists of duplicates and non-duplicates
dups = {}
singles = {}
for subdict_tuple, keys in cross_refs.items():
    # convert hashed value back to a dict
    subdict = dict(subdict_tuple)
    if len(keys) > 1:
        # convert the list of matching keys to a tuple and use as the key
        dups[tuple(keys)] = subdict
    else:
        # there's only one matching key, so use that as the key
        singles[keys[0]] = subdict
print(dups)
# {
# (456, 123, 101): {'route2': 'abc1', 'route1': 'abc'}
# }
print(singles)
# {
# 789: {'route2': 'abc3', 'route1': 'abc2'},
# 102: {'route2': 'abc5', 'route1': 'ab4'}
# }