
How to get the total number of repeated objects and their respective keys from a Python dictionary containing multiple objects?

I have a Python dictionary that consists of many nested dictionaries, i.e. it looks like this:

result = {
    123: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    456: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    789: {
        'route1': 'abc2',
        'route2': 'abc3'
    },
    101: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    102: {
        'route1': 'ab4',
        'route2': 'abc5'
    }
}

Here we can see that 123, 456 and 101 have the same values. What I am trying to do is find the repeated object, which in this case is:

{
    'route1': 'abc',
    'route2': 'abc1'
}

and the keys which have this repeated object, i.e. 123, 456 and 101. How can we do this?

Along with the info about repeated objects, I also want to know which objects do not repeat, i.e. 789 and its respective object, and 102 and its respective object.

PS: Please note that I don't know beforehand which objects are repeated, as this structure is generated inside code. So it's possible that there are no repeated objects at all, or that there is more than one repeated object. Also, I cannot use pandas or numpy etc. due to some restrictions.

Use collections.defaultdict:

from collections import defaultdict

d = defaultdict(list)
for k, v in result.items():
    d[tuple(v.items())].append(k)

desired = {
    'route1': 'abc',
    'route2': 'abc1'
}
d[tuple(desired.items())]

Output:

[456, 123, 101]

For non-repeated items, use a list comprehension:

[v for v in d.values() if len(v) == 1]

Output:

[[102], [789]]
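Since you say you don't know the repeated value in advance, the same d can also be filtered for every group with more than one key. A minimal sketch building on the code above:

# collect every repeated sub-dict together with the keys that share it
dups = {tuple(keys): dict(t) for t, keys in d.items() if len(keys) > 1}
print(dups)
# e.g. {(123, 456, 101): {'route1': 'abc', 'route2': 'abc1'}} (key order may vary)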

You can use the drop_duplicates() function of pandas:

First, transform your dict into a DataFrame:

import pandas as pd

df = pd.DataFrame(result).T

Output:

    route1  route2
123 abc     abc1
456 abc     abc1
789 abc2    abc3
101 abc     abc1
102 ab4     abc5

Then use drop_duplicates and convert back to a dict:

df2 = df.drop_duplicates(subset=['route1', 'route2']).T.to_dict()

Output:

{
 123: {
       'route1': 'abc', 
       'route2': 'abc1'
      },
 789: {
       'route1': 'abc2',
       'route2': 'abc3'
      },
 102: {
       'route1': 'ab4', 
       'route2': 'abc5'
      }
}
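If you also need to know which keys share the same values, rather than just dropping the duplicates, one possible extension is to group the index by the route columns. A sketch, assuming the same df as above:

# map each (route1, route2) pair to the index labels that share it
groups = df.groupby(['route1', 'route2']).groups
# e.g. {('ab4', 'abc5'): [102], ('abc', 'abc1'): [123, 456, 101], ('abc2', 'abc3'): [789]}
repeated = {tuple(idx): routes for routes, idx in groups.items() if len(idx) > 1}
# e.g. {(123, 456, 101): ('abc', 'abc1')}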

You can do this by creating a dictionary holding all the matching keys for each distinct value in your result dict (where the values are themselves dicts). This is a fairly common pattern in Python, iterating through one container and aggregating values into a dict. Then, once you've created the aggregation dict, you can split it into duplicate and single values.

To build the aggregation dict, you need to use each subdict from result as a key and append the matching keys from the original dict to a list associated with that subdict. The challenge is that you can't use the subdicts directly as dictionary keys, because dicts are not hashable. You can solve that by converting them to tuples. The tuples should also be sorted, to avoid missing duplicates whose keys happen to come out in a different order.

It may be easier to understand just by looking at some example code:

result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route1': 'abc', 'route2': 'abc1'},
    789: {'route1': 'abc2', 'route2': 'abc3'},
    101: {'route1': 'abc', 'route2': 'abc1'},
    102: {'route1': 'ab4', 'route2': 'abc5'}
}

# make a dict showing all the keys that match each subdict
cross_refs = dict()
for key, subdict in result.items():
    # make hashable version of subdict (can't use dict as lookup key)
    subdict_tuple = tuple(sorted(subdict.items()))
    # create an empty list of keys that match this val
    # (if needed), or retrieve existing list
    matching_keys = cross_refs.setdefault(subdict_tuple, [])
    # add this item to the list
    matching_keys.append(key)

# make lists of duplicates and non-duplicates
dups = {}
singles = {}
for subdict_tuple, keys in cross_refs.items():
    # convert hashed value back to a dict
    subdict = dict(subdict_tuple)
    if len(keys) > 1:
        # convert the list of matching keys to a tuple and use as the key
        dups[tuple(keys)] = subdict
    else:
        # there's only one matching key, so use that as the key
        singles[keys[0]] = subdict

print(dups)
# {
#     (456, 123, 101): {'route2': 'abc1', 'route1': 'abc'}
# }
print(singles)
# {
#     789: {'route2': 'abc3', 'route1': 'abc2'}, 
#     102: {'route2': 'abc5', 'route1': 'ab4'}
# }
