How to get the duplicate objects and the total number of corresponding keys from a Python dictionary with multiple objects?

Question

I have a Python dictionary made up of many nested dictionaries, i.e. it looks like this:

result = {
    123: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    456: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    789: {
        'route1': 'abc2',
        'route2': 'abc3'
    },
    101: {
        'route1': 'abc',
        'route2': 'abc1'
    },
    102: {
        'route1': 'ab4',
        'route2': 'abc5'
    }
}

Here we can see that 123, 456 and 101 have the same values. What I want to do is find the duplicated object, which in this case is:

{
    'route1': 'abc',
    'route2': 'abc1'
}

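and the keys that hold this duplicated object, i.e. 123, 456 and 101. How can we do this?

Along with the duplicate-object information, I also want to know which objects are not duplicated, i.e. 789 with its object and 102 with its object.

PS: Please note that I don't know in advance which objects are repeated, because this structure is generated inside the code. So there may be no duplicated object at all, or there may be several. Also, due to some restrictions, I cannot use pandas, numpy, etc.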

Answer 1

Use collections.defaultdict:

from collections import defaultdict

d = defaultdict(list)
for k, v in result.items():
    d[tuple(v.items())].append(k)

desired = {
    'route1': 'abc',
    'route2': 'abc1'
}
d[tuple(desired.items())]

Output:

[456, 123, 101]

For the non-duplicated items, use a list comprehension:

[v for v in d.values() if len(v) == 1]

Output:

[[102], [789]]
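
Since the question says the repeated object is not known in advance, the lookup with desired can be replaced by the mirror image of the list comprehension above: keep every group that has more than one key. The following is a minimal sketch under that reading. The sorted() call is an addition not present in the answer, included so that the grouping does not depend on the insertion order of each sub-dict, and the name duplicates is only illustrative.

```python
from collections import defaultdict

result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route1': 'abc', 'route2': 'abc1'},
    789: {'route1': 'abc2', 'route2': 'abc3'},
    101: {'route1': 'abc', 'route2': 'abc1'},
    102: {'route1': 'ab4', 'route2': 'abc5'},
}

d = defaultdict(list)
for k, v in result.items():
    # sorted() is an addition: it makes the key independent of insertion order
    d[tuple(sorted(v.items()))].append(k)

# every group with more than one key is a duplicated sub-dict
duplicates = [(dict(items), keys) for items, keys in d.items() if len(keys) > 1]

for subdict, keys in duplicates:
    print(subdict, 'appears', len(keys), 'times under keys', keys)
```

If nothing is repeated, duplicates is simply an empty list, which covers the "there may be no duplicates" case from the question.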

Answer 2

You can use pandas' drop_duplicates() function:

First convert your dict to a DataFrame:

import pandas as pd

df = pd.DataFrame(result).T

Output:

    route1  route2
123 abc     abc1
456 abc     abc1
789 abc2    abc3
101 abc     abc1
102 ab4     abc5

Then use drop_duplicates and convert the result back to a dict:

df2 = df.drop_duplicates(subset=['route1', 'route2']).T.to_dict()

Output:

{
 123: {
       'route1': 'abc', 
       'route2': 'abc1'
      },
 789: {
       'route1': 'abc2',
       'route2': 'abc3'
      },
 102: {
       'route1': 'ab4', 
       'route2': 'abc5'
      }
}
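
Because the question rules out pandas, the same "keep the first occurrence" behaviour can be approximated with the standard library alone. This is only a sketch of that idea, not what pandas does internally. The names seen, marker and unique are made up for illustration. Like the output above, it keeps 123 and drops 456 and 101 without reporting them.

```python
result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route1': 'abc', 'route2': 'abc1'},
    789: {'route1': 'abc2', 'route2': 'abc3'},
    101: {'route1': 'abc', 'route2': 'abc1'},
    102: {'route1': 'ab4', 'route2': 'abc5'},
}

seen = set()   # hashable markers of the sub-dicts met so far
unique = {}    # first key found for each distinct sub-dict

for key, subdict in result.items():
    # a sorted tuple of items is hashable and ignores insertion order
    marker = tuple(sorted(subdict.items()))
    if marker not in seen:
        seen.add(marker)
        unique[key] = subdict

print(unique)
```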

Answer 3

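You can do this by creating a dictionary that holds all the matching keys for each distinct value in the result dictionary (where the values are themselves dicts). This is a fairly common pattern in Python: iterate over one container and aggregate the values into a dict. Then, once the aggregate dictionary has been built, you can split it into duplicated and single values.

To build the aggregate dict, you need to use each subdict in result as a key and append the matching key from the original dict to the list associated with that subdict. The catch is that you can't use the subdicts directly as dictionary keys, because they are not hashable. But you can work around that by converting them to tuples. The tuples should also be sorted, so that you don't miss duplicates whose items happen to come out in a different order.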
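
The two points in the previous paragraph can be checked in isolation. The snippet below is a small illustration with made-up names (a, b, error_message); it assumes the sub-dict keys can be compared with each other, as the string keys here can, since sorted() needs that.

```python
a = {'route1': 'abc', 'route2': 'abc1'}
b = {'route2': 'abc1', 'route1': 'abc'}  # same content, different insertion order

# 1) a dict cannot be used as a dictionary key
error_message = None
try:
    {a: [123]}
except TypeError as exc:
    error_message = str(exc)

# 2) plain tuples of items depend on insertion order, sorted ones do not
unsorted_equal = tuple(a.items()) == tuple(b.items())
sorted_equal = tuple(sorted(a.items())) == tuple(sorted(b.items()))

print(error_message)
print(unsorted_equal, sorted_equal)
```

Without the sort, a and b would end up under two different keys even though they compare equal as dicts.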

It may be easier to understand by looking at some example code:

result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route1': 'abc', 'route2': 'abc1'},
    789: {'route1': 'abc2', 'route2': 'abc3'},
    101: {'route1': 'abc', 'route2': 'abc1'},
    102: {'route1': 'ab4', 'route2': 'abc5'}
}

# make a dict showing all the keys that match each subdict
cross_refs = dict()
for key, subdict in result.items():
    # make hashable version of subdict (can't use dict as lookup key)
    subdict_tuple = tuple(sorted(subdict.items()))
    # create an empty list of keys that match this val
    # (if needed), or retrieve existing list
    matching_keys = cross_refs.setdefault(subdict_tuple, [])
    # add this item to the list
    matching_keys.append(key)

# make lists of duplicates and non-duplicates
dups = {}
singles = {}
for subdict_tuple, keys in cross_refs.items():
    # convert hashed value back to a dict
    subdict = dict(subdict_tuple)
    if len(keys) > 1:
        # convert the list of matching keys to a tuple and use as the key
        dups[tuple(keys)] = subdict
    else:
        # there's only one matching key, so use that as the key
        singles[keys[0]] = subdict

print(dups)
# {
#     (456, 123, 101): {'route2': 'abc1', 'route1': 'abc'}
# }
print(singles)
# {
#     789: {'route2': 'abc3', 'route1': 'abc2'}, 
#     102: {'route2': 'abc5', 'route1': 'ab4'}
# }
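
To also report how many keys share each object (the "total number" in the question title), the same grouping can be wrapped in a small helper. This is only a sketch: summarize_duplicates and the 'value', 'keys' and 'count' field names are hypothetical, chosen for illustration, and are not part of the answer above.

```python
def summarize_duplicates(data):
    """Group the keys of `data` by sub-dict and count how often each one occurs."""
    groups = {}
    for key, subdict in data.items():
        # sorted tuple of items: hashable and independent of insertion order
        groups.setdefault(tuple(sorted(subdict.items())), []).append(key)
    return [
        {'value': dict(items), 'keys': keys, 'count': len(keys)}
        for items, keys in groups.items()
    ]


result = {
    123: {'route1': 'abc', 'route2': 'abc1'},
    456: {'route1': 'abc', 'route2': 'abc1'},
    789: {'route1': 'abc2', 'route2': 'abc3'},
    101: {'route1': 'abc', 'route2': 'abc1'},
    102: {'route1': 'ab4', 'route2': 'abc5'},
}

summary = summarize_duplicates(result)
# number of distinct sub-dicts that occur more than once
duplicate_total = sum(1 for entry in summary if entry['count'] > 1)

for entry in summary:
    print(entry['count'], entry['keys'], entry['value'])
print('duplicated objects:', duplicate_total)
```

Entries with a count of 1 correspond to singles in the code above, and entries with a higher count correspond to dups.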

3 answers

Answer 1
1 2019-08-02 07:03:33

Answer 2
1 2019-08-02 07:20:09

Answer 3
1 Accepted 2019-08-02 07:49:50
