Get maximum value in dictionary for a given tuple key where elements of the tuple may be in different positions

I have the following dictionary:

{('2019-07-243541760601284691', '2019-07-243541760603812661'): 1086,
('2019-07-243541760601314711', '2019-07-243541760603996721'): 662,
('2019-07-243541760603794841', '2019-07-243541760600899921'): 483,
('2019-07-243541760603794841', '2019-07-243541760601224211'): 70,
('2019-07-243541760603794841', '2019-07-243541760607368321'): 54,
('2019-07-243541760600899921', '2019-07-243541760601224211'): 93,
('2019-07-243541760600899921', '2019-07-243541760607368321'): 74,
('2019-07-243541760601224211', '2019-07-243541760607368321'): 490,
('2019-07-243541760613553761', '2019-07-243541760602348611'): 484,
('2019-07-243541760602450401', '2019-07-243541760602927941'): 1118,
('2019-07-243541760603292161', '2019-07-243541760606108621'): 732}

The keys are tuples of unique ids. As you can see, certain ids are repeated across both positions (e.g. lines 3 to 5 have the same first id, and the second element from line 3 also appears in lines 6 and 7). I want to find a way to return the maximum value for each given unique id.

If it helps, the output I am looking for would look like this:

{('2019-07-243541760601284691', '2019-07-243541760603812661'): 1086,
('2019-07-243541760601314711', '2019-07-243541760603996721'): 662,
('2019-07-243541760603794841', '2019-07-243541760600899921'): 483,
('2019-07-243541760601224211', '2019-07-243541760607368321'): 490,
('2019-07-243541760613553761', '2019-07-243541760602348611'): 484,
('2019-07-243541760602450401', '2019-07-243541760602927941'): 1118,
('2019-07-243541760603292161', '2019-07-243541760606108621'): 732}

In other words, each unique id should appear in at most one key in the dictionary.

You can change the behavior of the built-in types in Python. In your case it is easy to create a dict subclass that automatically stores repeated assignments to the same key in a list. You can then take the max of that list.

class Dictlist(dict):
    def __setitem__(self, key, value):
        # The first assignment to a key creates an empty list for it.
        if key not in self:
            super().__setitem__(key, [])
        # Every assignment appends to the list instead of overwriting.
        self[key].append(value)

Output example:

>>> d = Dictlist()
>>> d['test'] = 1
>>> d['test'] = 2
>>> d['test'] = 3
>>> d
{'test': [1, 2, 3]}
>>> d['other'] = 100
>>> d
{'test': [1, 2, 3], 'other': [100]}

So when you want the max value for a key, index the dictionary with that key and pass the resulting list to max:

max(d[key_value])
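The same grouping idea can also be sketched without subclassing dict. The example below is a minimal sketch, not the answer's own code: it assumes the values should be grouped per id rather than per tuple key, and the function name `max_per_id` and the short ids are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def max_per_id(pairs):
    """Return {id: largest value among all pairs that contain that id}."""
    values_by_id = defaultdict(list)
    for (first, second), value in pairs.items():
        # Record the value under both ids of the tuple key.
        values_by_id[first].append(value)
        values_by_id[second].append(value)
    return {id_: max(values) for id_, values in values_by_id.items()}

# Hypothetical short ids standing in for the long ones in the question.
example = {("a", "b"): 483, ("a", "c"): 70, ("b", "c"): 93, ("c", "d"): 490}
print(max_per_id(example))  # {'a': 483, 'b': 483, 'c': 490, 'd': 490}
```

This only gives the maximum per id. Choosing which tuple keys to keep, so that each id appears in at most one key, still needs a separate selection step.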

First I sort the items by value into a list L. Then I iterate over L and keep a set ids_set of the ids seen so far; if the ids of a tuple are not in this set yet, I add them to it and put the tuple into my dict final.

d = {('2019-07-243541760601284691', '2019-07-243541760603812661'): 1086,
('2019-07-243541760601314711', '2019-07-243541760603996721'): 662,
('2019-07-243541760603794841', '2019-07-243541760600899921'): 483,
('2019-07-243541760603794841', '2019-07-243541760601224211'): 70,
('2019-07-243541760603794841', '2019-07-243541760607368321'): 54,
('2019-07-243541760600899921', '2019-07-243541760601224211'): 93,
('2019-07-243541760600899921', '2019-07-243541760607368321'): 74,
('2019-07-243541760601224211', '2019-07-243541760607368321'): 490,
('2019-07-243541760613553761', '2019-07-243541760602348611'): 484,
('2019-07-243541760602450401', '2019-07-243541760602927941'): 1118,
('2019-07-243541760603292161', '2019-07-243541760606108621'): 732}

L = sorted(d.items(), key=lambda item: item[1], reverse=True)
# L = [(('2019-07-243541760602450401', '2019-07-243541760602927941'), 1118),...

final = {}
ids_set = set()
for t, val in L:  # t is the tuple key, val is its value
    # keep the pair only if neither of its ids has been seen yet
    if t[0] not in ids_set and t[1] not in ids_set:
        ids_set.update(t)
        final[t] = val

for i, v in final.items():
    print(i, v)

Output:

('2019-07-243541760602450401', '2019-07-243541760602927941') 1118
('2019-07-243541760601284691', '2019-07-243541760603812661') 1086
('2019-07-243541760603292161', '2019-07-243541760606108621') 732
('2019-07-243541760601314711', '2019-07-243541760603996721') 662
('2019-07-243541760601224211', '2019-07-243541760607368321') 490
('2019-07-243541760613553761', '2019-07-243541760602348611') 484
('2019-07-243541760603794841', '2019-07-243541760600899921') 483
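The greedy selection described in this answer can be wrapped in a reusable function. This is a minimal sketch under the assumption that a pair should be skipped when either of its ids was already used; the name `pick_disjoint_max` and the short ids are hypothetical.

```python
def pick_disjoint_max(pairs):
    """Greedily keep the highest-valued pairs so that no id is reused."""
    chosen = {}
    used_ids = set()
    # Visit the pairs from the largest value to the smallest.
    for key, value in sorted(pairs.items(), key=lambda item: item[1], reverse=True):
        # Skip the pair if either of its ids already belongs to a kept pair.
        if used_ids.isdisjoint(key):
            used_ids.update(key)
            chosen[key] = value
    return chosen

# Hypothetical short ids; ("e", "d") must be skipped because "d" is its second id.
example = {("a", "b"): 483, ("a", "c"): 70, ("b", "c"): 93,
           ("c", "d"): 490, ("e", "d"): 100}
print(pick_disjoint_max(example))  # {('c', 'd'): 490, ('a', 'b'): 483}
```

The pair ("e", "d") shows why both positions of the tuple have to be checked: its first id is new, but its second id is already taken by ("c", "d").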

d = {('2019-07-243541760601284691', '2019-07-243541760603812661'): 1086,
('2019-07-243541760601314711', '2019-07-243541760603996721'): 662,
('2019-07-243541760603794841', '2019-07-243541760600899921'): 483,
('2019-07-243541760603794841', '2019-07-243541760601224211'): 70,
('2019-07-243541760603794841', '2019-07-243541760607368321'): 54,
('2019-07-243541760600899921', '2019-07-243541760601224211'): 93,
('2019-07-243541760600899921', '2019-07-243541760607368321'): 74,
('2019-07-243541760601224211', '2019-07-243541760607368321'): 490,
('2019-07-243541760613553761', '2019-07-243541760602348611'): 484,
('2019-07-243541760602450401', '2019-07-243541760602927941'): 1118,
('2019-07-243541760603292161', '2019-07-243541760606108621'): 732}

# flatten the dict so that each id in the tuple has a value;
# iterate in ascending order of value so that, when an id appears more than once,
# the last write (the one that survives) is its maximum value
flat_dict = {}
for (first, second), v in sorted(d.items(), key=lambda x: x[1]):
    flat_dict[first] = v
    flat_dict[second] = v

# get the unique keys you want
keys, already_used = [], []
for k in d:
    if k[0] in already_used or k[1] in already_used:
        continue
    keys.append(k)
    already_used.extend(k)

# now create the new_dict
new_dict = {k:max(flat_dict[k[0]], flat_dict[k[1]]) for k in keys}

Output:

>>> new_dict
{('2019-07-243541760601284691', '2019-07-243541760603812661'): 1086,
 ('2019-07-243541760601314711', '2019-07-243541760603996721'): 662,
 ('2019-07-243541760603794841', '2019-07-243541760600899921'): 483,
 ('2019-07-243541760601224211', '2019-07-243541760607368321'): 490,
 ('2019-07-243541760613553761', '2019-07-243541760602348611'): 484,
 ('2019-07-243541760602450401', '2019-07-243541760602927941'): 1118,
 ('2019-07-243541760603292161', '2019-07-243541760606108621'): 732}
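Whichever approach is used, the requirement that each id appears in at most one key can be checked directly on the result. The helper below is a small sketch; the name `ids_are_unique` is hypothetical.

```python
from collections import Counter

def ids_are_unique(result):
    """Return True if no id occurs in more than one position across all tuple keys."""
    counts = Counter(id_ for key in result for id_ in key)
    return all(n == 1 for n in counts.values())

print(ids_are_unique({("a", "b"): 483, ("c", "d"): 490}))  # True
print(ids_are_unique({("a", "b"): 483, ("b", "c"): 93}))   # False
```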
