Python: Combining unique values in list of dicts where keys are the same?

I'm not sure I'm phrasing the question the right way, but here is my problem:

I have a list of dicts in the following format:

[
{'user': 'joe', 'IndexUsed': 'a'}, 
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'}, 
{'user': 'admin', 'IndexUsed': 'a'}, 
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
...
]

I want my final result to look like this:

[
{'user': 'joe', 'IndexUsed': ['a', 'b']}, 
{'user': 'admin', 'IndexUsed': ['a', 'c']}, 
{'user': 'hugo', 'IndexUsed': ['a', 'd']},
]

In essence, I want to combine the IndexUsed values for each user, dropping duplicates, so that there is only one dict per user.

I have looked into using reducers and dict comprehensions, and searched on Stack Overflow, but I'm having trouble finding use cases involving strings. Most of the examples I have found use integers and combine them into a final int/float, whereas here I want to combine the strings into a single final list. Could you help me understand how to approach this problem?
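The question mentions reducers, so here is a minimal sketch of that idea. `add_record` is a hypothetical helper name. `functools.reduce` works with strings just as well as with numbers if the accumulator is a dict of sets rather than a running total:

```python
from functools import reduce

data = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
]

def add_record(acc, record):
    # The accumulator maps each user to a set of index strings; reduce
    # threads it through every record, the way it would a running sum.
    acc.setdefault(record['user'], set()).add(record['IndexUsed'])
    return acc

combined = reduce(add_record, data, {})
result = [{'user': u, 'IndexUsed': sorted(v)} for u, v in combined.items()]
print(result)
# [{'user': 'joe', 'IndexUsed': ['a', 'b']}, {'user': 'admin', 'IndexUsed': ['a', 'c']}]
```

The "reduction" here is set union per user instead of addition. The final sorted() turns each set into a deterministic list.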

from collections import defaultdict


data = [{'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'joe'},
 {'IndexUsed': 'b', 'user': 'joe'},
 {'IndexUsed': 'a', 'user': 'admin'},
 {'IndexUsed': 'c', 'user': 'admin'},
 {'IndexUsed': 'a', 'user': 'hugo'},
 {'IndexUsed': 'd', 'user': 'hugo'}]

indexes_used = defaultdict(set)
for d in data:
    indexes_used[d['user']].add(d['IndexUsed'])

result = []
for k, v in indexes_used.items():
    result.append({'user': k, 'IndexUsed': sorted(list(v))})

print(*result)

Output:

{'user': 'joe', 'IndexUsed': ['a', 'b']} {'user': 'admin', 'IndexUsed': ['a', 'c']} {'user': 'hugo', 'IndexUsed': ['a', 'd']}

Note: for those unfamiliar with it, defaultdict uses the function you pass it (set in this case) as a factory to create the value for a missing key. So every key of indexes_used maps to a set filled with that user's indexes, and using a set also discards duplicates. Finally, each set is converted to a sorted list while building the required IndexUsed key.
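The following small sketch makes the factory behaviour concrete by comparing `defaultdict(set)` with a plain dict that uses `dict.setdefault`. The variable names are illustrative only:

```python
from collections import defaultdict

# defaultdict(set) calls set() the first time a missing key is accessed,
# so .add() can be used immediately without checking for the key first.
groups = defaultdict(set)
groups['joe'].add('a')
groups['joe'].add('a')   # duplicate is ignored by the set
groups['joe'].add('b')
print(sorted(groups['joe']))   # ['a', 'b']

# A plain dict needs dict.setdefault (or an explicit key check) to do the same.
plain = {}
plain.setdefault('joe', set()).add('a')
plain.setdefault('joe', set()).add('b')
print(sorted(plain['joe']))    # ['a', 'b']

# Caveat: merely reading a missing key also inserts an empty set.
_ = groups['nobody']
print('nobody' in groups)      # True
```

The last lines show the one surprise of defaultdict: a lookup is enough to create the key. Use `in` to test for membership without side effects.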

If the dictionaries are guaranteed to be grouped together by user, then you could use itertools.groupby to process each group of dictionaries separately:

from itertools import groupby
from operator import itemgetter

data = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'd'},
]

merged_data = [{"user": key, "IndexUsed": list({i: None for i in map(itemgetter("IndexUsed"), group)})} for key, group in groupby(data, key=itemgetter("user"))]
for d in merged_data:
    print(d)

Output:

{'user': 'joe', 'IndexUsed': ['a', 'b']}
{'user': 'admin', 'IndexUsed': ['a', 'c']}
{'user': 'hugo', 'IndexUsed': ['a', 'd']}

This was just the first thing I came up with, but I don't like it for several reasons. First, as I said, it assumes that the original dictionaries are grouped together by the key user. In addition, long list comprehensions are hard to read and should be avoided. The merged IndexUsed list is generated by creating a temporary dictionary that maps each unique entry to None (ew, gross; a dictionary is used rather than a set because sets don't preserve insertion order). It also assumes you're using Python 3.7 or later, where dictionaries are guaranteed to preserve insertion order (you could be more explicit by using collections.OrderedDict, but that's one more import). Finally, you shouldn't have to hardcode the "user" and "IndexUsed" key literals. Someone please suggest a better answer.
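Three of these objections can be addressed with small changes. In the sketch below, `merge_by` is a hypothetical helper name. Sorting the input before `groupby` removes the pre-grouping assumption. `dict.fromkeys` replaces the `{i: None ...}` trick. The key names become parameters:

```python
from itertools import groupby
from operator import itemgetter

def merge_by(records, group_key, value_key):
    """Collapse records sharing group_key into one dict whose value_key
    holds the unique values, in first-seen order within each group."""
    get_group = itemgetter(group_key)
    merged = []
    # groupby only merges *adjacent* items, so sort first. sorted() is
    # stable, so values keep their original relative order in each group.
    for key, group in groupby(sorted(records, key=get_group), key=get_group):
        # dict.fromkeys deduplicates while preserving insertion order (3.7+).
        unique_values = list(dict.fromkeys(item[value_key] for item in group))
        merged.append({group_key: key, value_key: unique_values})
    return merged

data = [
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'c'},
]

for d in merge_by(data, 'user', 'IndexUsed'):
    print(d)
# {'user': 'admin', 'IndexUsed': ['a', 'c']}
# {'user': 'joe', 'IndexUsed': ['b', 'a']}
```

The trade-off is that users now come out in sorted order rather than first-seen order. The grouping values must also be mutually comparable for sorted() to work.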

If you're interested, here is one way to approach this requirement without using any libraries:

arr = [
{'user': 'joe', 'IndexUsed': 'a'}, 
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'}, 
{'user': 'admin', 'IndexUsed': 'a'}, 
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
]

global_dict = {}

for d in arr:
    if d["user"] not in global_dict:
        global_dict[d["user"]] = [d["IndexUsed"]]
    elif d["IndexUsed"] not in global_dict[d["user"]]:
        # Only append values not seen yet for this user; this skips
        # duplicates and keeps the first-seen order (a set would not).
        global_dict[d["user"]].append(d["IndexUsed"])

print(global_dict)

# Now we get a dict with the user as key and a list of distinct IndexUsed values as value:
# {
#  'joe': ['a', 'b'],
#  'admin': ['a', 'c'],
#  'hugo': ['a', 'd']
# }

final_list = []

for k, v in global_dict.items():
    final_list.append({"user": k, "IndexUsed": v})

print(final_list)

# Desired output
# [
#  {'user': 'joe', 'IndexUsed': ['a', 'b']},
#  {'user': 'admin', 'IndexUsed': ['a', 'c']},
#  {'user': 'hugo', 'IndexUsed': ['a', 'd']}
# ]
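The same no-library idea can be written more compactly with `dict.setdefault`. This is only a sketch of one alternative. An inner dict serves as an insertion-ordered set, so nothing has to be scanned or rebuilt per record:

```python
arr = [
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'd'},
]

# Map each user to a dict used as an ordered set: its keys are the unique
# IndexUsed values, and the values are just placeholder None entries.
seen = {}
for d in arr:
    seen.setdefault(d['user'], {})[d['IndexUsed']] = None

final_list = [{'user': user, 'IndexUsed': list(indexes)} for user, indexes in seen.items()]
print(final_list)
```

Both users and their indexes keep first-seen order here, which relies on dicts preserving insertion order (Python 3.7+).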

However, if you are a fan of one-liners, let me condense @progmatico's excellent defaultdict approach into just these three lines.

from collections import defaultdict


indexes_used = defaultdict(set)
for d in data: indexes_used[d['user']].add(d['IndexUsed'])  # a plain loop, since only the side effect matters
print([{'user': k, 'IndexUsed': sorted(list(v))} for k, v in indexes_used.items()])

And it's still readable.

Without any external library:

l = [
    {'user': 'joe', 'IndexUsed': 'a'}, 
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'}, 
    {'user': 'admin', 'IndexUsed': 'a'}, 
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'hugo', 'IndexUsed': 'a'},
    {'user': 'hugo', 'IndexUsed': 'd'}
]

def combinator(l):
    d = {}

    for item in l:
        # Create an empty set the first time a user is seen
        if item['user'] not in d:
            d[item['user']] = set()
        d[item['user']].add(item['IndexUsed'])

    return [{'user': key, 'IndexUsed': sorted(value)} for key, value in d.items()]


print(combinator(l))
