Python: Combining unique values in list of dicts where keys are the same?
I'm not sure if I am asking the question in the right way, but this is my issue:
I have a list of dicts in the following format:
[
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'},
{'user': 'admin', 'IndexUsed': 'a'},
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
...
]
I want my final result to look like this:
[
{'user': 'joe', 'IndexUsed': ['a', 'b']},
{'user': 'admin', 'IndexUsed': ['a', 'c']},
{'user': 'hugo', 'IndexUsed': ['a', 'd']},
]
In essence, combining/deduplicating the unique fields in IndexUsed and reducing them to only one dict per user.
I have looked into using reducers and dict comprehensions, and searched on StackOverflow, but I have had trouble finding use cases that involve strings. The majority of examples I have found combine integers into a final int/float, but here I want to combine string values instead. Could you help me understand how to approach this problem?
from collections import defaultdict
data = [{'IndexUsed': 'a', 'user': 'joe'},
{'IndexUsed': 'a', 'user': 'joe'},
{'IndexUsed': 'a', 'user': 'joe'},
{'IndexUsed': 'b', 'user': 'joe'},
{'IndexUsed': 'a', 'user': 'admin'},
{'IndexUsed': 'c', 'user': 'admin'},
{'IndexUsed': 'a', 'user': 'hugo'},
{'IndexUsed': 'd', 'user': 'hugo'}]
indexes_used = defaultdict(set)
for d in data:
    # A set silently ignores repeated IndexUsed values for the same user
    indexes_used[d['user']].add(d['IndexUsed'])
result = []
for k, v in indexes_used.items():
    result.append({'user': k, 'IndexUsed': sorted(v)})
print(*result)
Outputs:
{'user': 'joe', 'IndexUsed': ['a', 'b']} {'user': 'admin', 'IndexUsed': ['a', 'c']} {'user': 'hugo', 'IndexUsed': ['a', 'd']}
Note: for the unaware, defaultdict uses the passed function (set in this case) as a factory to create the value for a missing key. So every key of indexes_used maps to a set filled with the used indexes. Using a set also ignores duplicates. In the end, each set is converted to a sorted list while building the dict with the required IndexUsed key.
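To make the factory behaviour described in the note concrete, here is a minimal sketch. The variable name joe_indexes is only illustrative and is not part of the answer above.

```python
from collections import defaultdict

indexes_used = defaultdict(set)

# Accessing a missing key calls set() and stores the new empty set under that key.
joe_indexes = indexes_used['joe']
print(joe_indexes)                    # set()

# Adding the same value twice leaves a single entry.
indexes_used['joe'].add('a')
indexes_used['joe'].add('a')
indexes_used['joe'].add('b')
print(sorted(indexes_used['joe']))    # ['a', 'b']

# A plain membership test does not create a key.
print('admin' in indexes_used)        # False
```

Note that only item access (indexes_used[key]) triggers the factory; the in operator and .get() leave the dict unchanged.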
If the dictionaries are guaranteed to be grouped together by user, then you could use itertools.groupby to process each group of dictionaries separately:
from itertools import groupby
from operator import itemgetter
data = [
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'},
{'user': 'admin', 'IndexUsed': 'a'},
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
]
merged_data = [
    # The temporary dict maps each unique IndexUsed value to None, keeping first-seen order
    {"user": key, "IndexUsed": list({i: None for i in map(itemgetter("IndexUsed"), group)})}
    for key, group in groupby(data, key=itemgetter("user"))
]
for d in merged_data:
    print(d)
Output:
{'user': 'joe', 'IndexUsed': ['a', 'b']}
{'user': 'admin', 'IndexUsed': ['a', 'c']}
{'user': 'hugo', 'IndexUsed': ['a', 'd']}
This was just the first thing I came up with, but I don't like it for several reasons. First, like I said, it assumes that the original dictionaries are grouped together by the key user. In addition, long list comprehensions are hard to read and should be avoided. The merged IndexUsed list is generated by creating a temporary dictionary that maps unique entries to None (ew, gross; a dictionary is used rather than a set because sets don't preserve insertion order). It also assumes you're using Python 3.7+, where dictionaries are guaranteed to preserve insertion order (you could be more explicit by using collections.OrderedDict, but that's one more import). Finally, you shouldn't have to hardcode the "user" and "IndexUsed" key literals. Someone please suggest a better answer.
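One possible way to address several of these objections while still using groupby is sketched below. The helper name merge_by_key and its parameters are hypothetical, not something from the answer above. It sorts the records by the grouping key first, so they no longer need to arrive pre-grouped, and it takes the key names as arguments instead of hardcoding them. The trade-off is that the groups come out in sorted key order rather than in order of first appearance.

```python
from itertools import groupby
from operator import itemgetter

def merge_by_key(records, group_key, value_key):
    """Hypothetical helper: one dict per group_key value, with unique value_key entries."""
    get_group = itemgetter(group_key)
    get_value = itemgetter(value_key)
    # sorted() is stable, so records keep their relative order within each group.
    ordered = sorted(records, key=get_group)
    return [
        # dict.fromkeys drops duplicates while keeping first-seen order (Python 3.7+).
        {group_key: key, value_key: list(dict.fromkeys(map(get_value, group)))}
        for key, group in groupby(ordered, key=get_group)
    ]

# Deliberately ungrouped sample data
records = [
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'c'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'a'},
]
for merged in merge_by_key(records, 'user', 'IndexUsed'):
    print(merged)
# {'user': 'admin', 'IndexUsed': ['c', 'a']}
# {'user': 'joe', 'IndexUsed': ['b', 'a']}
```

dict.fromkeys does the same job as the {i: None for i in ...} comprehension, just with less noise.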
One way to approach this requirement without making use of any libs, if you are interested:
arr = [
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'},
{'user': 'admin', 'IndexUsed': 'a'},
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'},
]
global_dict = {}
for d in arr:
    if d["user"] not in global_dict:
        global_dict[d["user"]] = [d["IndexUsed"]]
    else:
        global_dict[d["user"]].append(d["IndexUsed"])
        # Deduplicate via a set; note that this does not preserve order
        global_dict[d["user"]] = list(set(global_dict[d["user"]]))
print(global_dict)
# Now we have a dict with each user as key and a list of distinct IndexUsed values as value
# (the order within each list may vary):
# {
# 'joe': ['b', 'a'],
# 'admin': ['c', 'a'],
# 'hugo': ['d', 'a']
# }
final_list = []
for k, v in global_dict.items():
    final_list.append({"user": k, "IndexUsed": v})
print(final_list)
# Desired output (the order within each list may vary)
# [
# {'user': 'joe', 'IndexUsed': ['b', 'a']},
# {'user': 'admin', 'IndexUsed': ['c', 'a']},
# {'user': 'hugo', 'IndexUsed': ['d', 'a']}
# ]
However, if you are a fan of one-liners... let me condense @progmatico's awesome defaultdict approach to just these three lines (plus the import).
from collections import defaultdict
indexes_used = defaultdict(set)
[indexes_used[d['user']].add(d['IndexUsed']) for d in data] # for the side effect
print([{'user': k, 'IndexUsed': sorted(list(v))} for k, v in indexes_used.items()])
And it's still readable.
Without any external lib:
l = [
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'a'},
{'user': 'joe', 'IndexUsed': 'b'},
{'user': 'admin', 'IndexUsed': 'a'},
{'user': 'admin', 'IndexUsed': 'c'},
{'user': 'hugo', 'IndexUsed': 'a'},
{'user': 'hugo', 'IndexUsed': 'd'}
]
def combinator(l):
    d = {}
    for item in l:
        # Start a new set the first time a user is seen
        if d.get(item['user']) is None:
            d[item['user']] = {item['IndexUsed']}
        d[item['user']].add(item['IndexUsed'])
    return [{'user': key, 'IndexUsed': sorted(value)} for key, value in d.items()]

print(combinator(l))
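If you would rather keep the IndexUsed values in the order they first appear instead of sorting them, a variant along these lines might work. The function name combine_in_seen_order is made up for illustration, and the sketch relies on dicts preserving insertion order (Python 3.7+).

```python
def combine_in_seen_order(records):
    """Hypothetical variant: unique IndexUsed values per user, in first-seen order."""
    grouped = {}
    for record in records:
        # setdefault returns the user's existing list, or stores and returns a new one.
        values = grouped.setdefault(record['user'], [])
        if record['IndexUsed'] not in values:
            values.append(record['IndexUsed'])
    return [{'user': user, 'IndexUsed': values} for user, values in grouped.items()]

sample = [
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'joe', 'IndexUsed': 'a'},
    {'user': 'joe', 'IndexUsed': 'b'},
    {'user': 'admin', 'IndexUsed': 'c'},
]
print(combine_in_seen_order(sample))
# [{'user': 'joe', 'IndexUsed': ['b', 'a']}, {'user': 'admin', 'IndexUsed': ['c']}]
```

The list membership check is linear in the number of values per user, which is fine for a handful of indexes; for many distinct values per user, a set-based approach like the ones above is likely the better choice.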