
Python: group list items in a dict

I want to generate a dictionary from a list of dictionaries, grouping the list items by the value of some key, for example:

input_list = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'}
]
output_dict = {
    'pipo': [
        {'a': 'pipo', 'b': 'titi'},
        {'a': 'pipo', 'b': 'toto'}
    ],
    'tata': [
        {'a': 'tata', 'b': 'foo'},
        {'a': 'tata', 'b': 'bar'}
    ]
}

So far I've found two ways of doing this. The first simply iterates over the list, creates a sublist in the dict for each key value, and appends the elements matching that key to the sublist:

l = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'}
]

res = {}

for e in l:
    # Create an empty list the first time a key is seen, then append to it
    res[e['a']] = res.get(e['a'], [])
    res[e['a']].append(e)
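The same single-pass loop can also be sketched with collections.defaultdict, a standard-library swap-in that the question does not use. It creates the empty list automatically on the first access to a missing key, so the explicit get call goes away:

```python
from collections import defaultdict

l = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'},
]

# defaultdict(list) calls list() to supply an empty list for any missing key
grouped = defaultdict(list)
for e in l:
    grouped[e['a']].append(e)

# Optional: convert back to a plain dict so later lookups of missing keys raise KeyError
res = dict(grouped)
print(res)
```

The loop still visits each item exactly once, so its cost matches the get-based version.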

And another using itertools.groupby:

import itertools
from operator import itemgetter

l = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'}
]

# groupby only groups consecutive items, so sort by the same key first
l = sorted(l, key=itemgetter('a'))
res = dict((k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a')))
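The sort step in the groupby version is not optional. itertools.groupby starts a new group every time the key value changes, so on unsorted input the same key can show up more than once. This small sketch, reusing the question's four-item list, shows the effect:

```python
import itertools
from operator import itemgetter

l = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'},
]
key = itemgetter('a')

# groupby only merges *consecutive* items with equal keys
unsorted_keys = [k for k, _ in itertools.groupby(l, key=key)]
print(unsorted_keys)  # ['tata', 'pipo', 'tata']

# Building a dict from the unsorted runs silently overwrites the first 'tata' group
lossy = dict((k, list(g)) for k, g in itertools.groupby(l, key=key))
print(lossy['tata'])  # [{'a': 'tata', 'b': 'bar'}]

# After sorting, each key forms exactly one run
sorted_keys = [k for k, _ in itertools.groupby(sorted(l, key=key), key=key)]
print(sorted_keys)  # ['pipo', 'tata']
```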

I wonder which alternative is the more efficient?

Is there a more Pythonic, concise, or better-performing way of achieving this?

Is it correct that you want to group your input list by the value of the 'a' key of the list elements? If so, your first approach is the best. One minor improvement is to use dict.setdefault:

res = {}
for item in l:
    res.setdefault(item['a'], []).append(item)
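As a sketch only, the setdefault loop can be wrapped in a reusable function that takes a key callable, in the same spirit as the key argument of sorted and itertools.groupby. The name group_by is a hypothetical helper, not part of the standard library. One design note: the [] passed to setdefault is evaluated on every call, so a throwaway empty list is created even when the key already exists.

```python
from operator import itemgetter


def group_by(items, key):
    """Group items into a dict of lists keyed by key(item), in one pass.

    group_by is an illustrative helper, not a standard-library function.
    """
    res = {}
    for item in items:
        # setdefault returns the existing list, or inserts and returns a new empty one
        res.setdefault(key(item), []).append(item)
    return res


input_list = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'},
]

# Group by the 'a' value, as in the question
by_a = group_by(input_list, itemgetter('a'))

# Any callable works as the key, e.g. the first letter of 'b'
by_b_initial = group_by(input_list, lambda d: d['b'][0])
```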

A one-liner:

>>> import itertools
>>> input_list = [
...         {'a':'tata', 'b': 'foo'},
...         {'a':'pipo', 'b': 'titi'},
...         {'a':'pipo', 'b': 'toto'},
...         {'a':'tata', 'b': 'bar'}
... ]
>>> {k:[v for v in input_list if v['a'] == k] for k, val in itertools.groupby(input_list,lambda x: x['a'])}
{'tata': [{'a': 'tata', 'b': 'foo'}, {'a': 'tata', 'b': 'bar'}], 'pipo': [{'a': 'pipo', 'b': 'titi'}, {'a': 'pipo', 'b': 'toto'}]}
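The comprehension above is correct, but for every run that groupby yields it rescans all of input_list, and on unsorted input the same key can produce several runs ('tata' is computed twice here). If a single expression is the goal, the asker's sort-then-groupby approach can be collapsed into one dict comprehension instead, which sorts once and then makes a single pass:

```python
import itertools
from operator import itemgetter

input_list = [
    {'a': 'tata', 'b': 'foo'},
    {'a': 'pipo', 'b': 'titi'},
    {'a': 'pipo', 'b': 'toto'},
    {'a': 'tata', 'b': 'bar'},
]
key = itemgetter('a')

# Sort once so equal keys are adjacent, then turn each run into a list
output_dict = {k: list(g) for k, g in itertools.groupby(sorted(input_list, key=key), key=key)}
print(output_dict)
```

Because sorted is stable, items inside each group keep their original relative order.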

If by "efficient" you mean "time-efficient", you can measure it with the built-in timeit module.

For example:

import timeit
import itertools
from operator import itemgetter

# Named data rather than input to avoid shadowing the built-in input()
data = [{'a': 'tata', 'b': 'foo'},
        {'a': 'pipo', 'b': 'titi'},
        {'a': 'pipo', 'b': 'toto'},
        {'a': 'tata', 'b': 'bar'}]

def solution1():
    res = {}
    for e in data:
        res[e['a']] = res.get(e['a'], [])
        res[e['a']].append(e)
    return res

def solution2():
    l = sorted(data, key=itemgetter('a'))
    res = dict(
        (k, list(g)) for k, g in itertools.groupby(l, key=itemgetter('a'))
    )
    return res

t = timeit.Timer(solution1)
print(t.timeit(10000))
# 0.0122511386871

t = timeit.Timer(solution2)
print(t.timeit(10000))
# 0.0366218090057

Please refer to the official timeit documentation for further information.
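A four-item input says little about how the two approaches scale. The sketch below rests on assumptions: a synthetic list of 10,000 dicts spread over 100 distinct 'a' values (both numbers are arbitrary), timed with timeit.timeit. No timings are shown here because they depend on the machine and the Python version.

```python
import itertools
import random
import timeit
from operator import itemgetter

# Hypothetical larger input: 10,000 dicts over 100 distinct 'a' values
random.seed(0)
data = [{'a': f'key{random.randrange(100)}', 'b': i} for i in range(10_000)]


def with_setdefault(items):
    # Single pass: average O(1) lookup + append per item
    res = {}
    for item in items:
        res.setdefault(item['a'], []).append(item)
    return res


def with_groupby(items):
    # Sort (O(n log n)) so equal keys are adjacent, then one pass with groupby
    key = itemgetter('a')
    return {k: list(g) for k, g in itertools.groupby(sorted(items, key=key), key=key)}


for fn in (with_setdefault, with_groupby):
    seconds = timeit.timeit(lambda: fn(data), number=100)
    print(f'{fn.__name__}: {seconds:.3f} s for 100 runs')
```

Both functions return equal dicts, since a stable sort keeps each group's items in input order. Only the key order of the outer dict differs, and dict equality ignores that.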

The best approach is the first one you mentioned, and you can make it even more elegant by using setdefault, as bernhard mentioned above. The complexity of this approach is O(n): we iterate over the input once, and for each item we look up the appropriate list in the output dict we are building and append the item to it, which takes (amortized, average-case) constant time per item. So the overall complexity is O(n), which is optimal, since every item has to be visited at least once.

When using itertools.groupby, you must sort the input beforehand, which costs O(n log n).
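The cost comparison from the two paragraphs above can be written out as follows, where n is the number of input dicts and a dict lookup plus a list append are taken as amortized, average-case O(1):

```latex
T_{\text{loop}}(n) = \sum_{i=1}^{n} \bigl(\underbrace{O(1)}_{\text{lookup}} + \underbrace{O(1)}_{\text{append}}\bigr) = O(n)

T_{\text{groupby}}(n) = \underbrace{O(n \log n)}_{\text{sort}} + \underbrace{O(n)}_{\text{groupby} + \text{list}} = O(n \log n)
```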

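Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.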

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM