Efficiently Compacting a List of Tuples to a Dictionary of Lists in Python?
I am looking for a more efficient way (in terms of code complexity, speed, and memory usage, possibly using comprehensions or generators) to reduce a list of two-element tuples, where the first element may repeat across tuples, to a dictionary of lists.
from copy import deepcopy
a = [('a', 'cat'), ('a', 'dog'), ('b', 'pony'), ('c', 'hippo'), ('c','horse'), ('d', 'cow')]
b = {x[0]: list() for x in a}
c = deepcopy(b)
for key, value in b.items():
    for item in a:
        if key == item[0]:
            c[key].append(item[1])
print(a)
print(c)
[('a', 'cat'), ('a', 'dog'), ('b', 'pony'), ('c', 'hippo'), ('c', 'horse'), ('d', 'cow')]
{'a': ['cat', 'dog'], 'b': ['pony'], 'c': ['hippo', 'horse'], 'd': ['cow']}
from collections import defaultdict
from itertools import groupby
from operator import itemgetter
import timeit
timings = dict()
def wrap(func, *args, **kwargs):
    def wrapped():
        return func(*args, **kwargs)
    return wrapped
a = [('a', 'cat'), ('a', 'dog'), ('b', 'pony'), ('c', 'hippo'), ('c','horse'), ('d', 'cow')]
# yatu's solution
def yatu(x):
    output = defaultdict(list)
    for item in x:
        output[item[0]].append(item[1])
    return output
# roseman's solution
def roseman(x):
    d = defaultdict(list)
    # iterate over the argument x rather than the global a
    for key, value in x:
        d[key].append(value)
    return d
# prem's solution
def prem(a):
    result = {k: [v for _, v in grp] for k, grp in groupby(a, itemgetter(0))}
    return result
# timings
yatus_wrapped = wrap(yatu, a)
rosemans_wrapped = wrap(roseman, a)
prems_wrapped = wrap(prem, a)
timings['yatus'] = timeit.timeit(yatus_wrapped, number=100000)
timings['rosemans'] = timeit.timeit(rosemans_wrapped, number=100000)
timings['prems'] = timeit.timeit(prems_wrapped, number=100000)
# output results
print(timings)
{'yatus': 0.171220442, 'rosemans': 0.153767728, 'prems': 0.22808025399999993}
Roseman's solution is marginally the fastest, thank you.
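With only six tuples, the differences measured above may be dominated by call overhead rather than by the grouping itself. The following is a minimal sketch of how the comparison could be repeated on a larger input; the function names `with_defaultdict` and `with_groupby`, the input size, and the repeat count are all assumptions chosen for illustration, and no timings are claimed here.

```python
import random
import timeit
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def with_defaultdict(pairs):
    # Single pass: append each value to the list stored under its key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def with_groupby(pairs):
    # groupby only merges adjacent items with equal keys,
    # so this assumes pairs is already sorted by key.
    return {k: [v for _, v in grp] for k, grp in groupby(pairs, itemgetter(0))}


# Hypothetical test data: 10,000 pairs over ten keys, sorted by key.
random.seed(0)
keys = "abcdefghij"
pairs = sorted(((random.choice(keys), i) for i in range(10_000)), key=itemgetter(0))

# Both approaches should agree before their speed is compared.
assert dict(with_defaultdict(pairs)) == with_groupby(pairs)

for func in (with_defaultdict, with_groupby):
    elapsed = timeit.timeit(lambda: func(pairs), number=20)
    print(f"{func.__name__}: {elapsed:.4f}s")
```

Results will vary by machine and Python version, so it is worth running this on data shaped like the real input.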
This can be done with a single loop using a defaultdict:
from collections import defaultdict
d = defaultdict(list)
for key, value in a:
    d[key].append(value)
You could use defaultdict:
from collections import defaultdict
a = [('a', 'cat'), ('a', 'dog'), ('b', 'pony'), ('c', 'hippo'), ('c','horse'), ('d', 'cow')]
output = defaultdict(list)
for item in a:
    output[item[0]].append(item[1])
This approach needs less space (only a and output) and has a better runtime. The runtime is linear, because it iterates over a once and adds each element to the output dictionary, and inserts into a dictionary take constant time on average.
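The single-pass approach can be wrapped in a small helper, sketched below. The name `group_pairs` is hypothetical, and converting the result to a plain `dict` at the end is an optional design choice, made here so that looking up a missing key raises `KeyError` instead of silently creating an empty list.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Group (key, value) pairs into a dict of lists in one pass."""
    grouped = defaultdict(list)
    for key, value in pairs:
        # A missing key is created with an empty list on first access.
        grouped[key].append(value)
    # Return a plain dict so later lookups do not add new keys.
    return dict(grouped)


a = [('a', 'cat'), ('a', 'dog'), ('b', 'pony'), ('c', 'hippo'), ('c', 'horse'), ('d', 'cow')]
result = group_pairs(a)
print(result)
```

Because the helper only iterates once, it should also accept a generator of pairs, not just a list.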
You can use itertools.groupby to group the items first and then merge them as you prefer:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> {k: [v for _,v in grp] for k,grp in groupby(a, itemgetter(0))}
{'a': ['cat', 'dog'], 'b': ['pony'], 'c': ['hippo', 'horse'], 'd': ['cow']}
Sort the input first if it won't always be in sorted order, because groupby only groups consecutive items that share a key.
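As a minimal sketch of why sorting matters, the example below uses a shuffled variant of the list (the name `unsorted_pairs` and its ordering are made up for illustration). Without sorting, a later run of the same key overwrites the earlier one in the dict comprehension.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: same pairs as in the question, but not sorted by key.
unsorted_pairs = [('c', 'hippo'), ('a', 'cat'), ('b', 'pony'),
                  ('a', 'dog'), ('c', 'horse'), ('d', 'cow')]

by_key = itemgetter(0)

# Without sorting, 'a' and 'c' each form two separate groups,
# and the second group replaces the first in the resulting dict.
naive = {k: [v for _, v in grp] for k, grp in groupby(unsorted_pairs, by_key)}

# Sorting by key first makes equal keys adjacent. sorted() is stable,
# so values keep their original relative order within each key.
grouped = {k: [v for _, v in grp]
           for k, grp in groupby(sorted(unsorted_pairs, key=by_key), by_key)}

print(naive)    # values for 'a' and 'c' are incomplete
print(grouped)  # every value appears under its key
```

Note that the sort adds an O(n log n) step, so for unsorted input the defaultdict loop avoids that extra cost.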