
Summarize a list of dictionaries based on common key values

I have a list of dictionaries like so:

dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]

I want to summarize days that share the same 'start' and 'end' times.

For example,

summarylist = [([0, 2, 4], '8:00am', '5:00pm'),
               ([1, 3], '10:00am', '7:00pm'),
               ([5], '11:00am', '1:00pm')]

I have tried to adapt other Stack Overflow solutions involving sets and intersections to achieve this, with no luck. I was also trying to repurpose the solution to this question, to no avail. Hoping someone can point me in the right direction.

With itertools.groupby:

In [1]: %paste
dictlist = [{'day': 0, 'start': '8:00am',  'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am',  'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am',  'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]

## -- End pasted text --

In [2]: from itertools import groupby

In [3]: tuplist = [(d['day'], (d['start'], d['end'])) for d in dictlist]

In [4]: key = lambda x: x[1]

In [5]: summarylist = [(sorted(e[0] for e in g),) + k
   ...:        for k, g in groupby(sorted(tuplist, key=key), key=key)]

In [6]: summarylist
Out[6]:
[([1, 3], '10:00am', '7:00pm'),
 ([5], '11:00am', '1:00pm'),
 ([0, 2, 4], '8:00am', '5:00pm')]
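
The same approach can be wrapped in a small function. This is only a sketch under a few assumptions: the name summarize and the use of operator.itemgetter as the key are my own choices rather than part of the session above, and the code targets Python 3. The groups come out in string order of the (start, end) pair, which is why '10:00am' sorts before '8:00am'.

```python
from itertools import groupby
from operator import itemgetter

dictlist = [{'day': 0, 'start': '8:00am',  'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am',  'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am',  'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]


def summarize(records):
    """Return (days, start, end) tuples, one per distinct (start, end) pair.

    `summarize` is a hypothetical helper name used for illustration.
    """
    # itemgetter with two keys returns the tuple (d['start'], d['end']).
    key = itemgetter('start', 'end')
    # Sort with the same key that is used for grouping.
    ordered = sorted(records, key=key)
    return [(sorted(d['day'] for d in group), start, end)
            for (start, end), group in groupby(ordered, key=key)]


summarylist = summarize(dictlist)
print(summarylist)
```

Using one key function for both the sort and the grouping keeps the two steps from drifting apart.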

If you don't need the exact format you described, you could use defaultdict:

dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]

from collections import defaultdict

dd = defaultdict(list)

for d in dictlist:
    dd[(d['start'],d['end'])].append(d['day'])

Result:

>>> dd
defaultdict(<type 'list'>, {('11:00am', '1:00pm'): [5], ('10:00am', '7:00pm'): [1, 3], ('8:00am', '5:00pm'): [0, 2, 4]})

And if the format is important to you, you could do:

>>> my_list = [(v, k[0], k[1]) for k,v in dd.iteritems()]
>>> my_list
[([5], '11:00am', '1:00pm'), ([1, 3], '10:00am', '7:00pm'), ([0, 2, 4], '8:00am', '5:00pm')]
>>> # If you need the output sorted:  
>>> sorted_my_list = sorted(my_list, key = lambda k : len(k[0]), reverse=True)
>>> sorted_my_list
[([0, 2, 4], '8:00am', '5:00pm'), ([1, 3], '10:00am', '7:00pm'), ([5], '11:00am', '1:00pm')]
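
The session above is Python 2 (iteritems, and the <type 'list'> repr). A minimal Python 3 sketch of the same idea follows, assuming Python 3.7 or later, where dicts preserve insertion order. The variable name days_by_hours is only illustrative.

```python
from collections import defaultdict

dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]

# Map each (start, end) pair to the list of days that use it.
days_by_hours = defaultdict(list)
for d in dictlist:
    days_by_hours[(d['start'], d['end'])].append(d['day'])

# items() replaces the Python 2 iteritems(); groups appear in the order
# their (start, end) pair was first seen in dictlist (Python 3.7+).
summarylist = [(days, start, end)
               for (start, end), days in days_by_hours.items()]
print(summarylist)
```

With insertion-ordered dicts the result already comes out in order of first appearance, so for this input no extra sort is needed to match the layout asked for in the question.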

You can use itertools.groupby like this.

Source code:

from itertools import groupby
for k, grp in groupby(sorted(dictlist, key=lambda x:(x['end'], x['start'])), key=lambda x:(x['start'], x['end'])):
    print [i['day'] for i in grp], k

Output:

[5] ('11:00am', '1:00pm')
[0, 2, 4] ('8:00am', '5:00pm')
[1, 3] ('10:00am', '7:00pm')

But I think using defaultdict (@Akavall's answer) is the right way in this particular case.
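
One detail of the groupby approach is worth spelling out: groupby only merges consecutive items with equal keys, so the sorted call is doing real work. A small Python 3 sketch of the difference, where the names key, unsorted_groups and sorted_groups are mine and purely illustrative:

```python
from itertools import groupby

dictlist = [{'day': 0, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 1, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 2, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 3, 'start': '10:00am', 'end': '7:00pm'},
            {'day': 4, 'start': '8:00am', 'end': '5:00pm'},
            {'day': 5, 'start': '11:00am', 'end': '1:00pm'}]


def key(d):
    # Group on the (start, end) pair.
    return (d['start'], d['end'])


# Without sorting, no two neighbouring entries share a key in this input,
# so every day ends up in a group of its own.
unsorted_groups = [[d['day'] for d in grp]
                   for _, grp in groupby(dictlist, key=key)]

# After sorting by the same key, equal pairs are adjacent and get merged.
sorted_groups = [[d['day'] for d in grp]
                 for _, grp in groupby(sorted(dictlist, key=key), key=key)]

print(unsorted_groups)  # [[0], [1], [2], [3], [4], [5]]
print(sorted_groups)    # [[1, 3], [5], [0, 2, 4]]
```

The defaultdict version needs no sort at all, which is one reason it may be the simpler choice here.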


 