
List group by on max date

Suppose I have a list of lists like so:

[   
    ['group1', 'type1', '2021-3-24'],
    ['group1', 'type1', '2021-3-25'],
    ['group1', 'type1', '2021-3-26'],
    ['group2', 'type2', '2022-5-21'],
    ['group2', 'type2', '2021-1-12'],
    ['group2', 'type2', '2021-3-26'],
]

and I want these results:

[   
    ['group1', 'type1', '2021-3-26'],
    ['group2', 'type2', '2022-5-21'],
]

where the inner lists are grouped by group and type, and the aggregate applied to each group is a "max date" operation.

The SQL equivalent of what I'm looking for:

select
    group,
    type,
    max(date)
from my_list
group by
    group,
    type

I would like to avoid the overhead of Pandas: my datasets are relatively small, and I think this can be done using itertools.groupby. I just can't find an example close enough to understand how it would work.

You can use collections.defaultdict:

import collections
import datetime

data = [['group1', 'type1', '2021-3-24'], ['group1', 'type1', '2021-3-25'], ['group1', 'type1', '2021-3-26'], ['group2', 'type2', '2022-5-21'], ['group2', 'type2', '2021-1-12'], ['group2', 'type2', '2021-3-26']]

# Collect every date seen for each (group, type) key.
d = collections.defaultdict(list)
for a, b, c in data:
    d[(a, b)].append(datetime.date(*map(int, c.split('-'))))

# str(date) gives ISO format, so months and days come back zero-padded.
result = [[*a, str(max(b))] for a, b in d.items()]

Output:

[['group1', 'type1', '2021-03-26'], ['group2', 'type2', '2022-05-21']]
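If the original date strings should come back unchanged (the output above is zero-padded, unlike the input), one possible variation is to keep only the current latest row per key in a plain dict instead of collecting every date. This is a minimal sketch; the helper names parse_date and max_date_per_group are made up for illustration and are not part of the answer above.

```python
from datetime import date


def parse_date(s):
    """Turn a 'YYYY-M-D' string into a datetime.date (hypothetical helper)."""
    year, month, day = map(int, s.split("-"))
    return date(year, month, day)


def max_date_per_group(rows):
    """Return one [group, type, date_string] row per (group, type) key,
    keeping the original string of the latest date."""
    latest = {}  # (group, type) -> date string of the latest row seen so far
    for group, type_, day in rows:
        key = (group, type_)
        if key not in latest or parse_date(day) > parse_date(latest[key]):
            latest[key] = day
    # Dicts preserve insertion order, so keys appear in first-seen order.
    return [[group, type_, day] for (group, type_), day in latest.items()]


data = [
    ["group1", "type1", "2021-3-24"],
    ["group1", "type1", "2021-3-25"],
    ["group1", "type1", "2021-3-26"],
    ["group2", "type2", "2022-5-21"],
    ["group2", "type2", "2021-1-12"],
    ["group2", "type2", "2021-3-26"],
]

print(max_date_per_group(data))
# [['group1', 'type1', '2021-3-26'], ['group2', 'type2', '2022-5-21']]
```

Because it is keyed on a dict, this version does not need the input to be sorted or grouped beforehand.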

@Ajax's answer is good, but for completeness here is a version using groupby:

lst = [
    ["group1", "type1", "2021-3-24"],
    ["group1", "type1", "2021-3-25"],
    ["group1", "type1", "2021-3-26"],
    ["group2", "type2", "2022-5-21"],
    ["group2", "type2", "2021-1-12"],
    ["group2", "type2", "2021-3-26"],
]

from itertools import groupby

out = []
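# groupby only merges consecutive rows with equal keys, so if the list
# is not already ordered by (group, type), sort it first:
# lst = sorted(lst, key=lambda k: (k[0], k[1]))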
for c, g in groupby(lst, lambda k: (k[0], k[1])):
    out.append(
        [*c, "-".join(map(str, max([*map(int, v[-1].split("-"))] for v in g)))]
    )

print(out)

Prints:

[['group1', 'type1', '2021-3-26'], ['group2', 'type2', '2022-5-21']]
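The same groupby idea can also be written with the sort built in and with max given a key function, which returns the whole winning row instead of rebuilding the date string. This is a sketch under the assumption that every date is a 'YYYY-M-D' string; the names latest_per_group and date_key are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def date_key(row):
    """Compare rows by (year, month, day) as integers, not as strings."""
    return tuple(map(int, row[2].split("-")))


def latest_per_group(rows):
    """Return the row with the latest date for each (group, type) pair."""
    group_key = itemgetter(0, 1)
    # Sort first so that equal keys are adjacent, as groupby requires.
    ordered = sorted(rows, key=group_key)
    return [max(g, key=date_key) for _, g in groupby(ordered, key=group_key)]


rows = [
    ["b", "y", "2020-1-2"],
    ["a", "x", "2021-12-1"],
    ["b", "y", "2020-10-1"],
    ["a", "x", "2021-2-28"],
]

print(latest_per_group(rows))
# [['a', 'x', '2021-12-1'], ['b', 'y', '2020-10-1']]
```

The integer comparison matters because these dates are not zero-padded: as plain strings, '2021-2-28' would sort after '2021-12-1'.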
