List group by on max date
Suppose I have a list of lists like so:
[
['group1', 'type1', '2021-3-24'],
['group1', 'type1', '2021-3-25'],
['group1', 'type1', '2021-3-26'],
['group2', 'type2', '2022-5-21'],
['group2', 'type2', '2021-1-12'],
['group2', 'type2', '2021-3-26'],
]
and I want these results:
[
['group1', 'type1', '2021-3-26'],
['group2', 'type2', '2022-5-21'],
]
where each list in the parent list is grouped by group and type, and the function performed is a "max date" operation.
The SQL equivalent of what I'm looking for:
select
    group,
    type,
    max(date)
from my_list
group by
    group,
    type
I would like to avoid the overhead of Pandas, as I think this can be done using itertools.groupby and my datasets are relatively small, but I just can't find a close enough example to understand how this would work.
You can use collections.defaultdict:
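import collections, datetime

# Map each (group, type) pair to the list of dates seen for it.
d = collections.defaultdict(list)
data = [['group1', 'type1', '2021-3-24'], ['group1', 'type1', '2021-3-25'], ['group1', 'type1', '2021-3-26'], ['group2', 'type2', '2022-5-21'], ['group2', 'type2', '2021-1-12'], ['group2', 'type2', '2021-3-26']]
for a, b, c in data:
    # Parse 'YYYY-M-D' into a datetime.date so dates compare chronologically.
    d[(a, b)].append(datetime.date(*map(int, c.split('-'))))
# str() of a date is ISO format, so the result is zero-padded (e.g. '2021-03-26').
result = [[*a, str(max(b))] for a, b in d.items()]
Output: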
[['group1', 'type1', '2021-03-26'], ['group2', 'type2', '2022-05-21']]
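If the original, unpadded date strings need to be preserved, a variant is to keep only the best row per key instead of collecting every date. The following is a minimal sketch, not part of the answer above: the helper names `parse_date` and `max_date_per_group` are made up for illustration, and it assumes every date string has the 'YYYY-M-D' layout shown in the question and Python 3.7+ (for ordered dicts).

```python
from datetime import date


def parse_date(s):
    """Turn a 'YYYY-M-D' string into a datetime.date (assumes that exact layout)."""
    year, month, day = map(int, s.split("-"))
    return date(year, month, day)


def max_date_per_group(rows):
    """Return one [group, type, date_string] row per (group, type) pair."""
    best = {}  # (group, type) -> date string of the latest date seen so far
    for group, type_, date_str in rows:
        key = (group, type_)
        # Compare parsed dates, but store the original string untouched.
        if key not in best or parse_date(date_str) > parse_date(best[key]):
            best[key] = date_str
    return [[*key, date_str] for key, date_str in best.items()]


data = [
    ["group1", "type1", "2021-3-24"],
    ["group1", "type1", "2021-3-25"],
    ["group1", "type1", "2021-3-26"],
    ["group2", "type2", "2022-5-21"],
    ["group2", "type2", "2021-1-12"],
    ["group2", "type2", "2021-3-26"],
]
print(max_date_per_group(data))
# [['group1', 'type1', '2021-3-26'], ['group2', 'type2', '2022-5-21']]
```

This does a single pass, needs no sorting, and keeps memory proportional to the number of groups rather than the number of rows.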
@Ajax's answer is good, but for completeness here is a version with groupby:
lst = [
    ["group1", "type1", "2021-3-24"],
    ["group1", "type1", "2021-3-25"],
    ["group1", "type1", "2021-3-26"],
    ["group2", "type2", "2022-5-21"],
    ["group2", "type2", "2021-1-12"],
    ["group2", "type2", "2021-3-26"],
]

from itertools import groupby

out = []

# groupby only merges adjacent rows with the same key, so if the list is not sorted:
# lst = sorted(lst, key=lambda k: (k[0], k[1]))

for c, g in groupby(lst, lambda k: (k[0], k[1])):
    # Compare dates as [year, month, day] integer lists, then rebuild the string.
    out.append(
        [*c, "-".join(map(str, max([*map(int, v[-1].split("-"))] for v in g)))]
    )

print(out)
Prints:
[['group1', 'type1', '2021-3-26'], ['group2', 'type2', '2022-5-21']]
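For input whose rows are not already ordered by group and type, the sort step matters, because groupby starts a new group every time the key changes. Here is a small sketch of the sorted variant under that assumption. The names `as_date` and `group_max` are hypothetical helpers, and dates are again assumed to be 'YYYY-M-D' strings.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def as_date(row):
    """Parse the third column ('YYYY-M-D') of a row into a datetime.date."""
    year, month, day = map(int, row[2].split("-"))
    return date(year, month, day)


def group_max(rows):
    """Sort by (group, type), then keep the row with the latest date per group."""
    key = itemgetter(0, 1)
    ordered = sorted(rows, key=key)  # groupby needs equal keys to be adjacent
    return [list(max(g, key=as_date)) for _, g in groupby(ordered, key=key)]


shuffled = [
    ["group2", "type2", "2021-1-12"],
    ["group1", "type1", "2021-3-26"],
    ["group2", "type2", "2022-5-21"],
    ["group1", "type1", "2021-3-24"],
]
print(group_max(shuffled))
# [['group1', 'type1', '2021-3-26'], ['group2', 'type2', '2022-5-21']]
```

Using `max(..., key=as_date)` returns the whole winning row, so the date string comes back exactly as it was written in the input.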