Pandas: aggregate column based on values in a different column
Let's say I start with a dataframe that looks like this:
Group Val date
0 home first 2017-12-01
1 home second 2017-12-02
2 away first 2018-03-07
3 away second 2018-03-01
Data types are [string, string, datetime]. I would like to get a dataframe that shows, for each group, the value that was entered most recently:
Group Most recent Val Most recent date
0 home second 12-02-2017
1 away first 03-07-2018
(Data types are [string, string, datetime])
My initial thought is that I should be able to do this by grouping by 'Group' and then aggregating the dates and vals. I know I can get the most recent datetime with the 'max' agg function, but I'm stuck on which function to use to get the corresponding val:
df.groupby('Group').agg({'Val': lambda x: ____????____,
                         'date': 'max'})
Thanks,
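For reference, here is one way the placeholder in that agg call could be filled. This is a minimal sketch, not the only approach. It assumes the column names from the sample table and that df has unique index labels. Inside agg, each lambda only sees one column of one group, but x.index still carries the original row labels, so the lambda can look up the matching dates in df and pick the Val at the latest one.

```python
import pandas as pd

# Sample data from the question: Group and Val are strings, date is datetime
df = pd.DataFrame({
    "Group": ["home", "home", "away", "away"],
    "Val": ["first", "second", "first", "second"],
    "date": pd.to_datetime(["2017-12-01", "2017-12-02", "2018-03-07", "2018-03-01"]),
})

# x is the Val column of one group; x.index holds that group's row labels,
# so we look up the same rows' dates in df and take the label of the latest one.
result = df.groupby("Group").agg({
    "Val": lambda x: x.loc[df.loc[x.index, "date"].idxmax()],
    "date": "max",
})
print(result)
```

The lambda reaches back into df from the enclosing scope, so it only works while df is the same frame being grouped and its index labels are unique.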
First, select the index labels of the rows where 'date' is maximal within each group:
max_indices = df.groupby(['Group'])['date'].idxmax()
Then select the corresponding rows from the original dataframe with .loc (idxmax returns index labels, not positions), optionally keeping only the column you are interested in:
df.loc[max_indices, 'Val']
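The idxmax approach can be run end to end on the question's sample data as follows. This is a sketch: it passes sort=False to groupby so the groups come out in order of first appearance (home, then away), as in the desired output. The rename to "Most recent Val" / "Most recent date" just mirrors the headings in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["home", "home", "away", "away"],
    "Val": ["first", "second", "first", "second"],
    "date": pd.to_datetime(["2017-12-01", "2017-12-02", "2018-03-07", "2018-03-01"]),
})

# Index label of the latest row in each group, groups in order of first appearance
max_indices = df.groupby("Group", sort=False)["date"].idxmax()

# Select those rows by label, then rename to the headings used in the question
result = (
    df.loc[max_indices, ["Group", "Val", "date"]]
    .rename(columns={"Val": "Most recent Val", "date": "Most recent date"})
    .reset_index(drop=True)
)
print(result)
```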
If I understood you correctly, you can do this:
df.loc[df.groupby('Group').agg({'date': 'idxmax'})['date']]
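Because idxmax returns index labels, .loc is the indexer that matches it. .iloc reads the same numbers as positions, which only coincides with the labels when the frame has the default 0..n-1 index. The sketch below checks this on the question's data with a hypothetical non-default index (10..13). Here .iloc raises an IndexError. With other indexes it could instead silently return the wrong rows.

```python
import pandas as pd

# Question's sample data, but with a hypothetical non-default index
df = pd.DataFrame(
    {
        "Group": ["home", "home", "away", "away"],
        "Val": ["first", "second", "first", "second"],
        "date": pd.to_datetime(["2017-12-01", "2017-12-02", "2018-03-07", "2018-03-01"]),
    },
    index=[10, 11, 12, 13],
)

# idxmax gives the index *labels* of each group's latest row (groups sorted: away, home)
labels = df.groupby("Group")["date"].idxmax()
print(labels.tolist())  # [12, 11]

# .loc looks the labels up, so it returns the intended rows
print(df.loc[labels])

# .iloc treats 12 and 11 as positions; a 4-row frame has no such positions
try:
    df.iloc[labels.tolist()]
    iloc_failed = False
except IndexError:
    iloc_failed = True
print("iloc raised IndexError:", iloc_failed)
```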
Or, as a complete example:
import pandas as pd
import numpy as np
np.random.seed(42)
data = [(np.random.choice(['home', 'away'], size=1)[0],
np.random.choice(['first', 'second'], size=1)[0],
pd.Timestamp(np.random.rand()*1.9989e+18)) for i in range(10)]
df = pd.DataFrame.from_records(data)
df.columns = ['Group', 'Val', 'date']
df.loc[df.groupby('Group').agg({'date': 'idxmax'})['date']]
This selects
Group Val date
5 away first 2031-06-09 06:26:43.486610432
0 home second 2030-03-22 04:07:07.082781440
from the full dataframe:
Group Val date
0 home second 2030-03-22 04:07:07.082781440
1 home second 2007-12-03 05:07:24.061456384
2 home second 1979-11-18 23:57:26.700035456
3 home first 2024-11-12 08:18:17.789517824
4 away second 2014-11-07 13:17:55.756515328
5 away first 2031-06-09 06:26:43.486610432
6 away second 1983-06-14 13:17:28.334806208
7 away second 1981-08-14 03:21:14.746028864
8 away second 2003-03-29 11:00:31.189680256
9 away first 1988-06-12 16:58:48.341865984
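A different technique that avoids idxmax entirely is to sort by date and keep the last row per group. This is a sketch on the question's sample data. It uses a stable sort, so if two rows share a group's latest date, it keeps the one that appears later in the frame. idxmax, by contrast, returns the first.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["home", "home", "away", "away"],
    "Val": ["first", "second", "first", "second"],
    "date": pd.to_datetime(["2017-12-01", "2017-12-02", "2018-03-07", "2018-03-01"]),
})

# Sort oldest to newest (stable, so tied dates keep their original order),
# then keep the last row seen for each group, i.e. its most recent entry.
latest = (
    df.sort_values("date", kind="stable")
    .drop_duplicates(subset="Group", keep="last")
    .set_index("Group")
)
print(latest)
```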
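Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.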