Group a dataframe and select one row per group based on a condition
My dataset looks like this:
Tr, Date, Time
AV81312,20161014,121000
AV81312,20161014,160221
AV85012,20170422,150858
AV85012,20161108,11137
AV86157,20170426,45747
AV86157,20170426,45744
AV86157,20160813,134312
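I need to select exactly one row for each Tr: the most recent record, i.e. the one with the latest date and, within that date, the latest time.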
The desired output is:
Tr, Date, Time
AV81312,20161014,160221
AV85012,20170422,150858
AV86157,20170426,45747
My code is:
df2 = read_csv("sample.csv")
df2 = df2.values
x = []
for i in df2:
    for j in df2:
        if i[2] == j[2]:
            if i[3] >= j[3]:
                x.append(i)
It does not work as expected.
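Use:
# Zero-pad Time to six digits so that, e.g., 11137 is read as 01:11:37
df['Date_Time'] = pd.to_datetime(df['Date'].astype(str).str.cat(df['Time'].astype(str).str.zfill(6)), format='%Y%m%d%H%M%S')
# Keep the row with the latest Date_Time per Tr; reset_index gives the 0..n-1 row labels
df.loc[df.groupby('Tr')['Date_Time'].idxmax()].drop('Date_Time', axis=1).reset_index(drop=True)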
Output:
Tr Date Time
0 AV81312 20161014 160221
1 AV85012 20170422 150858
2 AV86157 20170426 45747
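For comparison, here is a minimal self-contained sketch of a different approach (sorting and keeping the last row per group) rather than the datetime/idxmax method above. It assumes that Date is always YYYYMMDD and that Time is HHMMSS stored as an integer with leading zeros dropped, so both columns order chronologically as plain numbers. The names CSV_TEXT and latest_per_tr are made up for this illustration, and the sample rows are read from a string instead of sample.csv.

```python
import io

import pandas as pd

# Sample rows copied from the question (hypothetical stand-in for sample.csv)
CSV_TEXT = """Tr,Date,Time
AV81312,20161014,121000
AV81312,20161014,160221
AV85012,20170422,150858
AV85012,20161108,11137
AV86157,20170426,45747
AV86157,20170426,45744
AV86157,20160813,134312
"""


def latest_per_tr(df):
    """Return the row with the greatest (Date, Time) pair for each Tr."""
    # Assumption: Date (YYYYMMDD) and Time (HHMMSS as an integer) both
    # compare chronologically as numbers, so no datetime parsing is done.
    ordered = df.sort_values(['Tr', 'Date', 'Time'])
    # After sorting, the last row within each Tr is the most recent one.
    return ordered.drop_duplicates('Tr', keep='last').reset_index(drop=True)


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = latest_per_tr(df)
print(result)
```

If the Time column could contain values that are not plain HHMMSS integers (for example strings with separators), this shortcut would no longer be safe, and parsing to a real datetime as in the answer above would be the more robust choice.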