Group a dataframe and select one cell among them based on a condition

My dataset looks like this:

    Tr, Date, Time
    AV81312,20161014,121000
    AV81312,20161014,160221
    AV85012,20170422,150858
    AV85012,20161108,11137
    AV86157,20170426,45747
    AV86157,20170426,45744
    AV86157,20160813,134312

I need to select only one row for each Tr: the latest record, i.e. the one with the highest date and time.

Required output:

    Tr, Date, Time
    AV81312,20161014,160221
    AV85012,20170422,150858
    AV86157,20170426,45747

My code is:

    df2 = read_csv("sample.csv")
    df2 = df2.values
    x = []
    for i in df2:
        for j in df2:
            if i[2] == j[2]:
                if i[3] >= j[3]:
                    x.append(i)

It wasn't working as expected.

Use:

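import pandas as pd
# Zero-pad Time to six digits so 5-digit values such as 11137 parse unambiguously (as 01:11:37)
df['Date_Time'] = pd.to_datetime(df['Date'].astype(str).str.cat(df['Time'].astype(str).str.zfill(6)), format='%Y%m%d%H%M%S')
df.loc[df.groupby('Tr')['Date_Time'].idxmax()].drop('Date_Time', axis=1).reset_index(drop=True)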

Output

        Tr      Date    Time
0  AV81312  20161014  160221
1  AV85012  20170422  150858
2  AV86157  20170426   45747
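
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of an alternative that skips the datetime column. It assumes `Date` and `Time` are read as integers, so that `YYYYMMDD` and `HHMMSS` values sort chronologically as plain numbers, and that the header has no spaces after the commas. The names `CSV_TEXT` and `latest_per_tr` are made up for illustration and are not part of the original post.

```python
import io

import pandas as pd

# Sample rows copied from the question. The header is written without
# spaces after the commas so the column names are exactly Tr, Date, Time.
CSV_TEXT = """Tr,Date,Time
AV81312,20161014,121000
AV81312,20161014,160221
AV85012,20170422,150858
AV85012,20161108,11137
AV86157,20170426,45747
AV86157,20170426,45744
AV86157,20160813,134312
"""


def latest_per_tr(df):
    """Return, for each Tr, the row with the greatest (Date, Time) pair."""
    # Sort so that the latest record of every Tr ends up last in its group.
    ordered = df.sort_values(["Tr", "Date", "Time"])
    # Keep only that last row per Tr and renumber the index from 0.
    return ordered.drop_duplicates("Tr", keep="last").reset_index(drop=True)


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = latest_per_tr(df)
print(result)
```

Sorting on the integer columns avoids any string-to-datetime parsing, but it only works while both columns stay numeric. If they are read as strings, the zero-padding and `pd.to_datetime` route above is probably the safer choice.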
