Group a dataframe and select one cell among them based on a condition
My dataset looks like this:

Tr, Date, Time
AV81312,20161014,121000
AV81312,20161014,160221
AV85012,20170422,150858
AV85012,20161108,11137
AV86157,20170426,45747
AV86157,20170426,45744
AV86157,20160813,134312
I need to select only one row for each Tr: the latest record, i.e. the one with the highest date and time.
Required output is:

Tr, Date, Time
AV81312,20161014,160221
AV85012,20170422,150858
AV86157,20170426,45747
My code is:
df2 = read_csv("sample.csv")
df2 = df2.values
x = []
for i in df2:
    for j in df2:
        if i[2] == j[2]:
            if i[3] >= j[3]:
                x.append(i)
It wasn't working as expected.
Use:
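# Zero-pad Time to six digits (11137 -> 011137) so that %H%M%S parses it unambiguously
df['Date_Time'] = pd.to_datetime(df['Date'].astype(str).str.cat(df['Time'].astype(str).str.zfill(6)), format='%Y%m%d%H%M%S')
# Keep the row with the latest Date_Time in each Tr group, then drop the helper column
df.loc[df.groupby('Tr')['Date_Time'].idxmax()].drop('Date_Time', axis=1)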
Output:
        Tr      Date    Time
1  AV81312  20161014  160221
2  AV85012  20170422  150858
4  AV86157  20170426   45747
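For anyone who wants to try this without a sample.csv on disk, here is a minimal self-contained sketch of the same groupby/idxmax idea. The helper name latest_per_group and the inlined CSV_TEXT are assumptions made for illustration, not part of the original answer. It also assumes the header has no spaces after the commas and that Time is an integer in HHMMSS form with leading zeros dropped.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch runs without sample.csv.
CSV_TEXT = """Tr,Date,Time
AV81312,20161014,121000
AV81312,20161014,160221
AV85012,20170422,150858
AV85012,20161108,11137
AV86157,20170426,45747
AV86157,20170426,45744
AV86157,20160813,134312
"""


def latest_per_group(df):
    """Return the row with the latest Date/Time for each Tr (hypothetical helper)."""
    # Zero-pad Time to six digits so that 11137 is read as 01:11:37.
    stamp = df["Date"].astype(str) + df["Time"].astype(str).str.zfill(6)
    date_time = pd.to_datetime(stamp, format="%Y%m%d%H%M%S")
    # idxmax gives the index label of the latest timestamp within each Tr group.
    latest_labels = date_time.groupby(df["Tr"]).idxmax()
    # Select those rows; the original index labels are kept.
    return df.loc[latest_labels]


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = latest_per_group(df)
print(result.to_string(index=False))
```

Since Date is a fixed-width YYYYMMDD integer and Time is an integer, sorting with df.sort_values(['Tr', 'Date', 'Time']) and then calling drop_duplicates('Tr', keep='last') should select the same rows without building a timestamp column at all.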