My dataset look like this Tr, Date, Time AV81312,20161014,121000 AV81312,20161014,160221 AV85012,20170422,150858 AV85012,20161108,11137 AV86157,20170426,45747 AV86157,20170426,45744 AV86157,20160813,134312
I need to select only one item from each Tr
having latest record ie latest record having date and and time higher
Required output is Tr, Date, Time AV81312,20161014,160221 AV85012,20170422,150858 AV86157,20170426,45747
My code is
df2 = read_csv("sample.csv") df2 = df2.values x = [] for i in df2: for j in df2: if i[2] == j[2]: if i[3] >= j[3]: x.append(i)
It wasn't working as expected
Use -
df['Date_Time'] = pd.to_datetime(df['Date'].astype(str).str.cat(df['Time'].astype(str)),format='%Y%m%d%H%M%S')
df.loc[df.groupby('Tr')['Date_Time'].idxmax()].drop('Date_Time', axis=1)
Output
Tr Date Time
0 AV81312 20161014 160221
1 AV85012 20170422 150858
2 AV86157 20170426 45747
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.