Keep the most recent values and drop older rows (pandas)
I have a dataframe, shown below, that contains both new and old values. I would like to drop all the old values while keeping the new ones.
ID Name Time Comment
0 Foo 12:17:37 Rand
1 Foo 12:17:37 Rand1
2 Foo 08:20:00 Rand2
3 Foo 08:20:00 Rand3
4 Bar 09:01:00 Rand4
5 Bar 09:01:00 Rand5
6 Bar 08:50:50 Rand6
7 Bar 08:50:00 Rand7
As such, it should look like this:
ID Name Time Comment
0 Foo 12:17:37 Rand
1 Foo 12:17:37 Rand1
4 Bar 09:01:00 Rand4
5 Bar 09:01:00 Rand5
I tried the code below, but it removes one new and one old value.
df[~df[['Time', 'Comment']].duplicated(keep='first')]
Can anyone provide a correct solution?
I think you can use this solution with to_timedelta if you need to filter by the maximum value of the Time column:
df.Time = pd.to_timedelta(df.Time)
df = df[df.Time == df.Time.max()]
print (df)
ID Name Time Comment
0 0 Foo 12:17:37 Rand
1 1 Foo 12:17:37 Rand1
The edited solution is similar, with groupby added so that the maximum is taken per Name:
# Parentheses are needed so the chained calls can span several lines.
df = (df.groupby('Name', sort=False)
        .apply(lambda x: x[x.Time == x.Time.max()])
        .reset_index(drop=True))
print (df)
ID Name Time Comment
0 0 Foo 12:17:37 Rand
1 1 Foo 12:17:37 Rand1
2 4 Bar 09:01:00 Rand4
3 5 Bar 09:01:00 Rand5
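As a possible variant of the groupby approach above, the per-group maximum can be broadcast back onto every row with transform and used as a boolean mask. This is a minimal sketch, not the answerer's code: the DataFrame construction is an assumption rebuilt from the table in the question, and the names mask and result are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table (construction is assumed).
df = pd.DataFrame({
    'ID': range(8),
    'Name': ['Foo'] * 4 + ['Bar'] * 4,
    'Time': ['12:17:37', '12:17:37', '08:20:00', '08:20:00',
             '09:01:00', '09:01:00', '08:50:50', '08:50:00'],
    'Comment': ['Rand', 'Rand1', 'Rand2', 'Rand3',
                'Rand4', 'Rand5', 'Rand6', 'Rand7'],
})

# Convert the HH:MM:SS strings to Timedelta so max() compares durations.
df['Time'] = pd.to_timedelta(df['Time'])

# transform('max') returns each group's maximum repeated for every row
# of that group, so the comparison gives a mask aligned with df.
mask = df['Time'] == df.groupby('Name')['Time'].transform('max')
result = df[mask]
print(result)
```

Unlike the apply version, this keeps the original index labels of the surviving rows; add .reset_index(drop=True) if a fresh index is wanted.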
You can merge each group's maximum back to the original DataFrame:
df['Time'] = pd.to_timedelta(df['Time'])
In [35]: pd.merge(df, df.groupby('Name', as_index=False)['Time'].max(), on=['Name','Time'])
Out[35]:
ID Name Time Comment
0 0 Foo 12:17:37 Rand
1 1 Foo 12:17:37 Rand1
2 4 Bar 09:01:00 Rand4
3 5 Bar 09:01:00 Rand5
Explanation:
In [36]: df.groupby('Name', as_index=False)['Time'].max()
Out[36]:
Name Time
0 Bar 09:01:00
1 Foo 12:17:37
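To make the merge step easier to try out, here is a self-contained sketch of the same idea. The DataFrame construction is an assumption rebuilt from the question's table, and the names latest and result are only illustrative. It relies on pd.merge defaulting to an inner join, so only rows whose (Name, Time) pair matches a group maximum survive.

```python
import pandas as pd

# Sample data rebuilt from the question's table (construction is assumed).
df = pd.DataFrame({
    'ID': range(8),
    'Name': ['Foo'] * 4 + ['Bar'] * 4,
    'Time': ['12:17:37', '12:17:37', '08:20:00', '08:20:00',
             '09:01:00', '09:01:00', '08:50:50', '08:50:00'],
    'Comment': ['Rand', 'Rand1', 'Rand2', 'Rand3',
                'Rand4', 'Rand5', 'Rand6', 'Rand7'],
})
df['Time'] = pd.to_timedelta(df['Time'])

# One row per Name, holding that group's latest Time.
latest = df.groupby('Name', as_index=False)['Time'].max()

# Inner merge (the default): keep rows of df whose (Name, Time)
# pair appears in latest, dropping all older rows.
result = pd.merge(df, latest, on=['Name', 'Time'])
print(result)
```

Rows that tie for the maximum within a group are all kept, which is what the expected output in the question requires.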