Pandas date based grouping to return rows on a condition
I have a dataframe that has data like this:
id  date                 remarks
1   12-01-2015 12:00:15  Good
2   12-01-2015 1:00:14   OK
1   13-01-2015 12:00:15  Not Ok
2   14-01-2015 1:00:15   Bad
3   15-01-2015 1:00:15   Good
I need the output to contain, for each id, the row with the latest date together with its remarks. For id 2, for example, it should return 14-01-2015 1:00:15 with the remark Bad.
You need sort_values + groupby + GroupBy.last:
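# Parse the day-first strings so the dates compare chronologically
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
# Sort by date, then keep the last (latest) row of each id
df1 = df.sort_values('date').groupby('id', as_index=False).last()
print(df1)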
   id                date remarks
0   1 2015-01-13 12:00:15  Not Ok
1   2 2015-01-14 01:00:15     Bad
2   3 2015-01-15 01:00:15    Good
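For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the rows in the question, and the variable name `latest` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question (dates are day-first strings)
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3],
    "date": [
        "12-01-2015 12:00:15",
        "12-01-2015 1:00:14",
        "13-01-2015 12:00:15",
        "14-01-2015 1:00:15",
        "15-01-2015 1:00:15",
    ],
    "remarks": ["Good", "OK", "Not Ok", "Bad", "Good"],
})

# Convert to datetime so sorting is chronological, not alphabetical
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# After sorting by date, the last row in each id group is the latest one
latest = df.sort_values("date").groupby("id", as_index=False).last()
print(latest)
```

Note that GroupBy.last takes the last non-null value per column, so if some remarks are missing the values in one output row may come from different input rows.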
I assume your date column is in day-first format and has already been converted with pd.to_datetime(df['date'], dayfirst=True). If so, group by id, call idxmax on date to get the index label of the latest row in each group, and pass those labels to a loc lookup. idxmin() works the same way if you want the earliest row per id instead.
df.loc[df.groupby('id')['date'].idxmax()]
Output:
   id                date remarks
2   1 2015-01-13 12:00:15  Not Ok
3   2 2015-01-14 01:00:15     Bad
4   3 2015-01-15 01:00:15    Good
If you don't want the original index and would rather get a new dataframe with a fresh index (thanks @Zero):
df.loc[df.groupby('id')['date'].idxmax()].reset_index(drop=True)
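As a rough runnable sketch of the idxmax approach, the snippet below wraps it in a small helper. The function name `latest_row_per_id` is hypothetical, the sample frame is rebuilt from the question, and it assumes the index is unique (as with the default RangeIndex) so that each label from idxmax selects exactly one row.

```python
import pandas as pd


def latest_row_per_id(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the full row with the maximum 'date' for each 'id' (illustrative helper)."""
    # idxmax yields, per group, the index label of the row holding the max date
    labels = frame.groupby("id")["date"].idxmax()
    # loc pulls those complete rows; reset_index discards the original labels
    return frame.loc[labels].reset_index(drop=True)


# Sample data copied from the question (dates are day-first strings)
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3],
    "date": [
        "12-01-2015 12:00:15",
        "12-01-2015 1:00:14",
        "13-01-2015 12:00:15",
        "14-01-2015 1:00:15",
        "15-01-2015 1:00:15",
    ],
    "remarks": ["Good", "OK", "Not Ok", "Bad", "Good"],
})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

result = latest_row_per_id(df)
print(result)
```

Unlike the sort_values + last approach, this always returns whole rows from the input, so every column in a result row comes from the same original record.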