Pandas date based grouping to return rows on a condition

I have a dataframe with data like this:

id      date                    remarks
1       12-01-2015 12:00:15     Good
2       12-01-2015 1:00:14      OK
1       13-01-2015 12:00:15     Not Ok
2       14-01-2015 1:00:15      Bad
3       15-01-2015 1:00:15      Good

For each id, I need the row with the latest date, along with its remarks. For id 2, for example, the result should be 14-01-2015 1:00:15 with the remark Bad.

You need sort_values + groupby + GroupBy.last:

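# Parse the day-first strings (e.g. 12-01-2015) into datetimes
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# Sort by date, then keep the last (latest) entry for each id
df1 = df.sort_values('date').groupby('id', as_index=False).last()
print(df1)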
   id                date remarks
0   1 2015-01-13 12:00:15  Not Ok
1   2 2015-01-14 01:00:15     Bad
2   3 2015-01-15 01:00:15    Good
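
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question, with the hours zero-padded (an assumption made here so that the strings parse the same way across pandas versions). The variable name latest is only illustrative.

```python
import pandas as pd

# Sample data adapted from the question (hours zero-padded for consistent parsing)
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3],
    "date": [
        "12-01-2015 12:00:15",
        "12-01-2015 01:00:14",
        "13-01-2015 12:00:15",
        "14-01-2015 01:00:15",
        "15-01-2015 01:00:15",
    ],
    "remarks": ["Good", "OK", "Not Ok", "Bad", "Good"],
})

# Interpret 12-01-2015 as 12 January, not 1 December
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# After sorting by date, the last entry in each id group is the latest one
latest = df.sort_values("date").groupby("id", as_index=False).last()
print(latest)
```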

This assumes your date column has already been parsed as datetime with dayfirst=True. If so, group by id, call idxmax on date to get the index label of the latest row in each group, and pass those labels to loc. Using idxmin() instead would return the earliest row for each id.

df.loc[df.groupby('id')['date'].idxmax()]

Output:

   id                date remarks
2   1 2015-01-13 12:00:15  Not Ok
3   2 2015-01-14 01:00:15     Bad
4   3 2015-01-15 01:00:15    Good

If you don't want to keep the original index and would rather get a new dataframe with a fresh index (thanks @Zero):

df.loc[df.groupby('id')['date'].idxmax()].reset_index(drop=True)
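
The idxmax approach can be sketched end to end as follows. As before, the DataFrame is rebuilt from the question's sample with zero-padded hours (an assumption for consistent parsing), and the names latest_idx and result are only illustrative.

```python
import pandas as pd

# Sample data adapted from the question (hours zero-padded for consistent parsing)
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3],
    "date": [
        "12-01-2015 12:00:15",
        "12-01-2015 01:00:14",
        "13-01-2015 12:00:15",
        "14-01-2015 01:00:15",
        "15-01-2015 01:00:15",
    ],
    "remarks": ["Good", "OK", "Not Ok", "Bad", "Good"],
})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# For each id, idxmax gives the index label of the row holding the latest date
latest_idx = df.groupby("id")["date"].idxmax()

# Look those rows up in the original frame, then renumber the index from 0
result = df.loc[latest_idx].reset_index(drop=True)
print(result)
```

One difference between the two answers: GroupBy.last skips missing values column by column by default, so when a group contains NaN it can combine values from different rows, whereas idxmax with loc always returns whole rows from the original dataframe.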
