Pandas groupby based on date to return the rows that match a condition

Question

I have a dataframe with data like this:

id  date                 remarks
1   12-01-2015 12:00:15  Good
2   12-01-2015 1:00:14   OK
1   13-01-2015 12:00:15  Not Ok
2   14-01-2015 1:00:15   Bad
3   15-01-2015 1:00:15   Good

For each id, I need the output to contain the row with the latest date and its remark. For example, for id 2 it should return 14-01-2015 1:00:15 with the remark Bad.

Answer 1

You need sort_values + groupby + GroupBy.last:

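import pandas as pd

# Parse the day-first strings (DD-MM-YYYY) into datetimes
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# Sort chronologically, then take the last (latest) row of each id
df1 = df.sort_values('date').groupby('id', as_index=False).last()
print(df1)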
   id                date remarks
0   1 2015-01-13 12:00:15  Not Ok
1   2 2015-01-14 01:00:15     Bad
2   3 2015-01-15 01:00:15    Good
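The sketch below is a self-contained version of this answer. It rebuilds the sample data from the question. The drop_duplicates variant is an addition of ours and is not part of the original answer. It keeps whole rows, whereas GroupBy.last() takes the last non-null value of each column separately. That difference only matters if the data has missing values.

```python
import pandas as pd

# Sample data copied from the question (dates are DD-MM-YYYY strings)
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3],
    "date": ["12-01-2015 12:00:15", "12-01-2015 1:00:14", "13-01-2015 12:00:15",
             "14-01-2015 1:00:15", "15-01-2015 1:00:15"],
    "remarks": ["Good", "OK", "Not Ok", "Bad", "Good"],
})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# Answer 1: after sorting by date, the last row of each id group is the latest
latest = df.sort_values("date").groupby("id", as_index=False).last()

# Alternative (not from the answer): keep only the last row per id after sorting
latest_alt = (
    df.sort_values("date")
    .drop_duplicates("id", keep="last")
    .sort_values("id")
    .reset_index(drop=True)
)

print(latest)
print(latest_alt)
```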

Answer 2

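This assumes your date column is in day-first format and has already been converted to datetime, as in Answer 1. If so, group by id and call idxmax() on date to get the index label of each id's latest row. Then pass those labels to loc to retrieve the full rows. Note that idxmin() returns the earliest date per id, not the latest. It therefore does not make up for dates parsed with day and month swapped; convert the column with the correct format instead.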
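The day-first assumption matters because pandas reads an ambiguous string such as 12-01-2015 month-first by default. The same text can therefore become a different date. The minimal sketch below uses one value from the question. Taking the maximum over dates parsed the wrong way can pick the wrong row for an id.

```python
import pandas as pd

s = "12-01-2015 12:00:15"  # a value from the question, meant as 12 January 2015

default = pd.to_datetime(s)                   # month-first by default: 1 December 2015
day_first = pd.to_datetime(s, dayfirst=True)  # day-first: 12 January 2015

print(default)    # 2015-12-01 12:00:15
print(day_first)  # 2015-01-12 12:00:15
```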

df.loc[df.groupby('id')['date'].idxmax()]

Output:

   id                date remarks
2   1 2015-01-13 12:00:15  Not Ok
3   2 2015-01-14 01:00:15     Bad
4   3 2015-01-15 01:00:15    Good
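You can print the intermediate idxmax() result to see what loc is looking up. It maps each id to the index label of that id's latest row, which is why the output above keeps the labels 2, 3 and 4. The sketch below is self-contained and rebuilds the sample data from the question. It assumes the dates are parsed day-first.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3],
    "date": ["12-01-2015 12:00:15", "12-01-2015 1:00:14", "13-01-2015 12:00:15",
             "14-01-2015 1:00:15", "15-01-2015 1:00:15"],
    "remarks": ["Good", "OK", "Not Ok", "Bad", "Good"],
})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# For each id, the index label of the row with the maximum (latest) date
labels = df.groupby("id")["date"].idxmax()
print(labels.tolist())  # [2, 3, 4]

# Select those whole rows; the original labels are kept as the index
latest = df.loc[labels]
print(latest)
```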

If you don't want to keep the original index and would rather create a new dataframe with a fresh default index, reset it (thanks @Zero):

df.loc[df.groupby('id')['date'].idxmax()].reset_index(drop=True)
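The sketch below is a quick check of our own, not part of either answer. It confirms that after reset_index(drop=True) the index runs 0, 1, 2. It also confirms that the rows match the result of Answer 1's sort_values/groupby/last. The sample data is rebuilt from the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3],
    "date": ["12-01-2015 12:00:15", "12-01-2015 1:00:14", "13-01-2015 12:00:15",
             "14-01-2015 1:00:15", "15-01-2015 1:00:15"],
    "remarks": ["Good", "OK", "Not Ok", "Bad", "Good"],
})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# Answer 2, with the original labels replaced by a fresh 0..n-1 index
via_idxmax = df.loc[df.groupby("id")["date"].idxmax()].reset_index(drop=True)
# Answer 1, for comparison
via_last = df.sort_values("date").groupby("id", as_index=False).last()

print(via_idxmax.index.tolist())  # [0, 1, 2]
print(via_idxmax.equals(via_last))
```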


Problem description

2 solutions

Solution 1
2 2017-09-19 07:31:45

Solution 2
2 2017-09-19 07:35:38
