Pandas: select the rows with the max value of some columns for each distinct value of another column

Question

I have a pandas DataFrame like this:

    id  some_type   some_date   some_data
0   1   A           19/12/1995  X
1   2   A           10/04/1997  Y
2   2   B           05/03/2013  Z
3   2   B           09/05/2017  W
4   2   B           09/05/2017  R
5   3   A           01/07/1998  M
6   3   B           09/08/2009  N

For each value of id, I need the rows that have the max value of some_type and some_date, without dropping any value of some_data.

In other words, what I need is the following:

    id  some_type   some_date   some_data
0   1   A           19/12/1995  X
3   2   B           09/05/2017  W
4   2   B           09/05/2017  R
6   3   B           09/08/2009  N

Answer 1

You can do it with sort_values, groupby and apply, keeping the rows whose some_type and some_date equal the last values in each sorted group:

# some_date must be a datetime column so the sort is chronological (dates are dd/mm/yyyy)
df['some_date'] = pd.to_datetime(df['some_date'], format='%d/%m/%Y')
df_output = (df.sort_values(by=['some_type', 'some_date'])
               .groupby('id')
               .apply(lambda df_g: df_g[(df_g['some_type'] == df_g['some_type'].iloc[-1]) &
                                        (df_g['some_date'] == df_g['some_date'].iloc[-1])])
               .reset_index(0, drop=True))

The output is:

   id some_type  some_date some_data
0   1         A 1995-12-19         X
3   2         B 2017-05-09         W
4   2         B 2017-05-09         R
6   3         B 2009-08-09         N
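The same sort-then-compare idea can be sketched without apply, which may be useful because the way groupby.apply treats the grouping column has changed across pandas versions. This is only a sketch, not the answerer's code: it swaps apply for transform('last'), and it assumes the dates are day-first (dd/mm/yyyy), as 19/12/1995 suggests.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 2, 3, 3],
    "some_type": ["A", "A", "B", "B", "B", "A", "B"],
    "some_date": ["19/12/1995", "10/04/1997", "05/03/2013", "09/05/2017",
                  "09/05/2017", "01/07/1998", "09/08/2009"],
    "some_data": ["X", "Y", "Z", "W", "R", "M", "N"],
})
# Assumption: dates are day-first, so parse them with an explicit format
df["some_date"] = pd.to_datetime(df["some_date"], format="%d/%m/%Y")

# After sorting, the last row of each id group holds the largest
# (some_type, some_date) pair
df_sorted = df.sort_values(["some_type", "some_date"])
last_type = df_sorted.groupby("id")["some_type"].transform("last")
last_date = df_sorted.groupby("id")["some_date"].transform("last")

# Keep every row that ties with that last row, then restore the original order
keep = (df_sorted["some_type"] == last_type) & (df_sorted["some_date"] == last_date)
df_output = df_sorted[keep].sort_index()
print(df_output)
```

Because the mask is built on the sorted frame and only then filtered, the original index labels (0, 3, 4, 6) are preserved.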

EDIT: if you don't care about the index, you can also use merge:

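# first get the last row per id once sorted; reset_index keeps id as a column for the merge
df_last = (df.sort_values(['some_type', 'some_date'])
             .groupby('id')[['some_type', 'some_date']].last().reset_index())
# now do an inner merge on all three columns to keep only the matching rows
df_output = df.merge(df_last, on=['id', 'some_type', 'some_date'], how='inner')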

You will get the same result, apart from the index.
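A self-contained sketch of the merge variant follows. It assumes day-first dates, and it joins on id as well as the two value columns, so that a row from one id cannot match the last pair of a different id; that join key is a choice made for this sketch.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 2, 3, 3],
    "some_type": ["A", "A", "B", "B", "B", "A", "B"],
    "some_date": ["19/12/1995", "10/04/1997", "05/03/2013", "09/05/2017",
                  "09/05/2017", "01/07/1998", "09/08/2009"],
    "some_data": ["X", "Y", "Z", "W", "R", "M", "N"],
})
# Assumption: dates are day-first (dd/mm/yyyy)
df["some_date"] = pd.to_datetime(df["some_date"], format="%d/%m/%Y")

# One row per id: the last (some_type, some_date) pair after sorting
df_last = (
    df.sort_values(["some_type", "some_date"])
      .groupby("id")[["some_type", "some_date"]]
      .last()
      .reset_index()
)

# The inner merge keeps only rows of df that match their own id's last pair
df_output = df.merge(df_last, on=["id", "some_type", "some_date"], how="inner")
print(df_output)
```

The merged frame gets a fresh 0..n-1 index, which is why this route only fits when the original index labels do not matter.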

Answer 2

Create a mask with groupby and max(), then apply it. But first convert some_date to datetime:

df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)  # dates are dd/mm/yyyy
m = df.groupby('id')[['some_type', 'some_date']].transform(lambda x: x == x.max()).all(axis=1)
df = df[m]

Full example:

import io

import pandas as pd

text = '''\
id  some_type   some_date   some_data
1   A           19/12/1995  X
2   A           10/04/1997  Y
2   B           05/03/2013  Z
2   B           09/05/2017  W
2   B           09/05/2017  R
3   A           01/07/1998  M
3   B           09/08/2009  N'''

# pd.compat.StringIO no longer exists in current pandas; io.StringIO replaces it
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# dates are dd/mm/yyyy, so parse them day-first
df['some_date'] = pd.to_datetime(df['some_date'], dayfirst=True)

# True where a row holds its group's maximum in both columns
m = df.groupby('id')[['some_type', 'some_date']].transform(lambda x: x == x.max()).all(axis=1)

df = df[m]

print(df)

Returns:

   id some_type  some_date some_data
0   1         A 1995-12-19         X
3   2         B 2017-05-09         W
4   2         B 2017-05-09         R
6   3         B 2009-08-09         N
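One behaviour worth checking before choosing between the two answers: the mask compares each column against its own group maximum, whereas the sort-based approach compares against the last (some_type, some_date) pair. On the question's data both agree, but they can differ when the largest some_type and the latest some_date sit on different rows. The sketch below uses made-up data (not from the question) to illustrate this.

```python
import pandas as pd

# Hypothetical data: for id 1 the largest some_type ("B") and the latest
# some_date (2020-01-01) are on different rows
df = pd.DataFrame({
    "id": [1, 1, 2],
    "some_type": ["A", "B", "B"],
    "some_date": pd.to_datetime(["2020-01-01", "2010-01-01", "2015-01-01"]),
    "some_data": ["P", "Q", "S"],
})
cols = ["some_type", "some_date"]

# Per-column mask: a row must hold the group maximum in every column at once
per_column_max = df.groupby("id")[cols].transform("max")
mask = (df[cols] == per_column_max).all(axis=1)
by_mask = df[mask]
print(by_mask)  # no row of id 1 satisfies both conditions

# Sort-based selection: compare against the last row after sorting by both columns
df_sorted = df.sort_values(cols)
last = df_sorted.groupby("id")[cols].transform("last")
by_sort = df_sorted[(df_sorted[cols] == last).all(axis=1)].sort_index()
print(by_sort)  # keeps the ("B", 2010-01-01) row for id 1
```

Which result is wanted depends on what "max of some_type and some_date" should mean; if every id must appear in the output, the sort-based reading is the one that guarantees it.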

2 answers

Answer 1: score 2, accepted, 2018-06-14 20:27:51
Answer 2: score 2, 2018-06-14 20:30:35
