pandas select rows with the max value of some columns for each different value of another column
I have a dataframe in pandas like this:
   id some_type   some_date some_data
0   1         A  19/12/1995         X
1   2         A  10/04/1997         Y
2   2         B  05/03/2013         Z
3   2         B  09/05/2017         W
4   2         B  09/05/2017         R
5   3         A  01/07/1998         M
6   3         B  09/08/2009         N
For each value of id, I need the rows that have the max value of some_type and some_date, without dropping any value of some_data (rows that tie on the maximum must all be kept).
In other words, what I need is the following:
   id some_type   some_date some_data
0   1         A  19/12/1995         X
3   2         B  09/05/2017         W
4   2         B  09/05/2017         R
6   3         B  09/08/2009         N
You can do it with sort_values, groupby and apply, keeping the rows that match the last value of some_type and some_date in each group:
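# Assumes df['some_date'] has already been converted with pd.to_datetime,
# so that sorting is chronological rather than alphabetical.
df_output = (df.sort_values(by=['some_type', 'some_date'])
               .groupby('id')
               .apply(lambda df_g: df_g[(df_g['some_type'] == df_g['some_type'].iloc[-1]) &
                                        (df_g['some_date'] == df_g['some_date'].iloc[-1])])
               .reset_index(level=0, drop=True))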
and the output is:
   id some_type  some_date some_data
0   1         A 1995-12-19         X
3   2         B 2017-09-05         W
4   2         B 2017-09-05         R
6   3         B 2009-09-08         N
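For anyone who wants to run this end to end, here is a minimal sketch of the same sort-then-keep-the-last idea, written with transform('last') instead of apply. That substitution is mine, not part of the answer above, and the dates are assumed to be day/month/year.

```python
import pandas as pd

# Sample data from the question; some_date is assumed to be day/month/year.
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 2, 3, 3],
    "some_type": ["A", "A", "B", "B", "B", "A", "B"],
    "some_date": ["19/12/1995", "10/04/1997", "05/03/2013", "09/05/2017",
                  "09/05/2017", "01/07/1998", "09/08/2009"],
    "some_data": ["X", "Y", "Z", "W", "R", "M", "N"],
})
df["some_date"] = pd.to_datetime(df["some_date"], format="%d/%m/%Y")

keys = ["some_type", "some_date"]
df_sorted = df.sort_values(keys)

# For every row, look up the key values of the last row of its id group.
last = df_sorted.groupby("id")[keys].transform("last")

# Keep the rows whose key columns equal those of the group's last row,
# then restore the original row order.
mask = (df_sorted[keys] == last).all(axis=1)
df_output = df_sorted[mask].sort_index()
print(df_output)
```

Because the comparison is done on the sorted frame itself, the original index (0, 3, 4, 6) survives without any reset_index step.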
EDIT: if you don't care about the indexes, you can also use merge:
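# First get the last row of each id after sorting; reset_index() turns id back into a column
df_last = (df.sort_values(['some_type', 'some_date'])
             .groupby('id')[['some_type', 'some_date']].last()
             .reset_index())
# Now inner-merge on id and both key columns to keep only the rows you want
df_output = df.merge(df_last, how='inner', on=['id', 'some_type', 'some_date'])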
You will get the same result, apart from the indexes.
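One thing to watch with the merge approach: id should be part of the join. If the per-id maxima are indexed by id and merged on some_type and some_date only, a pair that is the maximum for one id can also match a non-maximum row of another id. A small sketch with made-up data (the frame and the names loose and strict are only for illustration):

```python
import pandas as pd

# Made-up data: id 2's maximum pair (B, 2020-01-01) also occurs as a
# non-maximum row under id 1.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "some_type": ["B", "B", "B"],
    "some_date": pd.to_datetime(["2020-01-01", "2021-01-01", "2020-01-01"]),
    "some_data": ["P", "Q", "R"],
})
keys = ["some_type", "some_date"]

# Per-id maxima, indexed by id.
df_last = df.sort_values(keys).groupby("id")[keys].last()

# id is only the index of df_last here, so this merge joins on some_type
# and some_date alone and lets id 1's older row through.
loose = df.merge(df_last, how="inner")

# Turning id back into a column makes it part of the join.
strict = df.merge(df_last.reset_index(), how="inner", on=["id"] + keys)

print(loose["some_data"].tolist())
print(strict["some_data"].tolist())
```

With the question's sample data both variants return the same four rows, so the difference only shows up when ids share a (some_type, some_date) pair.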
Create a mask with groupby and max(), then apply it. But first convert to datetime:
df['some_date'] = pd.to_datetime(df['some_date'])
m = df.groupby('id')[['some_type', 'some_date']].transform(lambda x: x == x.max()).all(axis=1)
df = df[m]
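A note on the conversion step: the sample dates look day-first (19/12/1995 cannot be month/day), yet the printed results in this thread show 09/05/2017 as 2017-09-05, which suggests the ambiguous strings were read month-first. If the column really is day/month/year, which is an assumption about the data, passing an explicit format avoids the guesswork. A minimal sketch:

```python
import pandas as pd

dates = pd.Series(["19/12/1995", "09/05/2017", "09/08/2009"])

# An explicit format removes any guessing about day/month order.
parsed = pd.to_datetime(dates, format="%d/%m/%Y")
print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['1995-12-19', '2017-05-09', '2009-08-09']
```

The same format argument can be used when converting df['some_date'].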
Full example:
from io import StringIO

import pandas as pd

text = '''\
id some_type some_date some_data
1 A 19/12/1995 X
2 A 10/04/1997 Y
2 B 05/03/2013 Z
2 B 09/05/2017 W
2 B 09/05/2017 R
3 A 01/07/1998 M
3 B 09/08/2009 N'''
fileobj = StringIO(text)  # pd.compat.StringIO is not available in recent pandas versions
df = pd.read_csv(fileobj, sep=r'\s+')
df['some_date'] = pd.to_datetime(df['some_date'])
# True where a row holds its group's max some_type and max some_date
m = df.groupby('id')[['some_type', 'some_date']].transform(lambda x: x == x.max()).all(axis=1)
df = df[m]
print(df)
Returns:
   id some_type  some_date some_data
0   1         A 1995-12-19         X
3   2         B 2017-09-05         W
4   2         B 2017-09-05         R
6   3         B 2009-09-08         N
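A caveat on the mask approach: each column is compared with its own group maximum, so a row is kept only if it holds the largest some_type and the latest some_date at the same time. If the latest date in a group sits on a row with a smaller some_type, nothing is returned for that id. Whether that is what you want depends on how "max of some_type and some_date" is read. Below is a sketch with made-up data that shows the difference, together with a two-step alternative (largest type first, then latest date within that type); the alternative is my own suggestion, not something from the answers above.

```python
import pandas as pd

# Made-up data: within id 1 the latest date belongs to the smaller type.
df = pd.DataFrame({
    "id": [1, 1],
    "some_type": ["A", "B"],
    "some_date": pd.to_datetime(["2020-01-01", "2017-01-01"]),
    "some_data": ["P", "Q"],
})
cols = ["some_type", "some_date"]

# Column-wise mask: a row must hold both group maxima at once.
m = (df[cols] == df.groupby("id")[cols].transform("max")).all(axis=1)
print(df[m])  # empty for this data

# Two-step alternative: keep the largest type per id ...
top_type = df[df["some_type"] == df.groupby("id")["some_type"].transform("max")]
# ... then the latest date among those rows.
latest = top_type.groupby("id")["some_date"].transform("max")
result = top_type[top_type["some_date"] == latest]
print(result)
```

On the question's sample data both readings select the same rows, since the row with the largest some_type in each id also carries the latest date.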