Pandas get column value of first occurrence of condition per group
I have a pandas dataframe as follows:
player condition num
A 0 1
A 1 2
A 1 3
B 0 1
B 0 2
B 1 3
B 0 4
I want to add a column that stores, for each player, the minimum value of the num column among the rows where the condition column is 1.
The result, hence, should look like this:
player condition num numCondition
A 0 1 2
A 1 2 2
A 1 3 2
B 0 1 3
B 0 2 3
B 1 3 3
B 0 4 3
I know that I need a groupby() per player. I will then probably need an apply() with a lambda function. But I could not fit the pieces together yet.
EDIT: The condition column is a simplification in my example. In reality it should simply be possible to use the usual pandas dataframe queries to filter the rows, e.g.
df[(df.condition == 1) & (df.otherCondition > 10)]
By using drop_duplicates:
df.player.map(df[df.condition==1].drop_duplicates(['player'],keep='first').set_index('player').num)
Out[221]:
0 2
1 2
2 2
3 3
4 3
5 3
6 3
Name: player, dtype: int64
df['numCondition']=df.player.map(df[df.condition==1].drop_duplicates(['player'],keep='first').set_index('player').num)
df
Out[223]:
player condition num numCondition
0 A 0 1 2
1 A 1 2 2
2 A 1 3 2
3 B 0 1 3
4 B 0 2 3
5 B 1 3 3
6 B 0 4 3
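For readers who want to run this end to end, here is a minimal self-contained sketch of the same drop_duplicates idea, split into steps. The names mask and first_match are introduced here for illustration and are not part of the original answer. Note that this picks the first matching row per player in the dataframe's current order, which equals the minimum only when num is ascending within each player, as it is in the example data.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B", "B", "B"],
    "condition": [0, 1, 1, 0, 0, 1, 0],
    "num": [1, 2, 3, 1, 2, 3, 4],
})

# Any boolean row filter can go here; this one mirrors the question.
mask = df["condition"] == 1

# Keep the first matching row per player, as a Series indexed by player.
first_match = (
    df[mask]
    .drop_duplicates("player", keep="first")
    .set_index("player")["num"]
)

# Look up each row's player in that Series.
df["numCondition"] = df["player"].map(first_match)
print(df)
```

A player with no matching row would get NaN in numCondition, since map finds no entry for it.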
Aggregate first, then join the result back with df on player:
df.join(
df.groupby('player')
.apply(lambda g: g.num[g.condition == 1].min())
.rename('numCondition'),
on='player')
# player condition num numCondition
#0 A 0 1 2
#1 A 1 2 2
#2 A 1 3 2
#3 B 0 1 3
#4 B 0 2 3
#5 B 1 3 3
#6 B 0 4 3
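As a possible alternative that avoids both apply and the join, the per-player minimum can be broadcast back with transform. This is only a sketch under the assumption that a plain boolean mask describes the rows of interest; mask is a name chosen here, and it could just as well combine several conditions as in the question's EDIT.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "player": ["A", "A", "A", "B", "B", "B", "B"],
    "condition": [0, 1, 1, 0, 0, 1, 0],
    "num": [1, 2, 3, 1, 2, 3, 4],
})

# Boolean filter for the rows that should count toward the minimum.
mask = df["condition"] == 1

# Replace num with NaN where the filter fails, then take the minimum
# per player; transform returns one value per original row.
df["numCondition"] = (
    df["num"].where(mask).groupby(df["player"]).transform("min")
)
print(df)
```

Because where introduces NaN, the new column comes out as float; it can be cast back to an integer type if every player has at least one matching row.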