Pandas: get a column value at the first row per group that meets a condition

Question

I have a pandas dataframe as follows:

player  condition   num
A       0           1
A       1           2
A       1           3
B       0           1
B       0           2
B       1           3
B       0           4

I want to add a column that stores, for each player, the minimum value of the num column among the rows where the condition column is 1.

The result should therefore look like this:

player  condition   num  numCondition
A       0           1    2
A       1           2    2
A       1           3    2
B       0           1    3
B       0           2    3
B       1           3    3
B       0           4    3

I know that I need a groupby() per player, and then probably an apply() with a lambda function, but I have not been able to fit the pieces together yet.

EDIT: The condition column is a simplification in my example. In reality it should simply be possible to use the usual pandas dataframe queries to filter the rows, e.g. df[(df.condition == 1) & (df.otherCondition > 10)]

Answer 1

Using drop_duplicates:

df.player.map(df[df.condition==1].drop_duplicates(['player'],keep='first').set_index('player').num)
Out[221]: 
0    2
1    2
2    2
3    3
4    3
5    3
6    3
Name: player, dtype: int64

df['numCondition']=df.player.map(df[df.condition==1].drop_duplicates(['player'],keep='first').set_index('player').num)
df
Out[223]: 
  player  condition  num  numCondition
0      A          0    1             2
1      A          1    2             2
2      A          1    3             2
3      B          0    1             3
4      B          0    2             3
5      B          1    3             3
6      B          0    4             3
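
The one-liner above can be unpacked into a self-contained sketch. The names `first_match` and `min_match` are illustrative and not part of the original answer. Note that `drop_duplicates(..., keep='first')` picks the first matching row in row order, which equals the minimum only when num is ascending within each player, as it is in the example data. A groupby minimum over the filtered rows is shown alongside for the case where the order is not guaranteed.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "player": list("AAABBBB"),
    "condition": [0, 1, 1, 0, 0, 1, 0],
    "num": [1, 2, 3, 1, 2, 3, 4],
})

# Rows that satisfy the filter; any boolean expression can be used here.
matching = df[df["condition"] == 1]

# First matching row per player (depends on row order).
first_match = (
    matching.drop_duplicates("player", keep="first")
    .set_index("player")["num"]
)

# Minimum num per player among matching rows (independent of row order).
min_match = matching.groupby("player")["num"].min()

# Broadcast the per-player value back onto every row of that player.
df["numCondition"] = df["player"].map(first_match)
```

On the example data both lookups hold the same values, so mapping either one gives the column requested in the question.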

Answer 2

Aggregate first, then join the result back to df on player:

df.join(
    df.groupby('player')
      .apply(lambda g: g.num[g.condition == 1].min())
      .rename('numCondition'), 
on='player')

#   player  condition  num  numCondition
# 0      A          0    1             2
# 1      A          1    2             2
# 2      A          1    3             2
# 3      B          0    1             3
# 4      B          0    2             3
# 5      B          1    3             3
# 6      B          0    4             3
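
As a possible alternative that comes from neither answer, the filter can be kept as a boolean mask and combined with `Series.where` and a grouped `transform`, which avoids both `apply` and the join. This is a minimal sketch on the question's example data; the `mask` variable is illustrative, and a compound filter such as the one in the question's EDIT could be substituted for it.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "player": list("AAABBBB"),
    "condition": [0, 1, 1, 0, 0, 1, 0],
    "num": [1, 2, 3, 1, 2, 3, 4],
})

# Any boolean Series aligned with df works as the filter.
mask = df["condition"] == 1

# Blank out non-matching rows with NaN, then take the per-player minimum
# and broadcast it back to every row of that player.
df["numCondition"] = (
    df["num"].where(mask).groupby(df["player"]).transform("min")
)
```

Because `where` introduces NaN for the non-matching rows, the new column comes out as float64, and a player with no matching row at all gets NaN.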

Answer 1
2, accepted, 2017-10-07 19:50:19

Answer 2
1, 2017-10-07 19:37:55
