
Python Pandas new dataframe column with group by and condition

I have a Pandas dataframe that looks as follows.

player  count1  count2
A       1       1
A       2       1
A       3       1
A       4       2
A       5       2
B       1       1
B       2       2
B       3       2
B       4       2

Column player contains names, count1 is a cumulative sum, and count2 contains other counts.

I now want to create a new column that contains, for each player, the value of count1 in the first row where count2 equals 2.

Hence, the result should look like this:

player  count1  count2  new
A       1       1       4
A       2       1       4
A       3       1       4
A       4       2       4
A       5       2       4
B       1       1       2
B       2       2       2
B       3       2       2
B       4       2       2

I tried to do it with transform, but I cannot figure out how to combine it with the condition on the count2 column (and then take the value of the count1 column).

Without the groupby it works like this, but I don't know where and how to add the groupby:

df['new'] = df.loc[df['count2'] == 2, 'count1'].min()

Use map with a Series:

s = df[df['count2'] == 2].drop_duplicates(['player']).set_index('player')['count1']

df['new'] = df['player'].map(s)
print (df)
  player  count1  count2  new
0      A       1       1    4
1      A       2       1    4
2      A       3       1    4
3      A       4       2    4
4      A       5       2    4
5      B       1       1    2
6      B       2       2    2
7      B       3       2    2
8      B       4       2    2
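Since the question asks about transform specifically, a groupby-based variant can be sketched as follows. This is not part of the original answer; it rebuilds the sample frame from the question and relies on the groupby 'first' aggregation skipping missing values. The intermediate name masked is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "player": list("AAAAABBBB"),
    "count1": [1, 2, 3, 4, 5, 1, 2, 3, 4],
    "count2": [1, 1, 1, 2, 2, 1, 2, 2, 2],
})

# Keep count1 only where count2 == 2; every other row becomes NaN.
masked = df["count1"].where(df["count2"].eq(2))

# 'first' returns the first non-missing value per player,
# and transform broadcasts it back to every row of that player.
df["new"] = masked.groupby(df["player"]).transform("first")
print(df)
```

Because of the intermediate NaN values the new column comes back as float; if every player has at least one row with count2 == 2, it can be cast back with astype(int).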

Detail:

First keep only the rows where count2 is 2, using boolean indexing:

print (df[df['count2'] == 2])
  player  count1  count2
3      A       4       2
4      A       5       2
6      B       2       2
7      B       3       2
8      B       4       2

Then remove duplicates in the player column with drop_duplicates, which keeps the first row for each player:

print (df[df['count2'] == 2].drop_duplicates(['player']))
  player  count1  count2
3      A       4       2
6      B       2       2
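The remaining steps, set_index and map, are not printed above. A minimal self-contained sketch of the whole chain, assuming the sample frame from the question (first_twos is just an illustrative name for the intermediate result), might look like this:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "player": list("AAAAABBBB"),
    "count1": [1, 2, 3, 4, 5, 1, 2, 3, 4],
    "count2": [1, 1, 1, 2, 2, 1, 2, 2, 2],
})

# Rows where count2 == 2, reduced to the first such row per player.
first_twos = df[df["count2"] == 2].drop_duplicates(["player"])

# Lookup Series: index is the player name, values are count1.
s = first_twos.set_index("player")["count1"]
print(s)

# map looks up each row's player in s and returns the matching count1.
df["new"] = df["player"].map(s)
print(df)
```

A player with no row where count2 == 2 would be absent from s, so map would give NaN for that player's rows.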
