Pandas: Adding a new column based on a group aggregation
I am new to pandas DataFrames and need some help.
Let's say I have a DataFrame df:
print(df)
ID Score
0 AA 100
1 AA 10
2 BB 50
3 BB -20
4 BB 0
5 AA 200
I want to add a new column whose value is 1 if the row has the lowest score for its ID, and 0 otherwise:
print(df_out)
ID Score IsLowestScoreID
0 AA 100 0
1 AA 10 1
2 BB 50 0
3 BB -20 1
4 BB 0 0
5 AA 200 0
What is the correct way to do this?
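To reproduce the question, the sample frame can be rebuilt from the printed table. This is a minimal sketch; the values are copied from the output above, and the expected list is written out by hand from the desired IsLowestScoreID column.

```python
import pandas as pd

# Sample data reconstructed from the printed table in the question
df = pd.DataFrame({
    "ID": ["AA", "AA", "BB", "BB", "BB", "AA"],
    "Score": [100, 10, 50, -20, 0, 200],
})

# Desired IsLowestScoreID values, copied by hand from df_out
expected = [0, 1, 0, 1, 0, 0]
print(df)
```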
You can compare the Score column with a Series created by groupby plus transform('min') using Series.eq, then cast the boolean mask to integer so that True/False maps to 1/0:
df['IsLowestScoreID'] = df['Score'].eq(df.groupby('ID')['Score'].transform('min')).astype(int)
print(df)
ID Score IsLowestScoreID
0 AA 100 0
1 AA 10 1
2 BB 50 0
3 BB -20 1
4 BB 0 0
5 AA 200 0
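A self-contained version of the one-liner above, split into steps so each intermediate result is visible. It also shows a consequence worth knowing: because the comparison is by value, every row that ties for the group minimum is flagged with 1. The tied frame here is a made-up example, not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AA", "AA", "BB", "BB", "BB", "AA"],
    "Score": [100, 10, 50, -20, 0, 200],
})

# Group minimum broadcast to every row (same length and index as df)
group_min = df.groupby("ID")["Score"].transform("min")

# True where the row's score equals its group's minimum, then cast to 0/1
df["IsLowestScoreID"] = df["Score"].eq(group_min).astype(int)
print(df["IsLowestScoreID"].tolist())  # [0, 1, 0, 1, 0, 0]

# Hypothetical tie: two AA rows share the minimum score of 10
tied = pd.DataFrame({"ID": ["AA", "AA", "AA"], "Score": [10, 10, 30]})
tied_min = tied.groupby("ID")["Score"].transform("min")
tied["IsLowestScoreID"] = tied["Score"].eq(tied_min).astype(int)
print(tied["IsLowestScoreID"].tolist())  # [1, 1, 0]
```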
An alternative is to use numpy.where to choose the values from the mask:
import numpy as np

mask = df['Score'].eq(df.groupby('ID')['Score'].transform('min'))
df['IsLowestScoreID'] = np.where(mask, 1, 0)
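A runnable sketch of the numpy.where variant. Its advantage over astype(int) is that the two output values are arbitrary; the extra Label column with text values is a hypothetical illustration, not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["AA", "AA", "BB", "BB", "BB", "AA"],
    "Score": [100, 10, 50, -20, 0, 200],
})

mask = df["Score"].eq(df.groupby("ID")["Score"].transform("min"))

# np.where returns a NumPy array: 1 where mask is True, 0 elsewhere
df["IsLowestScoreID"] = np.where(mask, 1, 0)

# Hypothetical column showing that any pair of values can be used
df["Label"] = np.where(mask, "lowest", "other")
print(df)
```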
Detail: transform('min') returns the group minimum repeated for every row, aligned to the original index:
print(df.groupby('ID')['Score'].transform('min'))
0 10
1 10
2 -20
3 -20
4 -20
5 10
Name: Score, dtype: int64
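To see why transform is used here rather than a plain aggregation, the sketch below contrasts groupby(...).min(), which returns one row per ID, with transform('min'), which broadcasts back to every row. Mapping each ID to the aggregated minimum gives the same per-row values. The last part is an assumption about a different requirement: if only one row per group should be flagged on ties, idxmin returns the index label of the first minimum; the IsFirstLowest column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AA", "AA", "BB", "BB", "BB", "AA"],
    "Score": [100, 10, 50, -20, 0, 200],
})

# Aggregation: one value per group, indexed by ID (AA -> 10, BB -> -20)
per_group = df.groupby("ID")["Score"].min()

# transform: the group minimum repeated on every original row
per_row = df.groupby("ID")["Score"].transform("min")

# Same per-row values by mapping each ID to its aggregated minimum
via_map = df["ID"].map(per_group)

# One flag per group: idxmin gives the index label of the first minimum
first_min = df.groupby("ID")["Score"].idxmin()
df["IsFirstLowest"] = df.index.isin(first_min).astype(int)
```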