Pandas: Add flag using a groupby calculation
I want to first obtain the third quartile (the 0.75 quantile) of value, grouped by group and level in this example.
import pandas as pd

d = pd.DataFrame({'customer': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
                  'group': ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A'],
                  'level': ['Z', 'X', 'X', 'X', 'Z', 'Z', 'Z', 'X', 'X', 'Z'],
                  'value': [0.4, 0.6, 0.7, 0.6, 0.3, 0.5, 0.2, 0.7, 0.5, 0.2]})
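# Select the numeric column explicitly; recent pandas versions can raise on the string 'customer' column
d.groupby(['group', 'level'])[['value']].quantile(0.75)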
Now that I have the quantile for each group, I want to add a column to the original df based on the groupby value.
0.75         value
group level
A     X       0.67
      Z       0.45
B     X       0.65
      Z       0.27
The result would be something like this: a new column that holds 1 if the value is higher than the quantile for its group and level, and 0 if it is lower.
customer group level  value  new
       1     A     Z   0.40    1
       2     B     X   0.60    0
Thanks
IIUC:
d['new'] = (d.value > d.groupby(['group', 'level'])['value']
.transform('quantile', 0.75)).astype(int)
>>> d
customer group level value new
0 1 A Z 0.4 0
1 2 B X 0.6 0
2 3 B X 0.7 1
3 4 A X 0.6 0
4 5 B Z 0.3 1
5 6 A Z 0.5 1
6 7 B Z 0.2 0
7 8 A X 0.7 1
8 9 B X 0.5 0
9 10 A Z 0.2 0
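To see why the transform approach lines up with the original rows, it can help to materialise the per-row threshold before comparing. The following is only a minimal sketch on the question's data; the column name q75 is an illustrative assumption, and a lambda is used in place of the string form of transform.

```python
import pandas as pd

d = pd.DataFrame({
    'customer': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'group': ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A'],
    'level': ['Z', 'X', 'X', 'X', 'Z', 'Z', 'Z', 'X', 'X', 'Z'],
    'value': [0.4, 0.6, 0.7, 0.6, 0.3, 0.5, 0.2, 0.7, 0.5, 0.2],
})

# transform returns one value per original row (same index as d), so each
# row receives the 0.75 quantile of its own (group, level) bucket.
# 'q75' is a hypothetical helper column, not part of the original answer.
d['q75'] = (d.groupby(['group', 'level'])['value']
             .transform(lambda s: s.quantile(0.75)))

# Flag rows whose value is strictly above their group's threshold.
d['new'] = (d['value'] > d['q75']).astype(int)

print(d)
```

Because the thresholds share d's index, the comparison is a plain element-wise one and the original row order is kept.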
Using only lt and index matching (note that lt marks values below the group quantile with 1; use gt to get the flag described in the question)
q = d.groupby(['group', 'level'])[['value']].quantile(0.75)
d.set_index(['group', 'level']).value.lt(q.value).astype(int)
group level
A X 1
X 0
Z 1
Z 0
Z 1
B X 1
X 0
X 1
Z 0
Z 1
Name: value, dtype: int64
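The index-matched result above is ordered by (group, level) and no longer carries the customer column. If the flag should sit on the original frame in its original row order, one possible variant is to look the quantile up per row with join. This is a sketch under the assumption that a "greater than" flag is wanted, as in the question; the names q75 and out are illustrative.

```python
import pandas as pd

d = pd.DataFrame({
    'customer': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'group': ['A', 'B', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A'],
    'level': ['Z', 'X', 'X', 'X', 'Z', 'Z', 'Z', 'X', 'X', 'Z'],
    'value': [0.4, 0.6, 0.7, 0.6, 0.3, 0.5, 0.2, 0.7, 0.5, 0.2],
})

# Per-group 0.75 quantile as a Series indexed by (group, level).
# The name 'q75' is a hypothetical label chosen for this sketch.
q = d.groupby(['group', 'level'])['value'].quantile(0.75).rename('q75')

# join with on= looks up each row's (group, level) pair in q's index,
# so the original row order and the customer column are preserved.
out = d.join(q, on=['group', 'level'])

# gt gives 1 for values strictly above the group's quantile.
out['new'] = out['value'].gt(out['q75']).astype(int)

print(out[['customer', 'group', 'level', 'value', 'new']])
```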