
Dummy based on nth decile by group in pandas python

I have a pandas DataFrame like this:

import pandas as pd
df = {'Person' : ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'F', 'F', 'F', 'F', 'G', 'G', 'G', 'G', 'G', 'H', 'H', 'H', 'H', 'I', 'I', 'I', 'I', 'I', 'J', 'J', 'J', 'J', 'J', 'J', 'K', 'K', 'K', 'K', 'K', 'L', 'L','L'],
      'Score' : [18, 17, 15, 10, 11, 12, 15, 15, 16, 16, 16, 15, 18, 10, 12, 12, 8, 7, 10, 9, 5, 4, 2, 4, 10, 12, 11, 12, 10, 3, 1, 5, 6, 18, 19, 20, 16, 19, 10, 12, 11, 13, 10, 12, 20, 20, 20, 19, 19, 7, 12, 15], 
      'Group' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1]}
df = pd.DataFrame(df, columns = ['Person', 'Group', 'Score'])  # 'Dummy' does not exist yet, so it is not requested here
df

I would like to create a dummy that takes the value 1 when an individual score is greater than or equal to the 8th decile (the 0.8 quantile) of its group, and 0 otherwise. For instance, I can calculate the decile per group using:

df.groupby("Group")["Score"].quantile(0.8)

Group
1    15.0
2    19.2
3    12.0
Name: Score, dtype: float64
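
A note on where 19.2 comes from, since no score in group 2 equals it: by default pandas interpolates linearly between the two nearest ranks. The sketch below reproduces that for group 2 by hand; the variable names are illustrative only, and the scores are copied from the data above.

```python
import pandas as pd

# Group 2 scores from the example data (persons B, C, I and K).
group2 = pd.Series([10, 11, 12, 15, 15, 16, 16, 16, 15, 18,
                    18, 19, 20, 16, 19, 20, 20, 20, 19, 19])

s = group2.sort_values().reset_index(drop=True)
pos = (len(s) - 1) * 0.8           # fractional rank, about 15.2
lo, hi = int(pos), int(pos) + 1    # neighbouring ranks 15 and 16
manual = s[lo] + (pos - lo) * (s[hi] - s[lo])

default_q = group2.quantile(0.8)                         # linear interpolation
higher_q = group2.quantile(0.8, interpolation="higher")  # snaps to an observed score

print(manual, default_q, higher_q)
```

With the default, a score of 19 in group 2 falls below the threshold. If the threshold should always be an observed score, the `interpolation` argument of `quantile` is one way to get that.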

I want to create a new dummy variable that takes the value 1 when the score is greater than or equal to 15.0 in group 1, 19.2 in group 2, or 12.0 in group 3, and 0 otherwise.

The outcome variable would therefore look like this:

df = {'Person' : ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'F', 'F', 'F', 'F', 'G', 'G', 'G', 'G', 'G', 'H', 'H', 'H', 'H', 'I', 'I', 'I', 'I', 'I', 'J', 'J', 'J', 'J', 'J', 'J', 'K', 'K', 'K', 'K', 'K', 'L', 'L','L'],
      'Score' : [18, 17, 15, 10, 11, 12, 15, 15, 16, 16, 16, 15, 18, 10, 12, 12, 8, 7, 10, 9, 5, 4, 2, 4, 10, 12, 11, 12, 10, 3, 1, 5, 6, 18, 19, 20, 16, 19, 10, 12, 11, 13, 10, 12, 20, 20, 20, 19, 19, 7, 12, 15], 
      'Group' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1], 
      'Dummy' : [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]}
df = pd.DataFrame(df, columns = ['Person', 'Group', 'Score', 'Dummy'])
df

What would be the most direct way to do this?

This is just a map: compute the quantile once per group, then map each row's Group to its threshold.

quantiles = df.groupby("Group")["Score"].quantile(0.8)

df['Dummy'] = (df['Score'] >= df['Group'].map(quantiles)).astype(int)

Output (head):

   Person  Group  Score  Dummy
0       A      1     18      1
1       A      1     17      1
2       A      1     15      1
3       B      2     10      0
4       B      2     11      0
5       B      2     12      0
6       B      2     15      0
7       C      2     15      0
8       C      2     16      0
9       C      2     16      0
10      C      2     16      0
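
To make the intermediate step visible, here is a minimal sketch on a made-up six-row frame (`toy` and `threshold` are hypothetical names, not part of the question's data). It shows that `map` looks up each row's Group in the index of `quantiles` and returns a per-row threshold aligned with the original frame.

```python
import pandas as pd

# Tiny made-up frame, used only to show the intermediate values.
toy = pd.DataFrame({
    "Group": [1, 1, 1, 2, 2, 2],
    "Score": [10, 20, 30, 1, 2, 3],
})

# One threshold per group, indexed by Group.
quantiles = toy.groupby("Group")["Score"].quantile(0.8)

# Look up each row's group in that index: one threshold per row.
threshold = toy["Group"].map(quantiles)

toy["Dummy"] = (toy["Score"] >= threshold).astype(int)
print(toy.assign(Threshold=threshold))
```

Only the top score in each toy group reaches its threshold, so each group gets a single 1.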

We can use transform here to broadcast the group quantile to each row:

q = df.groupby("Group")["Score"].transform('quantile', q=0.8)
df['Dummy'] = df['Score'].ge(q).astype(int)

print(df.head(10))
  Person  Group  Score  Dummy
0      A      1     18      1
1      A      1     17      1
2      A      1     15      1
3      B      2     10      0
4      B      2     11      0
5      B      2     12      0
6      B      2     15      0
7      C      2     15      0
8      C      2     16      0
9      C      2     16      0
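
As a rough check, the sketch below wraps the transform approach in a small helper and compares it with the map approach and with the Dummy column given in the question. The helper name `add_quantile_dummy` and its parameters are assumptions made for illustration; the Person column is omitted because it does not affect the result.

```python
import pandas as pd

def add_quantile_dummy(frame, q=0.8, group_col="Group", value_col="Score", out_col="Dummy"):
    """Return a copy with a 0/1 column flagging values >= the group's q-quantile."""
    threshold = frame.groupby(group_col)[value_col].transform("quantile", q=q)
    return frame.assign(**{out_col: frame[value_col].ge(threshold).astype(int)})

# Score and Group columns copied from the question.
df = pd.DataFrame({
    "Group": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1],
    "Score": [18, 17, 15, 10, 11, 12, 15, 15, 16, 16, 16, 15, 18, 10, 12, 12, 8, 7, 10, 9, 5, 4, 2, 4, 10, 12, 11, 12, 10, 3, 1, 5, 6, 18, 19, 20, 16, 19, 10, 12, 11, 13, 10, 12, 20, 20, 20, 19, 19, 7, 12, 15],
})

# Desired Dummy column as stated in the question.
expected = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]

result = add_quantile_dummy(df, q=0.8)

# The map-based approach, for comparison.
quantiles = df.groupby("Group")["Score"].quantile(0.8)
via_map = (df["Score"] >= df["Group"].map(quantiles)).astype(int)

print(result["Dummy"].tolist() == expected)
print(via_map.tolist() == expected)
```

Both approaches give the same column on this data. The map version computes one value per group and looks it up, while transform returns a Series already aligned with the original rows, so the choice is mostly a matter of taste.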
