
Using pandas.qcut within a groupby with a different number of classes for each key

I've hit a wall using pd.qcut within a groupby.transform() routine.

I want to assign a class number based on the quantiles of an AGE variable within each group (grouped by some key), so I thought of using something like this:

df['class'] = df.groupby('key')['AGE'].transform(pd.qcut, number_of_classes)
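For reference, this one-liner does work when every key uses the same number of classes, because GroupBy.transform forwards any extra positional and keyword arguments to the function it calls. A minimal sketch, assuming a fixed 2 classes for all keys (a hypothetical value) and the sample data from the first answer:

```python
import pandas as pd

# Sample data in the same shape as the first answer's example
df = pd.DataFrame({'key': 'foo foo foo bar bar bar'.split(),
                   'AGE': [0.1, 0.5, 1.0] * 2})

# The arguments after pd.qcut (2, labels=False) are passed to pd.qcut for
# every group, so all keys get the same number of classes.
df['class'] = df.groupby('key')['AGE'].transform(pd.qcut, 2, labels=False)
print(df)
```

The difficulty in the question is that this extra argument is the same for every group, so it cannot vary by key.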

My problem is that "number_of_classes" differs depending on my variable 'key' (of course...). I found a way to deal with it, but it's very inefficient, as you can see for yourself:

for i in df['key'].unique():
    df_temp = df.loc[df.key == i].copy()
    nbclass = int(df_temp['number_of_classes'].max())
    # bin only this key's rows, using this key's number of classes
    age_class = pd.qcut(df_temp['AGE'], nbclass, labels=False)
    idx = df_temp.index.values
    df.loc[idx, 'class'] = age_class

Do you think it's possible to use a pandas routine to achieve that without spending a billion years on a loop?

Many thanks :))

PS: I'm very sorry if some of you cry while looking at my bad code.

One idea is to use a custom function with GroupBy.apply; I hope it is faster than a billion years:

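import pandas as pd

df = pd.DataFrame({'key':'foo foo foo bar bar bar'.split(),
                   'AGE':[0.1, 0.5, 1.0]*2,
                   'number_of_classes':[2,5,3,1,4,2]})

def func(x):
    # one number of classes per key: the largest value in the group
    nbclass = int(x['number_of_classes'].max())
    x = x.copy()
    x['class'] = pd.qcut(x['AGE'], nbclass, labels=False)
    return x

# group_keys=False keeps the original row index (no extra 'key' index level)
df = df.groupby('key', group_keys=False).apply(func)
print(df)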
   key  AGE  number_of_classes  class
0  foo  0.1                  2      0
1  foo  0.5                  5      2
2  foo  1.0                  3      4
3  bar  0.1                  1      0
4  bar  0.5                  4      1
5  bar  1.0                  2      3
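One caveat: in this sample the bar group has only 3 rows but asks for 4 classes, which works only because its AGE values are all distinct. If a group contains repeated values, some quantile edges coincide and pd.qcut raises a ValueError by default. Passing duplicates='drop' merges the coinciding edges, at the cost of producing fewer classes than requested. A minimal sketch with a hypothetical group of ages:

```python
import pandas as pd

# Hypothetical group: 4 classes requested, but only 2 distinct AGE values,
# so several of the 4-quantile edges are equal.
ages = pd.Series([1.0, 1.0, 1.0, 2.0])

raised = False
try:
    # default duplicates='raise' rejects non-unique bin edges
    pd.qcut(ages, 4, labels=False)
except ValueError as err:
    raised = True
    print('default behaviour:', err)

# duplicates='drop' keeps only the unique edges (here 2 bins remain)
classes = pd.qcut(ages, 4, labels=False, duplicates='drop')
print(classes.tolist())
```

The same keyword can be added to the pd.qcut call inside func (or inside the lambda of the transform-based solution) if such groups can occur in the real data.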

Here is a solution with GroupBy.transform, passing each group's number of classes from a Series aggregated with max:

m = df.groupby('key')['number_of_classes'].max().astype(int)
print(m)
key
bar    4
foo    5
Name: number_of_classes, dtype: int64

# x.name holds the current group's key, so m[x.name] is that key's number of classes
f = lambda x: pd.qcut(x, m[x.name], labels=False)
df['class1'] = df.groupby('key')['AGE'].transform(f)
print(df)
   key  AGE  number_of_classes  class  class1
0  foo  0.1                  2      0       0
1  foo  0.5                  5      2       2
2  foo  1.0                  3      4       4
3  bar  0.1                  1      0       0
4  bar  0.5                  4      1       1
5  bar  1.0                  2      3       3
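To see what m[x.name] resolves to: inside transform, pandas sets the .name of each group Series to its group key, so a lambda that only returns m[x.name] broadcasts each key's number of classes back onto that key's rows. A small sketch reusing the answer's sample data; the 'nbclass' column name is hypothetical and only used for illustration:

```python
import pandas as pd

df = pd.DataFrame({'key': 'foo foo foo bar bar bar'.split(),
                   'AGE': [0.1, 0.5, 1.0] * 2,
                   'number_of_classes': [2, 5, 3, 1, 4, 2]})

# One number of classes per key, indexed by the key value
m = df.groupby('key')['number_of_classes'].max().astype(int)

# x.name is the group key ('bar' or 'foo'); a scalar returned by the
# function is broadcast to every row of that group.
df['nbclass'] = df.groupby('key')['AGE'].transform(lambda x: m[x.name])
print(df[['key', 'nbclass']])
```

The answer's lambda uses the same lookup, but feeds the looked-up value to pd.qcut as the number of quantiles instead of returning it.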
