How to sort within a group so that the largest number is in the first row, the smallest in the second, the second largest in the third, and so on?

So I have a df like this

import pandas as pd

data = {'Group': ['A','A','A','A','A','A','B','B','B','B'],
        'Name': [' Sheldon Webb',' Traci Dean',' Chad Webster',' Ora Harmon',' Elijah Mendoza',' June Strickland',' Beth Vasquez',' Betty Sutton',' Joel Gill',' Vernon Stone'],
        'Performance': [33,64,142,116,122,68,95,127,132,80]}
df = pd.DataFrame(data, columns=['Group', 'Name', 'Performance'])

Out[1]:
    Group  Name             Performance
0    A     Sheldon Webb       33
1    A     Traci Dean         64
2    A     Chad Webster      142
3    A     Ora Harmon        116
4    A     Elijah Mendoza    122
5    A     June Strickland    68
6    B     Beth Vasquez       95
7    B     Betty Sutton      127
8    B     Joel Gill         132
9    B     Vernon Stone       80

I want to sort it in an alternating way: within each group, say group "A", the first row should hold the highest-performing person (here "Chad Webster"), the second row the lowest-performing person ("Sheldon Webb"), and so on.

The output I am looking for would look something like this:

Out[2]:
    Group   Name             Performance
0    A     Chad Webster      142
1    A     Sheldon Webb       33
2    A     Elijah Mendoza    122
3    A     Traci Dean         64
4    A     Ora Harmon        116
5    A     June Strickland    68
6    B     Joel Gill         132
7    B     Vernon Stone       80
8    B     Betty Sutton      127
9    B     Beth Vasquez       95

The sequence alternates between the highest and the lowest remaining values within each group.
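To pin down the target order independently of pandas, here is a minimal pure-Python sketch for a single group's values. The function name `alternate_extremes` is hypothetical; it simply walks inward from both ends of the sorted values, taking from the top first.

```python
def alternate_extremes(values):
    """Reorder values as: max, min, 2nd max, 2nd min, ..."""
    ordered = sorted(values)          # ascending copy
    result = []
    lo, hi = 0, len(ordered) - 1
    take_high = True                  # start with the largest value
    while lo <= hi:
        if take_high:
            result.append(ordered[hi])
            hi -= 1
        else:
            result.append(ordered[lo])
            lo += 1
        take_high = not take_high
    return result

print(alternate_extremes([33, 64, 142, 116, 122, 68]))  # [142, 33, 122, 64, 116, 68]
print(alternate_extremes([95, 127, 132, 80]))           # [132, 80, 127, 95]
```

For an odd group size, the median ends up last.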

Take the sorted order and apply a quadratic function to it whose vertex sits at the middle position, (n + 1)/2 for positions 1 to n (plus a small offset, eps). This way the highest scores go to the extreme values, and the sign of eps determines whether the highest value is ranked above the lowest one (with a positive eps, it is). I added a small group C (not part of the question's data) at the end of the output to show that it properly handles repeated values and an odd group size.

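import numpy as np
import pandas as pd

def extremal_rank(s):
    eps = 10**-4
    # 1-based ascending position of each value, shifted so the parabola's
    # vertex sits at the middle position (n + 1) / 2, then squared
    y = (pd.Series(np.arange(1, len(s)+1), index=s.sort_values().index)
         - (len(s)+1)/2 + eps)**2
    return y.reindex_like(s)

# transform keeps the result aligned with df's original row index
df['rnk'] = df.groupby('Group')['Performance'].transform(extremal_rank)
df = df.sort_values(['Group', 'rnk'], ascending=[True, False])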

   Group              Name  Performance     rnk
2      A      Chad Webster          142  6.2505
0      A      Sheldon Webb           33  6.2495
4      A    Elijah Mendoza          122  2.2503
1      A        Traci Dean           64  2.2497
3      A        Ora Harmon          116  0.2501
5      A   June Strickland           68  0.2499
8      B         Joel Gill          132  2.2503
9      B      Vernon Stone           80  2.2497
7      B      Betty Sutton          127  0.2501
6      B      Beth Vasquez           95  0.2499
11     C                 b          110  9.0006
12     C                 c           68  8.9994
10     C                 a          110  4.0004
13     C                 d           68  3.9996
15     C                 f           70  1.0002
16     C                 g           70  0.9998
14     C                 e           70  0.0000
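The scoring can be checked on group A alone with plain NumPy. This is a re-statement of the quadratic idea above, not the answer's own code; the variable names are illustrative.

```python
import numpy as np

values = np.array([33, 64, 142, 116, 122, 68])   # group A from the question
n = len(values)
eps = 1e-4

# 1-based position of each value in ascending order
pos = np.empty(n, dtype=float)
pos[np.argsort(values, kind='stable')] = np.arange(1, n + 1)

# Parabola centred on the middle position (n + 1) / 2; the +eps offset makes the
# larger of two values that are equally far from the centre score slightly higher
score = (pos - (n + 1) / 2 + eps) ** 2

order = np.argsort(-score, kind='stable')        # highest score first
print(values[order].tolist())   # [142, 33, 122, 64, 116, 68]
print(np.round(score, 4))       # compare with the rnk column for group A
```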

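You can avoid groupby: sort on Performance once descending and once ascending (groups kept in order both times), give the ascending copy half-integer index labels, concatenate the two sorted DataFrames, then use sort_index to interleave them and drop_duplicates to get the expected output: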

df_ = (pd.concat([df.sort_values(['Group', 'Performance'], ascending=[True, False])
                    .reset_index(), #need the original index for later drop_duplicates
                  df.sort_values(['Group', 'Performance'], ascending=[True, True])
                    .reset_index()
                    .set_index(np.arange(len(df))+0.5)], # for later sort_index
                 axis=0)
         .sort_index()
         .drop_duplicates('index', keep='first')
         .reset_index(drop=True)
       [['Group', 'Name', 'Performance']] 
      )
print(df_)
  Group              Name  Performance
0     A      Chad Webster          142
1     A      Sheldon Webb           33
2     A    Elijah Mendoza          122
3     A        Traci Dean           64
4     A        Ora Harmon          116
5     A   June Strickland           68
6     B         Joel Gill          132
7     B      Vernon Stone           80
8     B      Betty Sutton          127
9     B      Beth Vasquez           95
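The interleave-then-deduplicate trick can be seen on a single group with plain Python. This sketch mirrors the idea above (list positions stand in for the row index), not the exact pandas calls. At key k the descending copy holds the k-th largest row and the ascending copy the k-th smallest, so once the two meet in the middle every later row is a duplicate.

```python
values = [33, 64, 142, 116, 122, 68]   # group A; list positions act as row labels

desc = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
asc = sorted(range(len(values)), key=lambda i: values[i])

# Keys 0, 1, 2, ... for the descending copy and 0.5, 1.5, ... for the ascending one
tagged = [(k, i) for k, i in enumerate(desc)] + [(k + 0.5, i) for k, i in enumerate(asc)]
tagged.sort()   # interleaves: desc[0], asc[0], desc[1], asc[1], ...

seen, order = set(), []
for _, i in tagged:
    if i not in seen:        # same role as drop_duplicates(keep='first')
        seen.add(i)
        order.append(i)

print([values[i] for i in order])   # [142, 33, 122, 64, 116, 68]
```

On the whole DataFrame this works across groups because both sorts place each group's rows in the same block of positions.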

For each group, concatenate the nlargest (top half) and nsmallest (bottom half) rows, then interleave them by sorting on their positional index:

(df.groupby('Group')[['Name', 'Performance']]
   .apply(lambda x:
          pd.concat([x.nlargest(x.shape[0]//2, 'Performance').reset_index(),
                     x.nsmallest(x.shape[0] - x.shape[0]//2, 'Performance').reset_index()])
            # stable sort keeps each largest row ahead of its smallest partner
            .sort_index(kind='mergesort')
            .drop(columns='index'))
   .reset_index()
   .drop(columns='level_1'))

  Group              Name  Performance
0     A      Chad Webster          142
1     A      Sheldon Webb           33
2     A    Elijah Mendoza          122
3     A        Traci Dean           64
4     A        Ora Harmon          116
5     A   June Strickland           68
6     B         Joel Gill          132
7     B      Vernon Stone           80
8     B      Betty Sutton          127
9     B      Beth Vasquez           95
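One caveat: nlargest and nsmallest are computed independently, so with tied values they can pick the same row twice (for three rows tied at 70, nlargest(1) and nsmallest(2) both return the first one) while another row is dropped. A minimal sketch that avoids this by splitting a single stable descending sort into halves is below; the helper name `alternate_group` is hypothetical, and the tied group C reuses the values shown for group C in the first answer's output.

```python
import pandas as pd

def alternate_group(g):
    # Sort once, descending; mergesort keeps tied rows in their original order
    d = g.sort_values('Performance', ascending=False, kind='mergesort')
    half = (len(d) + 1) // 2
    top = d.iloc[:half]                  # largest half, descending
    bottom = d.iloc[half:].iloc[::-1]    # smallest half, ascending
    # Both halves get positional labels 0, 1, 2, ...; a stable sort_index interleaves them
    parts = pd.concat([top.reset_index(drop=True), bottom.reset_index(drop=True)])
    return parts.sort_index(kind='mergesort')

df = pd.DataFrame({
    'Group': ['A'] * 6 + ['C'] * 7,
    'Name': ['Sheldon Webb', 'Traci Dean', 'Chad Webster', 'Ora Harmon',
             'Elijah Mendoza', 'June Strickland', 'a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'Performance': [33, 64, 142, 116, 122, 68, 110, 110, 68, 68, 70, 70, 70],
})
out = pd.concat([alternate_group(g) for _, g in df.groupby('Group')], ignore_index=True)
print(out['Performance'].tolist())
# [142, 33, 122, 64, 116, 68, 110, 68, 110, 68, 70, 70, 70]
```

Every row appears exactly once, even in the tied group.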

Another method, using a custom function built on np.empty:

def mysort(s):
    # s is one group, already sorted by Performance in descending order
    arr = s.to_numpy()
    c = np.empty(arr.shape, dtype=arr.dtype)
    idx = (arr.shape[0] + 1) // 2          # ceil(n / 2): the top half gets the extra row
    c[0::2], c[1::2] = arr[:idx], arr[idx:][::-1]   # largest descending, smallest ascending
    return pd.DataFrame(c, columns=s.columns)

print(df.sort_values("Performance", ascending=False).groupby("Group").apply(mysort))

        Group              Name Performance
Group                                      
A     0     A      Chad Webster         142
      1     A      Sheldon Webb          33
      2     A    Elijah Mendoza         122
      3     A        Traci Dean          64
      4     A        Ora Harmon         116
      5     A   June Strickland          68
B     0     B         Joel Gill         132
      1     B      Vernon Stone          80
      2     B      Betty Sutton         127
      3     B      Beth Vasquez          95
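The slice assignment is the whole trick, and it can be seen on a 1-D array. In mysort, idx is ceil(n/2), i.e. (n + 1) // 2, so even slots take the top half and odd slots take the reversed bottom half. The function name `interleave_desc` is hypothetical.

```python
import numpy as np

def interleave_desc(arr):
    """arr must already be sorted in descending order."""
    n = arr.shape[0]
    c = np.empty_like(arr)
    idx = (n + 1) // 2            # ceil(n / 2): even slots outnumber odd ones when n is odd
    c[0::2] = arr[:idx]           # even slots: largest values, descending
    c[1::2] = arr[idx:][::-1]     # odd slots: smallest values, ascending
    return c

print(interleave_desc(np.array([142, 122, 116, 68, 64, 33])).tolist())  # [142, 33, 122, 64, 116, 68]
print(interleave_desc(np.array([5, 4, 3, 2, 1])).tolist())              # [5, 1, 4, 2, 3]
```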

Benchmark: (chart not available)

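Let's try detecting the min and max rows with groupby().transform(), then sort. Note that this moves only each group's minimum and maximum to the top; the remaining rows follow in descending order, so groups with more than four rows (like group A below) do not fully alternate: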

groups = df.groupby('Group')['Performance']
mins, maxs = groups.transform('min'), groups.transform('max')

(df.assign(temp=df['Performance'].eq(mins) | df['Performance'].eq(maxs))
   .sort_values(['Group','temp','Performance'],
                ascending=[True, False, False])
   .drop('temp', axis=1)
)

Output:

  Group              Name  Performance
2     A      Chad Webster          142
0     A      Sheldon Webb           33
4     A    Elijah Mendoza          122
3     A        Ora Harmon          116
5     A   June Strickland           68
1     A        Traci Dean           64
8     B         Joel Gill          132
9     B      Vernon Stone           80
7     B      Betty Sutton          127
6     B      Beth Vasquez           95
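To keep the transform-then-sort idea but get the full alternation, one option (a swapped-in technique, not the answer above) is to derive a sort key from per-group ranks: the k-th largest row belongs in slot 2(k-1) and the k-th smallest in slot 2(k-1)+1, and each row takes whichever of its two slots comes first. A sketch, with illustrative names `d`, `a`, and `key`:

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A'] * 6 + ['B'] * 4,
    'Name': ['Sheldon Webb', 'Traci Dean', 'Chad Webster', 'Ora Harmon', 'Elijah Mendoza',
             'June Strickland', 'Beth Vasquez', 'Betty Sutton', 'Joel Gill', 'Vernon Stone'],
    'Performance': [33, 64, 142, 116, 122, 68, 95, 127, 132, 80],
})

g = df.groupby('Group')['Performance']
d = g.rank(method='first', ascending=False)   # 1 = largest in its group, ties by position
n = g.transform('count')                     # group size (no missing values here)
a = n + 1 - d                                # 1 = smallest, consistent with d on ties

# Earlier of the two candidate slots: 2(d-1) from the top, 2(a-1)+1 from the bottom
key = pd.concat([2 * (d - 1), 2 * (a - 1) + 1], axis=1).min(axis=1)

out = (df.assign(key=key)
         .sort_values(['Group', 'key'])
         .drop(columns='key'))
print(out['Performance'].tolist())
# [142, 33, 122, 64, 116, 68, 132, 80, 127, 95]
```

Within a group the keys are 0 to n-1 with no repeats, so the final sort is fully determined.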

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM