How to sort groups so that the first row holds the largest number, the second row the smallest, the third row the second largest, and so on

Question

So I have a df like this:

In [1]: import pandas as pd
   ...: data = {'Group': ['A','A','A','A','A','A','B','B','B','B'],
   ...:     'Name': [ ' Sheldon Webb',' Traci Dean',' Chad Webster',' Ora Harmon',' Elijah Mendoza',' June Strickland',' Beth Vasquez',' Betty Sutton',' Joel Gill',' Vernon Stone'],
   ...:     'Performance':[33,64,142,116,122,68,95,127,132,80]}
In [2]: df = pd.DataFrame(data, columns = ['Group', 'Name','Performance'])

Out[1]:
    Group  Name             Performance
0    A     Sheldon Webb       33
1    A     Traci Dean         64
2    A     Chad Webster      142
3    A     Ora Harmon        116
4    A     Elijah Mendoza    122
5    A     June Strickland    68
6    B     Beth Vasquez       95
7    B     Betty Sutton      127
8    B     Joel Gill         132
9    B     Vernon Stone       80

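I want to sort it in an alternating way within each group. In group "A", for example, the first row should hold the best performer (here "Chad Webster"), the second row the worst performer (here "Sheldon Webb"), the third row the second-best performer, and so on.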

The output I am looking for looks like this:

Out[2]:
    Group   Name             Performance
0    A     Chad Webster      142
1    A     Sheldon Webb       33
2    A     Elijah Mendoza    122
3    A     Traci Dean         64
4    A     Ora Harmon        116
5    A     June Strickland    68
6    B     Joel Gill         132
7    B     Vernon Stone       80
8    B     Betty Sutton      127
9    B     Beth Vasquez       95

You can see that the sequence alternates between the highest and lowest remaining values within each group.
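
As a reference for what "alternating" means here, the following is a minimal plain-Python sketch of the requested order for a single group. The helper name `alternate_high_low` is hypothetical and is not part of the question or any answer; it only spells out the expected ordering so the answers can be checked against it.

```python
def alternate_high_low(values):
    """Return values ordered as max, min, 2nd max, 2nd min, ..."""
    ordered = sorted(values, reverse=True)
    result = []
    lo, hi = 0, len(ordered) - 1
    while lo <= hi:
        result.append(ordered[lo])        # next largest remaining value
        if lo != hi:
            result.append(ordered[hi])    # next smallest remaining value
        lo += 1
        hi -= 1
    return result

group_a = [33, 64, 142, 116, 122, 68]     # Performance values of group A
group_b = [95, 127, 132, 80]              # Performance values of group B
print(alternate_high_low(group_a))        # [142, 33, 122, 64, 116, 68]
print(alternate_high_low(group_b))        # [132, 80, 127, 95]
```

For an odd-sized group the middle value simply ends up last.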

Answer 1

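Take the sorted order and apply a quadratic function to it whose root sits at half the length of the array (plus a small offset). That way the extreme values receive the highest rank, and the sign of the eps offset decides whether the highest value is ranked above the lowest or the other way round. I added a small extra group at the end to show that it also handles repeated values and odd group sizes correctly.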

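import numpy as np
import pandas as pd

def extremal_rank(s):
    eps = 10**-4
    # Position of each value in ascending order (1..n), centred on the middle
    # of the group and squared, so both extremes get the largest score.
    y = (pd.Series(np.arange(1, len(s)+1), index=s.sort_values().index)
         - (len(s)+1)/2 + eps)**2
    return y.reindex_like(s)

# transform keeps the original row index, so the result aligns with df
df['rnk'] = df.groupby('Group')['Performance'].transform(extremal_rank)
df = df.sort_values(['Group', 'rnk'], ascending=[True, False])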

   Group              Name  Performance     rnk
2      A      Chad Webster          142  6.2505
0      A      Sheldon Webb           33  6.2495
4      A    Elijah Mendoza          122  2.2503
1      A        Traci Dean           64  2.2497
3      A        Ora Harmon          116  0.2501
5      A   June Strickland           68  0.2499
8      B         Joel Gill          132  2.2503
9      B      Vernon Stone           80  2.2497
7      B      Betty Sutton          127  0.2501
6      B      Beth Vasquez           95  0.2499
11     C                 b          110  9.0006
12     C                 c           68  8.9994
10     C                 a          110  4.0004
13     C                 d           68  3.9996
15     C                 f           70  1.0002
16     C                 g           70  0.9998
14     C                 e           70  0.0000
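
To make the quadratic trick concrete, here is a small NumPy-only sketch that works through group B by hand. The variable names are illustrative; the formula is the same as in `extremal_rank`, applied to values that are already sorted ascending.

```python
import numpy as np

eps = 1e-4
ascending = np.array([80, 95, 127, 132])        # group B, sorted ascending
n = len(ascending)

positions = np.arange(1, n + 1)                 # 1, 2, 3, 4
# Distance from the middle position, shifted by eps and squared.
# The +eps makes the upper half score slightly higher than its mirror image.
scores = (positions - (n + 1) / 2 + eps) ** 2

order = np.argsort(-scores)                     # largest score first
alternating = ascending[order].tolist()
print(np.round(scores, 4))                      # [2.2497 0.2499 0.2501 2.2503]
print(alternating)                              # [132, 80, 127, 95]
```

Flipping the sign of `eps` would rank the lowest value just above the highest, which starts each group with the minimum instead.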

Answer 2

You can avoid groupby if you use sort_values on Performance twice, once ascending and once descending: concat the two sorted dataframes, then use sort_index and drop_duplicates to get the expected output:

df_ = (pd.concat([df.sort_values(['Group', 'Performance'], ascending=[True, False])
                    .reset_index(), #need the original index for later drop_duplicates
                  df.sort_values(['Group', 'Performance'], ascending=[True, True])
                    .reset_index()
                    .set_index(np.arange(len(df))+0.5)], # for later sort_index
                 axis=0)
         .sort_index()
         .drop_duplicates('index', keep='first')
         .reset_index(drop=True)
       [['Group', 'Name', 'Performance']] 
      )
print(df_)
  Group              Name  Performance
0     A      Chad Webster          142
1     A      Sheldon Webb           33
2     A    Elijah Mendoza          122
3     A        Traci Dean           64
4     A        Ora Harmon          116
5     A   June Strickland           68
6     B         Joel Gill          132
7     B      Vernon Stone           80
8     B      Betty Sutton          127
9     B      Beth Vasquez           95
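
The interleaving step can be easier to follow on a single small group. The sketch below is an illustration only, using a hypothetical Series `s` with letter labels: the ascending copy gets a half-step index (0.5, 1.5, ...), so `sort_index` weaves it between the rows of the descending copy, and `drop_duplicates` then keeps each original row the first time it appears.

```python
import numpy as np
import pandas as pd

s = pd.Series([95, 127, 132, 80], index=list("abcd"), name="Performance")

# Descending copy sits at positions 0, 1, 2, ...; the old labels move
# into a column named 'index'.
desc = s.sort_values(ascending=False).reset_index()

# Ascending copy is shifted to positions 0.5, 1.5, 2.5, ...
asc = s.sort_values(ascending=True).reset_index()
asc.index = np.arange(len(asc)) + 0.5

interleaved = pd.concat([desc, asc]).sort_index()
result = interleaved.drop_duplicates("index", keep="first")

labels = result["index"].tolist()
values = result["Performance"].tolist()
print(labels)   # ['c', 'd', 'b', 'a']
print(values)   # [132, 80, 127, 95]
```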

Answer 3

For each group, concatenate the results of nlargest and nsmallest and sort the combined frame by index:

>>> (df.groupby('Group')[df.columns[1:]]
      .apply(lambda x:
      pd.concat([x.nlargest(x.shape[0]//2,'Performance').reset_index(),
                 x.nsmallest(x.shape[0]-x.shape[0]//2,'Performance').reset_index()]
            )
            .sort_index(kind='mergesort')  # stable sort keeps largest before smallest
            .drop(columns='index'))
      .reset_index().drop(columns='level_1'))

  Group              Name  Performance
0     A      Chad Webster          142
1     A      Sheldon Webb           33
2     A    Elijah Mendoza          122
3     A        Traci Dean           64
4     A        Ora Harmon          116
5     A   June Strickland           68
6     B         Joel Gill          132
7     B      Vernon Stone           80
8     B      Betty Sutton          127
9     B      Beth Vasquez           95
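
The sample data only has even-sized groups, so here is a hedged sketch of the same idea on one hypothetical five-row group. It uses `reset_index(drop=True)` instead of the answer's `reset_index()` followed by a drop, and a stable sort so that, for equal index values, the row from `nlargest` stays ahead of the row from `nsmallest`.

```python
import pandas as pd

# Hypothetical odd-sized group, not part of the original data.
x = pd.DataFrame({"Name": list("abcde"),
                  "Performance": [10, 50, 30, 20, 40]})

half = len(x) // 2
top = x.nlargest(half, "Performance").reset_index(drop=True)                # 50, 40
bottom = x.nsmallest(len(x) - half, "Performance").reset_index(drop=True)   # 10, 20, 30

# Both pieces are numbered from 0, so a stable sort on the index interleaves them.
merged = pd.concat([top, bottom]).sort_index(kind="mergesort")
order = merged["Performance"].tolist()
print(order)   # [50, 10, 40, 20, 30]
```

The extra row of an odd-sized group goes to the `nsmallest` half, so the middle value lands at the end.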

Answer 4

Another approach, using a custom function and np.empty:

def mysort(s):
    arr = s.to_numpy()
    c = np.empty(arr.shape, dtype=arr.dtype)
    idx = arr.shape[0]//2 if not arr.shape[0]%2 else arr.shape[0]//2+1
    c[0::2], c[1::2] = arr[:idx], arr[idx:][::-1]
    return pd.DataFrame(c, columns=s.columns)

print (df.sort_values("Performance", ascending=False).groupby("Group").apply(mysort))

        Group              Name Performance
Group                                      
A     0     A      Chad Webster         142
      1     A      Sheldon Webb          33
      2     A    Elijah Mendoza         122
      3     A        Traci Dean          64
      4     A        Ora Harmon         116
      5     A   June Strickland          68
B     0     B         Joel Gill         132
      1     B      Vernon Stone          80
      2     B      Betty Sutton         127
      3     B      Beth Vasquez          95
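
The core of `mysort` is the pair of strided assignments. The following 1-D sketch, on a hypothetical array that is already sorted descending, shows how the even slots receive the top half and the odd slots receive the bottom half in reverse; `(n + 1) // 2` is an equivalent way of writing the answer's `idx` expression.

```python
import numpy as np

desc = np.array([50, 40, 30, 20, 10])   # already sorted descending
out = np.empty_like(desc)

split = (len(desc) + 1) // 2            # size of the top half, rounded up
out[0::2] = desc[:split]                # slots 0, 2, 4 <- 50, 40, 30
out[1::2] = desc[split:][::-1]          # slots 1, 3    <- 10, 20

alternating = out.tolist()
print(alternating)                      # [50, 10, 40, 20, 30]
```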

Answer 5

Let's try using groupby().transform() to detect the min and max rows, then sort:

groups = df.groupby('Group')['Performance']
mins, maxs = groups.transform('min'), groups.transform('max')

(df.assign(temp=df['Performance'].eq(mins) | df['Performance'].eq(maxs))
   .sort_values(['Group','temp','Performance'],
                ascending=[True, False, False])
   .drop('temp', axis=1)
)

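Output (only each group's maximum and minimum are moved to the top; the remaining rows stay in descending order, so group A does not fully alternate):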

  Group              Name  Performance
2     A      Chad Webster          142
0     A      Sheldon Webb           33
4     A    Elijah Mendoza          122
3     A        Ora Harmon          116
5     A   June Strickland           68
1     A        Traci Dean           64
8     B         Joel Gill          132
9     B      Vernon Stone           80
7     B      Betty Sutton          127
6     B      Beth Vasquez           95
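
For readers unfamiliar with `transform`, this short sketch on a hypothetical five-row frame shows what the `mins`, `maxs` and `temp` values in this answer contain: `transform` broadcasts each group's statistic back to every row of that group, so it can be compared element-wise with the original column.

```python
import pandas as pd

# Hypothetical small frame, used only to show the shape of the intermediate values.
df_demo = pd.DataFrame({"Group": ["A", "A", "A", "B", "B"],
                        "Performance": [33, 142, 64, 95, 132]})

groups = df_demo.groupby("Group")["Performance"]
mins = groups.transform("min")     # group minimum repeated on every row
maxs = groups.transform("max")     # group maximum repeated on every row

# True for rows that hold their group's minimum or maximum
is_extreme = df_demo["Performance"].eq(mins) | df_demo["Performance"].eq(maxs)

print(mins.tolist())         # [33, 33, 33, 95, 95]
print(maxs.tolist())         # [142, 142, 142, 132, 132]
print(is_extreme.tolist())   # [True, True, False, True, True]
```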

5 answers

Answer 1: 4, accepted, 2020-07-31 15:36:32
Answer 2: 4, 2020-07-31 15:39:36
Answer 3: 3, 2020-07-31 15:07:39
Answer 4: 2, 2020-07-31 17:17:51
Answer 5: 1, 2020-07-31 14:55:18
