
How to sort a group so that the largest number is in the first row, the smallest in the second, the second largest in the third, and so on

So I have a df like this:

In [1]: import numpy as np
   ...: import pandas as pd
   ...: data = {'Group': ['A','A','A','A','A','A','B','B','B','B'],
   ...:     'Name': [ ' Sheldon Webb',' Traci Dean',' Chad Webster',' Ora Harmon',' Elijah Mendoza',' June Strickland',' Beth Vasquez',' Betty Sutton',' Joel Gill',' Vernon Stone'],
   ...:     'Performance':[33,64,142,116,122,68,95,127,132,80]}
In [2]: df = pd.DataFrame(data, columns=['Group', 'Name', 'Performance'])

Out[1]:
    Group  Name             Performance
0    A     Sheldon Webb       33
1    A     Traci Dean         64
2    A     Chad Webster      142
3    A     Ora Harmon        116
4    A     Elijah Mendoza    122
5    A     June Strickland    68
6    B     Beth Vasquez       95
7    B     Betty Sutton      127
8    B     Joel Gill         132
9    B     Vernon Stone       80

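I want to sort it in an alternating way within each group. In group "A", for example, the first row should hold the highest performer (here "Chad Webster"), and the second row the lowest performer (here "Sheldon Webb").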

The output I am looking for looks like this:

Out[2]:
    Group   Name             Performance
0    A     Chad Webster      142
1    A     Sheldon Webb       33
2    A     Elijah Mendoza    122
3    A     Traci Dean         64
4    A     Ora Harmon        116
5    A     June Strickland    68
6    B     Joel Gill         132
7    B     Vernon Stone       80
8    B     Betty Sutton      127
9    B     Beth Vasquez       95

You can see that the sequence alternates between the highest and lowest remaining values within each group.

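Take the sort order and apply a quadratic function to it whose root sits at the middle of the rank range, (len(s)+1)/2, shifted by a small offset. The extreme values then receive the highest ranks, and the sign of the eps offset decides whether the highest value is ranked above the lowest or the other way round. I added a small extra group at the end to show that it handles repeated values and odd group sizes correctly.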

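def extremal_rank(s):
    # small offset that breaks the tie between the two ends of the ranking
    eps = 10**-4
    # positions 1..n in ascending order of value, centred on (n+1)/2, then squared
    y = (pd.Series(np.arange(1, len(s)+1), index=s.sort_values().index) 
         - (len(s)+1)/2 + eps)**2
    return y.reindex_like(s)
    
# group_keys=False keeps the original row index so the result aligns with df
df['rnk'] = df.groupby('Group', group_keys=False)['Performance'].apply(extremal_rank)
df = df.sort_values(['Group', 'rnk'], ascending=[True, False])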

   Group              Name  Performance     rnk
2      A      Chad Webster          142  6.2505
0      A      Sheldon Webb           33  6.2495
4      A    Elijah Mendoza          122  2.2503
1      A        Traci Dean           64  2.2497
3      A        Ora Harmon          116  0.2501
5      A   June Strickland           68  0.2499
8      B         Joel Gill          132  2.2503
9      B      Vernon Stone           80  2.2497
7      B      Betty Sutton          127  0.2501
6      B      Beth Vasquez           95  0.2499
11     C                 b          110  9.0006
12     C                 c           68  8.9994
10     C                 a          110  4.0004
13     C                 d           68  3.9996
15     C                 f           70  1.0002
16     C                 g           70  0.9998
14     C                 e           70  0.0000
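
To see why the quadratic key alternates, here is a minimal sketch on the group "A" values alone. The helper name alternating_key and its eps parameter are made up for this illustration; it follows the same idea as extremal_rank above and is not meant as a definitive implementation.

```python
import numpy as np
import pandas as pd

def alternating_key(s, eps=1e-4):
    # hypothetical helper: ascending positions 1..n, centred on (n+1)/2, then squared
    pos = pd.Series(np.arange(1, len(s) + 1), index=s.sort_values().index)
    return ((pos - (len(s) + 1) / 2 + eps) ** 2).reindex(s.index)

s = pd.Series([33, 64, 142, 116, 122, 68])

# positive eps: the largest value gets the biggest key
high_first = s.loc[alternating_key(s).sort_values(ascending=False).index].tolist()
# negative eps: the smallest value gets the biggest key
low_first = s.loc[alternating_key(s, eps=-1e-4).sort_values(ascending=False).index].tolist()

print(high_first)  # [142, 33, 122, 64, 116, 68]
print(low_first)   # [33, 142, 64, 122, 68, 116]
```

Both ends of the ranking are equally far from the centre, so without eps their keys would tie; the offset is what makes the order between each high/low pair deterministic.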

To avoid groupby, you can use sort_values on Performance twice, once descending and once ascending, concat the two sorted dataframes, and then use sort_index and drop_duplicates to get the expected output:

df_ = (pd.concat([df.sort_values(['Group', 'Performance'], ascending=[True, False])
                    .reset_index(), #need the original index for later drop_duplicates
                  df.sort_values(['Group', 'Performance'], ascending=[True, True])
                    .reset_index()
                    .set_index(np.arange(len(df))+0.5)], # for later sort_index
                 axis=0)
         .sort_index()
         .drop_duplicates('index', keep='first')
         .reset_index(drop=True)
       [['Group', 'Name', 'Performance']] 
      )
print(df_)
  Group              Name  Performance
0     A      Chad Webster          142
1     A      Sheldon Webb           33
2     A    Elijah Mendoza          122
3     A        Traci Dean           64
4     A        Ora Harmon          116
5     A   June Strickland           68
6     B         Joel Gill          132
7     B      Vernon Stone           80
8     B      Betty Sutton          127
9     B      Beth Vasquez           95
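
The interleaving above can also be written as an explicit sort key, which may make the mechanism easier to follow. This is only a sketch under the assumption that the sample data is used as given: the i-th largest row of a group (counting from 0) gets key 2*i, the j-th smallest gets key 2*j+1, and each row keeps the smaller of its two keys. The variable names desc, asc, key and out are chosen for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Name': [' Sheldon Webb', ' Traci Dean', ' Chad Webster', ' Ora Harmon',
             ' Elijah Mendoza', ' June Strickland', ' Beth Vasquez',
             ' Betty Sutton', ' Joel Gill', ' Vernon Stone'],
    'Performance': [33, 64, 142, 116, 122, 68, 95, 127, 132, 80],
})

g = df.groupby('Group')['Performance']
# 0-based position of each row within its group, counted from the top and from the bottom
desc = g.rank(method='first', ascending=False).astype(int) - 1
asc = g.rank(method='first', ascending=True).astype(int) - 1

# even keys belong to the "largest" sequence, odd keys to the "smallest" one
key = np.minimum(2 * desc, 2 * asc + 1)

out = (df.assign(key=key)
         .sort_values(['Group', 'key'])
         .drop(columns='key')
         .reset_index(drop=True))
print(out)
```

Taking the minimum plays the same role as drop_duplicates(keep='first') in the concat version: each row keeps the earlier of its two possible positions.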

Apply a sorted concatenation of nlargest and nsmallest to each group:

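>>> (df.groupby('Group')[df.columns[1:]]
      .apply(lambda x:
      pd.concat([x.nlargest(x.shape[0]//2,'Performance').reset_index(),
                 x.nsmallest(x.shape[0]-x.shape[0]//2,'Performance').reset_index()]
            )
            .sort_index(kind='stable')   # stable sort keeps "largest" before "smallest" for equal labels
            .drop(columns='index'))      # positional axis argument is not accepted by pandas 2.x
      .reset_index().drop(columns='level_1'))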

  Group              Name  Performance
0     A      Chad Webster          142
1     A      Sheldon Webb           33
2     A    Elijah Mendoza          122
3     A        Traci Dean           64
4     A        Ora Harmon          116
5     A   June Strickland           68
6     B         Joel Gill          132
7     B      Vernon Stone           80
8     B      Betty Sutton          127
9     B      Beth Vasquez           95

Another approach, using a custom function and np.empty:

def mysort(s):
    arr = s.to_numpy()
    c = np.empty(arr.shape, dtype=arr.dtype)
    idx = arr.shape[0]//2 if not arr.shape[0]%2 else arr.shape[0]//2+1
    c[0::2], c[1::2] = arr[:idx], arr[idx:][::-1]
    return pd.DataFrame(c, columns=s.columns)

print (df.sort_values("Performance", ascending=False).groupby("Group").apply(mysort))

        Group              Name Performance
Group                                      
A     0     A      Chad Webster         142
      1     A      Sheldon Webb          33
      2     A    Elijah Mendoza         122
      3     A        Traci Dean          64
      4     A        Ora Harmon         116
      5     A   June Strickland          68
B     0     B         Joel Gill         132
      1     B      Vernon Stone          80
      2     B      Betty Sutton         127
      3     B      Beth Vasquez          95
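
The slicing trick inside mysort may be easier to follow on a plain one-dimensional array. The sketch below uses a hypothetical helper, interleave_desc, and assumes its input is already sorted in descending order, as the DataFrame is before the groupby above.

```python
import numpy as np

def interleave_desc(arr):
    # hypothetical helper; arr must already be sorted in descending order
    arr = np.asarray(arr)
    out = np.empty_like(arr)
    half = (len(arr) + 1) // 2      # the top half gets the extra element when n is odd
    out[0::2] = arr[:half]          # even positions: largest values, descending
    out[1::2] = arr[half:][::-1]    # odd positions: smallest values, ascending
    return out

print(interleave_desc([142, 122, 116, 68, 64, 33]).tolist())  # [142, 33, 122, 64, 116, 68]
print(interleave_desc([5, 4, 3, 2, 1]).tolist())              # [5, 1, 4, 2, 3]
```

Here (len(arr) + 1) // 2 is the same split point as the idx expression in mysort, written without the conditional.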

Benchmark: [image not available]

Let's try detecting the min and max rows with groupby().transform(), then sort:

groups = df.groupby('Group')['Performance']
mins, maxs = groups.transform('min'), groups.transform('max')

(df.assign(temp=df['Performance'].eq(mins) | df['Performance'].eq(maxs))
   .sort_values(['Group','temp','Performance'],
                ascending=[True, False, False])
   .drop('temp', axis=1)
)

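Output (only each group's maximum and minimum are moved to the top; the remaining rows are in plain descending order, so group A differs from the requested result):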

  Group              Name  Performance
2     A      Chad Webster          142
0     A      Sheldon Webb           33
4     A    Elijah Mendoza          122
3     A        Ora Harmon          116
5     A   June Strickland           68
1     A        Traci Dean           64
8     B         Joel Gill          132
9     B      Vernon Stone           80
7     B      Betty Sutton          127
6     B      Beth Vasquez           95
