Rank by group after sorting in pandas

I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'B': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
                   'X': [1, 2, 1, 2, 2, 3, 4, 5],
                   'Y': [2, 1, 2, 2, 7, 5, 7, 7],
                   'Z': [2, 1, 2, 1, 5, 8, 1, 9]})
print(df)
   A   B  X  Y  Z
0  A  C1  1  2  2
1  B  C1  2  1  1
2  C  C1  1  2  2
3  D  C1  2  2  1
4  E  C2  2  7  5
5  F  C2  3  5  8
6  G  C2  4  7  1
7  H  C2  5  7  9

I need to sort the dataframe by columns B, Z, Y, X and then rank the rows within each group of B.

The resulting dataframe should look like this:

   A   B  X  Y  Z   R
1  B  C1  2  1  1   1
3  D  C1  2  2  1   2
0  A  C1  1  2  2   3
2  C  C1  1  2  2   4
6  G  C2  4  7  1   1
4  E  C2  2  7  5   2
5  F  C2  3  5  8   3
7  H  C2  5  7  9   4

I know I can use df.sort_values(['B', 'Z', 'Y', 'X']) to get the rows into the right order, but I am struggling to apply the rank.

Is there a one-liner that does both the sorting and the ranking?

You can use groupby().cumcount():

df['R'] = df.sort_values(['B','Z','Y','X']).groupby('B').cumcount() + 1

Output:

   A   B  X  Y  Z  R
0  A  C1  1  2  2  3
1  B  C1  2  1  1  1
2  C  C1  1  2  2  4
3  D  C1  2  2  1  2
4  E  C2  2  7  5  2
5  F  C2  3  5  8  3
6  G  C2  4  7  1  1
7  H  C2  5  7  9  4
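
The rows above stay in their original order because the assignment aligns on index labels, not on position. A minimal sketch of that step, assuming the sample data from the question (the intermediate name `ranks` is introduced here only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'B': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
    'X': [1, 2, 1, 2, 2, 3, 4, 5],
    'Y': [2, 1, 2, 2, 7, 5, 7, 7],
    'Z': [2, 1, 2, 1, 5, 8, 1, 9],
})

# cumcount() numbers the rows 0, 1, 2, ... within each group of B, in the
# order they appear in the sorted frame. The resulting Series keeps the
# original index labels of those rows.
ranks = df.sort_values(['B', 'Z', 'Y', 'X']).groupby('B').cumcount() + 1

# Column assignment matches on index labels, so each rank lands on its own
# row and df itself is not reordered.
df['R'] = ranks
print(df)
```

Rows A and C have identical X, Y and Z values, so which of the two gets rank 3 and which gets rank 4 depends only on the order in which the sort leaves them.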

To match your output, where the rows are also in sorted order, sort first and then apply groupby():

df = df.sort_values(['B','Z','Y','X'])
df['R'] = df.groupby('B').cumcount() + 1

Output:

   A   B  X  Y  Z  R
1  B  C1  2  1  1  1
3  D  C1  2  2  1  2
0  A  C1  1  2  2  3
2  C  C1  1  2  2  4
6  G  C2  4  7  1  1
4  E  C2  2  7  5  2
5  F  C2  3  5  8  3
7  H  C2  5  7  9  4
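
If a single expression is preferred, the two steps can probably be chained with `assign`. This is a sketch under the same sample data, not part of the original answer; the name `out` is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'B': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
    'X': [1, 2, 1, 2, 2, 3, 4, 5],
    'Y': [2, 1, 2, 2, 7, 5, 7, 7],
    'Z': [2, 1, 2, 1, 5, 8, 1, 9],
})

# Sort first, then let assign() compute the per-group counter on the
# already-sorted frame (the lambda receives the sorted DataFrame).
out = (
    df.sort_values(['B', 'Z', 'Y', 'X'])
      .assign(R=lambda d: d.groupby('B').cumcount() + 1)
)
print(out)
```

This returns a new sorted frame with the R column and leaves `df` unchanged, which may be convenient when the original row order is still needed elsewhere.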
