dataframe sort_values options

Question

I'm trying to arrange a DataFrame:

genres = ['classic', 'pop', 'classic', 'classic', 'pop', 'pop', 'rock', 'rap' , 'k' , 'k']
plays = [500, 600, 150, 800, 2500, 700, 300, 10000, 300, 400]


import pandas as pd
df = pd.DataFrame({'genres' : genres,
                   'plays' : plays,
                   'num' : list(range(0,len(genres)))})

df
    genres  plays  num
0  classic    500    0
1      pop    600    1
2  classic    150    2
3  classic    800    3
4      pop   2500    4
5      pop    700    5
6     rock    300    6
7      rap  10000    7
8        k    300    8
9        k    400    9

dfg = df.groupby('genres', as_index = False).sum().sort_values(by = 'plays' , ascending = False)

dfg
    genres  plays  num
3      rap  10000    7
2      pop   3800   10
0  classic   1450    5
1        k    700   17
4     rock    300    6

dfg1 = df.sort_values(by=['genres','plays'], 
                      ascending = [False, False]).groupby('genres').head(2)

dfg1

    genres  plays  num
6     rock    300    6
7      rap  10000    7
4      pop   2500    4
5      pop    700    5
9        k    400    9
8        k    300    8
3  classic    800    3
0  classic    500    0

What I want for dfg1 is the genres ordered by each genre's total plays, as shown in dfg, and within each genre the two largest plays values.

However, the table above is ordered strangely. My guess is that genres with only one plays value are placed ahead of genres with two or more, because 'rock' and 'rap' always appear at the top of the table, followed by the genres with two or more plays values.

    genres  plays  num
7      rap  10000    7
4      pop   2500    4
5      pop    700    5
3  classic    800    3
0  classic    500    0
9        k    400    9
8        k    300    8
6     rock    300    6

The table above is what I want: genres ordered by the sum of each group, with the two largest plays values in each group.

Can anyone help, please?

Answer 1

Take the two largest plays values per genre, then merge them into dfg with DataFrame.merge so the result keeps dfg's order (genres sorted by total plays):

In [773]: x = df.groupby('genres')['plays'].nlargest(2).reset_index()
In [779]: dfg1 = dfg.merge(x, on='genres')[['genres', 'plays_y', 'level_1']].rename(columns={'level_1':'num', 'plays_y': 'plays'})

In [780]: dfg1
Out[780]: 
    genres  plays  num
0      rap  10000    7
1      pop   2500    4
2      pop    700    5
3  classic    800    3
4  classic    500    0
5        k    400    9
6        k    300    8
7     rock    300    6
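For comparison, here is a minimal alternative sketch that avoids the merge. It is not the answerer's method: it attaches each genre's total to every row with groupby(...).transform('sum'), sorts on that total, and then takes the first two rows per genre. The column name total and the variables ranked and result are made up for illustration. It also shows why the attempt in the question came out in an unexpected order: sort_values was given 'genres' as the first key with ascending=False, so the rows were in reverse alphabetical order by genre name (rock, rap, pop, k, classic), not grouped by how many rows each genre has.

```python
import pandas as pd

genres = ['classic', 'pop', 'classic', 'classic', 'pop', 'pop', 'rock', 'rap', 'k', 'k']
plays = [500, 600, 150, 800, 2500, 700, 300, 10000, 300, 400]

df = pd.DataFrame({'genres': genres,
                   'plays': plays,
                   'num': list(range(len(genres)))})

# Temporary sort key: every row carries the total plays of its genre.
# 'total' is a hypothetical helper column, dropped again at the end.
ranked = df.assign(total=df.groupby('genres')['plays'].transform('sum'))

result = (
    ranked
    # Genres with the largest total first; 'genres' keeps tied genres together;
    # within a genre, the largest plays first.
    .sort_values(['total', 'genres', 'plays'], ascending=[False, True, False])
    # head(2) keeps the first two rows of each genre in the current row order.
    .groupby('genres', sort=False)
    .head(2)
    .drop(columns='total')
)

print(result)
```

Unlike the merge version, this keeps the original num column and the original index instead of rebuilding num from level_1, which only matches num here because num happens to equal the row index in this example.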


solution1
1 ACCPTED 2021-02-09 06:32:40
