
dataframe sort_values options

I am trying to arrange the following DataFrame:

genres = ['classic', 'pop', 'classic', 'classic', 'pop', 'pop', 'rock', 'rap' , 'k' , 'k']
plays = [500, 600, 150, 800, 2500, 700, 300, 10000, 300, 400]


import pandas as pd
df = pd.DataFrame({'genres' : genres,
                   'plays' : plays,
                   'num' : list(range(0,len(genres)))})

df
    genres  plays  num
0  classic    500    0
1      pop    600    1
2  classic    150    2
3  classic    800    3
4      pop   2500    4
5      pop    700    5
6     rock    300    6
7      rap  10000    7
8        k    300    8
9        k    400    9
dfg = df.groupby('genres', as_index = False).sum().sort_values(by = 'plays' , ascending = False)

dfg
    genres  plays  num
3      rap  10000    7
2      pop   3800   10
0  classic   1450    5
1        k    700   17
4     rock    300    6
dfg1 = df.sort_values(by=['genres','plays'], 
                      ascending = [False, False]).groupby('genres').head(2)

dfg1

    genres  plays  num
6     rock    300    6
7      rap  10000    7
4      pop   2500    4
5      pop    700    5
9        k    400    9
8        k    300    8
3  classic    800    3
0  classic    500    0

What I want for dfg1 is to order the genres by each genre's total plays, as shown in dfg, and within each genre to keep only the 2 largest plays values.

However, the table above is ordered strangely. My guess is that genres with only 1 plays value are placed before genres with 2 or more, because 'rock' and 'rap' are always at the top of the table, followed by the genres with 2 or more plays values. This is the result I am after:

    genres  plays  num
7      rap  10000    7
4      pop   2500    4
5      pop    700    5
3  classic    800    3
0  classic    500    0
9        k    400    9
8        k    300    8
6     rock    300    6

The table above is what I want: the genres are ordered by the sum of each group, and each group keeps its 2 largest plays values.

Can anyone help, please?

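Take the 2 largest plays values per genre with nlargest, then use merge to join them onto dfg, which is already sorted by total plays, so the rows follow the order you want: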

In [773]: x = df.groupby('genres')['plays'].nlargest(2).reset_index()
In [779]: dfg1 = dfg.merge(x, on='genres')[['genres', 'plays_y', 'level_1']].rename(columns={'level_1':'num', 'plays_y': 'plays'})

In [780]: dfg1
Out[780]: 
    genres  plays  num
0      rap  10000    7
1      pop   2500    4
2      pop    700    5
3  classic    800    3
4  classic    500    0
5        k    400    9
6        k    300    8
7     rock    300    6
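
For context, the original attempt sorts by the genres column itself, so the genres come out in reverse alphabetical order (rock, rap, pop, k, classic) rather than by their totals. As an alternative to the merge, here is a minimal sketch that attaches each genre's total to every row with transform('sum'), sorts on it, and then takes the first 2 rows per genre. The names total and ranked are made up for illustration, and genres is included in the sort only as an assumed tie-breaker so that two genres with equal totals do not interleave.

```python
import pandas as pd

genres = ['classic', 'pop', 'classic', 'classic', 'pop', 'pop', 'rock', 'rap', 'k', 'k']
plays = [500, 600, 150, 800, 2500, 700, 300, 10000, 300, 400]

df = pd.DataFrame({'genres': genres,
                   'plays': plays,
                   'num': list(range(len(genres)))})

ranked = (
    # 'total' is a helper column (hypothetical name): per-genre sum on every row
    df.assign(total=df.groupby('genres')['plays'].transform('sum'))
      # biggest genre total first; genre name breaks ties; biggest plays first
      .sort_values(['total', 'genres', 'plays'], ascending=[False, True, False])
      # sort=False keeps the groups in the order produced by sort_values
      .groupby('genres', sort=False)
      .head(2)
      .drop(columns='total')
)

print(ranked)
```

Unlike the merge version, this keeps the original index and the real num column. In the merge version, the num column is filled from level_1, which is the original row index; that matches num here only because num was built as 0..n-1.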


 