简体   繁体   中英

Sort by certain order (Situation: pandas DataFrame Groupby)

I want to change the day of order presented by below code.
What I want is a result with the order (Mon, Tue, Wed, Thu, Fri, Sat, Sun)
- should I say, sort by key in certain predefined order?


Here is my code which needs some tweak:

f8 = df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time'].sum()

print(f8)

Current result:

device_id                         day
device_112                        Thu     436518
                                  Wed     636451
                                  Fri     770307
                                  Tue     792066
                                  Mon     826862
                                  Sat     953503
                                  Sun    1019298
device_223                        Mon    2534895
                                  Thu    2857429
                                  Tue    3303173
                                  Fri    3548178
                                  Wed    3822616
                                  Sun    4213633
                                  Sat    4475221

Desired result:

device_id                         day
device_112                        Mon     826862  
                                  Tue     792066
                                  Wed     636451 
                                  Thu     436518
                                  Fri     770307
                                  Sat     953503
                                  Sun    1019298
device_223                        Mon    2534895
                                  Tue    3303173
                                  Wed    3822616
                                  Thu    2857429
                                  Fri    3548178
                                  Sat    4475221
                                  Sun    4213633

Here, type(df_toy_indoor2.groupby(['device_id', 'day'])['dwell_time']) is a class 'pandas.core.groupby.SeriesGroupBy'.

I have found .sort_values() , but it is a built-in sort function by values.
I want to get some pointers to set some order to use it further data manipulation.
Thanks in advance.

Took me some time, but I found the solution. reindex does what you want. See my code example:

a = [1, 2] * 2 + [2, 1] * 3 + [1, 2]
b = ['Mon', 'Wed', 'Thu', 'Fri'] * 3
c = list(range(12))
df = pd.DataFrame(data=[a,b,c]).T
df.columns = ['device', 'day', 'value']
df = df.groupby(['device', 'day']).sum()

gives:

            value
device day       
1      Fri      7
       Mon      0
       Thu     12
       Wed     14
2      Fri     14
       Mon     12
       Thu      6
       Wed      1

Then doing reindex:

df.reindex(['Mon', 'Wed', 'Thu', 'Fri'], level='day')

or more conveniently (credits to burhan)

df.reindex(list(calendar.day_abbr), level='day')

gives:

            value
device day       
1      Mon      0
       Wed     14
       Thu     12
       Fri      7
2      Mon     12
       Wed      1
       Thu      6
       Fri     14

Set the 'day' column as categorical dtype, just make sure when you set the category your list of days is sorted as you'd like it to be. Performing the groupby will then automatically sort it for you, but if you otherwise tried to sort the column it will sort in the correct order that you specify.

# Initial setup.
np.random.seed([3,1415])
n = 100
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
df = pd.DataFrame({
    'device_id': np.random.randint(1,3,n),
    'day': np.random.choice(days, n),
    'dwell_time':np.random.random(n)
    })


# Set as category, groupby, and sort.
df['day'] = df['day'].astype("category", categories=days, ordered=True)
df = df.groupby(['device_id', 'day']).sum()

Update : astype no longer accepts categories, use:

category_day = pd.api.types.CategoricalDtype(categories=days, ordered=True)
df['day'] = df['day'].astype(category_day)

The resulting output:

               dwell_time
device_id day            
1         Mon    4.428626
          Tue    3.259319
          Wed    2.436024
          Thu    0.909724
          Fri    4.974137
          Sat    5.583778
          Sun    2.687258
2         Mon    3.117923
          Tue    2.427154
          Wed    1.943927
          Thu    4.599547
          Fri    2.628887
          Sat    6.247520
          Sun    2.716886

Note that this method works for any type of customized sorting. For example, if you had a column with entries 'a', 'b', 'c' , and wanted it to be sorted in a non-standard order, eg 'c', 'a', 'b' , you'd just do the same type of procedure: specify the column as categorical with your categories being in the non-standard order you want.

Probably not the best way, but as far as I know you cannot pass a function/mapping to sort_values . As a workaround, I generally use assign to add a new column and sort by that column. In your example, that also requires resetting the index first (and setting it back).

days = {'Mon': 1, 'Tue': 2, 'Wed': 3, 'Thu': 4, 'Fri': 5, 'Sun': 6, 'Sat': 7}
f8 = f8.reset_index()
(f8.assign(day_num=f8['day'].map(days))
   .sort_values(['device_id', 'day_num'])
   .set_index(['device_id', 'day'])
   .drop('day_num', axis=1))
Out: 
                                            0
device_id                        day         
0d4fd55bb363bf6f6f7f8b3342cd0467 Mon   826862
                                 Tue   792066
                                 Wed   636451
                                 Thu   436518
                                 Fri   770307
                                 Sun  1019298
                                 Sat   953503
f6258edf9145d1c0404e6f3d7a27a29d Mon  2534895
                                 Tue  3303173
                                 Wed  3822616
                                 Thu  2857429
                                 Fri  3548178
                                 Sun  4213633
                                 Sat  4475221

If you sort the dataframe prior to the groupby , pandas will maintain the order of your sort. First thing you'll have to do is come up with a good way to sort the days of the week. One way of doing that is to assign an int representing the day of the week to each row, then sort on that column. For example:

import pandas

df = pandas.DataFrame(
    columns=['device_id', 'day', 'dwell_time'], 
    data=[[1, 'Wed', 35], [1, 'Mon', 63], [2, 'Sat', 83], [2, 'Fri', 82]]
)

df['day_of_week'] = df.apply(
    lambda x: ['Mon', 'Tues', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'].index(x.day), 
    1
)

print(df.sort(['device_id', 'day_of_week']).groupby(['device_id', 'day'])['dwell_time'].sum())

yields:

device_id  day    dwell_time
1          Mon    63
           Wed    35
2          Fri    82
           Sat    83

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM