
Pandas groupby on a categorical column: keeping the category sort order

I need to be able to sort the result of a second pandas groupby by category.

The first groupby builds a list from another column, and the second groupby gives the result I actually need. The problem is that the second groupby does not honour the sorted categorical index of the original DataFrame.

import pandas as pd
import numpy  as np
import numpy.ma as ma
from   pathlib import Path

fr   = Path('../data/rules-1.xlsx')
df   = pd.read_excel(fr, sheet_name='MS')
from pandas.api.types import CategoricalDtype

print('Before:')
display(df)
ms_cat         = ['Parent-C', 'Parent-A', 'Parent-B']
# ordered=True is an argument of CategoricalDtype, not of astype()
df['ParentMS'] = df['ParentMS'].astype(CategoricalDtype(ms_cat, ordered=True))
df             = df.reset_index()
df             = df.set_index('ParentMS')
df             = df.sort_index()
print('After:')
display(df)

df_g           = df.groupby(['ParentMS', 'Milestone'])['Tasks'].apply(list)
df_g           = df_g.groupby('ParentMS')

# Category sort is not honored after the second groupby()
for name, group in df_g:
    print(name, group)

This is the input file:
[enter image description here][1]


  [1]: https://i.stack.imgur.com/KZnZD.png

Combining the two "df_g" lines into a single chained call did the trick for me. I cannot explain why, but it worked:

df_g = df.groupby(['ParentMS', 'Milestone'])['RN'].apply(list).groupby('ParentMS')
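
For anyone who wants to try this without the Excel file, here is a minimal self-contained sketch of the chained version. The sample rows are made up to stand in for the sheet (which is only available as a screenshot), the column name 'Tasks' follows the question's code, and passing observed=True is my own addition so that only parent/milestone combinations present in the data are kept. Treat it as an illustration under those assumptions, not as an explanation of why the two-line version behaved differently.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data standing in for the 'MS' sheet of rules-1.xlsx.
df = pd.DataFrame({
    'ParentMS':  ['Parent-A', 'Parent-B', 'Parent-C', 'Parent-A', 'Parent-B'],
    'Milestone': ['M1', 'M2', 'M3', 'M1', 'M4'],
    'Tasks':     ['t1', 't2', 't3', 't4', 't5'],
})

# Desired (non-alphabetical) order of the parent milestones.
ms_cat = ['Parent-C', 'Parent-A', 'Parent-B']
df['ParentMS'] = df['ParentMS'].astype(CategoricalDtype(ms_cat, ordered=True))

# First groupby: collect the tasks of each (parent, milestone) pair in a list.
# observed=True (an assumption, not in the original post) skips unused combinations.
tasks = df.groupby(['ParentMS', 'Milestone'], observed=True)['Tasks'].apply(list)

# Second groupby: group the resulting Series by the 'ParentMS' index level.
df_g = tasks.groupby(level='ParentMS', observed=True)

# Record the order in which the groups come back.
order = [name for name, _ in df_g]

for name, group in df_g:
    print(name, group.tolist())
```

With this sample data the groups should come back as Parent-C, Parent-A, Parent-B, that is, in category order rather than alphabetical order. If your result still comes out alphabetical, it is worth checking that the 'ParentMS' level of the intermediate Series is still categorical, for example with tasks.index.get_level_values('ParentMS').dtype.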
