简体   繁体   中英

Group by DataFrame based on consecutive ordered values

I'm trying to group a dataframe based on order of values. Here is my sample code:

import pandas as pd

df = pd.DataFrame([{'c1': 'v1', 'c2': 1},
               {'c1': 'v1', 'c2': 2},
               {'c1': 'v2', 'c2': 3},
               {'c1': 'v1', 'c2': 4},
               {'c1': 'v2', 'c2': 5},
               {'c1': 'v2', 'c2': 6},
               {'c1': 'v3', 'c2': 7}])
df['test'] = 'test'
df1 = df.groupby(['test', 'c1'])['c2'].describe()[['min', 'max']]
print(df1)

here is the result:

         min  max
test c1          
test v1  1.0  4.0
     v2  3.0  6.0
     v3  7.0  7.0

but i'm looking for the possibility to get following result:

         min  max
test c1          
test v1  1.0  2.0
     v2  3.0  3.0
     v1  4.0  4.0
     v2  5.0  6.0
     v3  7.0  7.0

Use:

df1 = df.groupby(['test', 'c1', df.c1.ne(df.c1.shift()).cumsum()]).c2.describe()[['min', 'max']].droplevel(2)

result:

         min  max
test c1          
test v1  1.0  2.0
     v1  4.0  4.0
     v2  3.0  3.0
     v2  5.0  6.0
     v3  7.0  7.0

Note usage of pandas.MultiIndex.droplevel method at the end of transformations, which removes level from dataframe multiindex.

IIUC you need to group by consecutive c1 :

df1 = (df.assign(group=df["c1"].ne(df["c1"].shift()).cumsum())
         .groupby(['test', 'c1', "group"])['c2'].describe()[['min', 'max']]
         .sort_index(level=2))

print(df1)

               min  max
test c1 group          
test v1 1      1.0  2.0
     v2 2      3.0  3.0
     v1 3      4.0  4.0
     v2 4      5.0  6.0
     v3 5      7.0  7.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM