Group by DataFrame based on consecutive ordered values

Question

I'm trying to group a DataFrame by consecutive runs of values, so that the same value appearing in separate runs ends up in separate groups. Here is my sample code:

import pandas as pd

df = pd.DataFrame([{'c1': 'v1', 'c2': 1},
               {'c1': 'v1', 'c2': 2},
               {'c1': 'v2', 'c2': 3},
               {'c1': 'v1', 'c2': 4},
               {'c1': 'v2', 'c2': 5},
               {'c1': 'v2', 'c2': 6},
               {'c1': 'v3', 'c2': 7}])
df['test'] = 'test'
df1 = df.groupby(['test', 'c1'])['c2'].describe()[['min', 'max']]
print(df1)

Here is the result:

         min  max
test c1          
test v1  1.0  4.0
     v2  3.0  6.0
     v3  7.0  7.0

But I'm looking for a way to get the following result:

         min  max
test c1          
test v1  1.0  2.0
     v2  3.0  3.0
     v1  4.0  4.0
     v2  5.0  6.0
     v3  7.0  7.0
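For comparison, this is the grouping that Python's itertools.groupby performs: unlike DataFrame.groupby, it only groups adjacent equal keys. A minimal pure-Python sketch, assuming the seven rows are hard-coded as (c1, c2) tuples purely for illustration:

```python
from itertools import groupby

# Rows from the question as (c1, c2) pairs, in their original order.
rows = [('v1', 1), ('v1', 2), ('v2', 3), ('v1', 4),
        ('v2', 5), ('v2', 6), ('v3', 7)]

# itertools.groupby starts a new group every time the key changes,
# so the two non-adjacent 'v1' runs end up in separate groups.
runs = []
for key, grp in groupby(rows, key=lambda r: r[0]):
    values = [c2 for _, c2 in grp]
    runs.append((key, min(values), max(values)))

for run in runs:
    print(run)
```

Each printed tuple corresponds to one row of the desired table above.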

Answer 1

Use:

df1 = df.groupby(['test', 'c1', df.c1.ne(df.c1.shift()).cumsum()]).c2.describe()[['min', 'max']].droplevel(2)

Result:

         min  max
test c1          
test v1  1.0  2.0
     v1  4.0  4.0
     v2  3.0  3.0
     v2  5.0  6.0
     v3  7.0  7.0

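Note the DataFrame.droplevel call at the end of the chain: it removes the helper run-id level (level 2) from the result's MultiIndex. Because groupby sorts by the group keys by default, the rows above come out ordered by c1 rather than in their original order. To match the requested output, sort on the run-id level before dropping it, i.e. end the chain with .sort_index(level=2).droplevel(2).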
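The run-id key, df.c1.ne(df.c1.shift()).cumsum(), can be unpacked step by step. A minimal sketch that rebuilds only the c1 and c2 columns from the question; the names prev, starts and run_id are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'c1': ['v1', 'v1', 'v2', 'v1', 'v2', 'v2', 'v3'],
                   'c2': [1, 2, 3, 4, 5, 6, 7]})

# Step 1: the previous row's c1 value (NaN for the first row).
prev = df['c1'].shift()
# Step 2: True wherever c1 differs from the previous row, i.e. a new run starts.
starts = df['c1'].ne(prev)
# Step 3: the running count of run starts gives every consecutive run its own number.
run_id = starts.cumsum()

print(pd.DataFrame({'c1': df['c1'], 'prev': prev,
                    'starts': starts, 'run_id': run_id}))
```

Because both v1 runs get different run ids (1 and 3), adding run_id to the groupby keys keeps them apart.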

Answer 2

IIUC, you need to group by consecutive runs of c1:

df1 = (df.assign(group=df["c1"].ne(df["c1"].shift()).cumsum())
         .groupby(['test', 'c1', "group"])['c2'].describe()[['min', 'max']]
         .sort_index(level=2))

print(df1)

               min  max
test c1 group          
test v1 1      1.0  2.0
     v2 2      3.0  3.0
     v1 3      4.0  4.0
     v2 4      5.0  6.0
     v3 5      7.0  7.0
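If only min and max are needed, a lighter variant (a sketch, not taken from either answer) can use agg instead of describe, which also computes count, mean, std and quartiles. Putting the run id first in the key list means the default key sorting already orders the groups by run, so no separate sort_index is needed; the helper name run is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({'c1': ['v1', 'v1', 'v2', 'v1', 'v2', 'v2', 'v3'],
                   'c2': [1, 2, 3, 4, 5, 6, 7]})
df['test'] = 'test'

# Number each consecutive run of equal c1 values: 1, 1, 2, 3, 4, 4, 5.
run = df['c1'].ne(df['c1'].shift()).cumsum().rename('run')

# With the run id as the first key, default sorting orders groups by run
# (i.e. in their original order); the helper level is dropped afterwards.
df1 = (df.groupby([run, 'test', 'c1'])['c2']
         .agg(['min', 'max'])
         .droplevel('run'))
print(df1)
```

Unlike describe, agg keeps the integer dtype of c2, so min and max are printed as integers rather than floats.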

2 answers

solution1: score 2, accepted, 2020-08-20 10:01:35

solution2: score 1, 2020-08-20 09:59:28