I'm trying to group a DataFrame based on the order of its values, so that each consecutive run of the same value becomes its own group. Here is my sample code:
import pandas as pd
df = pd.DataFrame([{'c1': 'v1', 'c2': 1},
{'c1': 'v1', 'c2': 2},
{'c1': 'v2', 'c2': 3},
{'c1': 'v1', 'c2': 4},
{'c1': 'v2', 'c2': 5},
{'c1': 'v2', 'c2': 6},
{'c1': 'v3', 'c2': 7}])
df['test'] = 'test'
df1 = df.groupby(['test', 'c1'])['c2'].describe()[['min', 'max']]
print(df1)
Here is the result:
         min  max
test c1
test v1  1.0  4.0
     v2  3.0  6.0
     v3  7.0  7.0
But I'm looking for a way to get the following result:
         min  max
test c1
test v1  1.0  2.0
     v2  3.0  3.0
     v1  4.0  4.0
     v2  5.0  6.0
     v3  7.0  7.0
Use:
df1 = df.groupby(['test', 'c1', df.c1.ne(df.c1.shift()).cumsum()]).c2.describe()[['min', 'max']].droplevel(2)
Result:
         min  max
test c1
test v1  1.0  2.0
     v1  4.0  4.0
     v2  3.0  3.0
     v2  5.0  6.0
     v3  7.0  7.0
Note the droplevel call at the end of the chain (DataFrame.droplevel, the DataFrame counterpart of pandas.MultiIndex.droplevel): it removes the helper level at position 2, the run counter, from the result's MultiIndex.
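To make the ne/shift/cumsum trick easier to follow, here is a minimal sketch that builds the run counter step by step. The names starts, run_id and out are only illustrative. Passing sort=False and using agg instead of describe are my own variations, not part of the answer above: sort=False keeps groups in order of first appearance, which matches the row order asked for in the question, and agg returns integers here instead of the floats that describe produces.

```python
import pandas as pd

df = pd.DataFrame({"c1": ["v1", "v1", "v2", "v1", "v2", "v2", "v3"],
                   "c2": [1, 2, 3, 4, 5, 6, 7]})
df["test"] = "test"

# True wherever c1 differs from the previous row (the first row compares
# against NaN, so it is always True)
starts = df["c1"].ne(df["c1"].shift())

# The running total of those flags gives every consecutive run its own id
run_id = starts.cumsum().rename("run")

# Group by the run id as well; sort=False keeps first-appearance order
out = (df.groupby(["test", "c1", run_id], sort=False)["c2"]
         .agg(["min", "max"])
         .droplevel("run"))  # drop the helper level by name
print(out)
```

Dropping the level by name rather than by position avoids depending on how many grouping keys come before it.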
IIUC, you need to group by consecutive runs of c1:
df1 = (df.assign(group=df["c1"].ne(df["c1"].shift()).cumsum())
         .groupby(['test', 'c1', "group"])['c2'].describe()[['min', 'max']]
         .sort_index(level=2))
print(df1)
               min  max
test c1 group
test v1 1      1.0  2.0
     v2 2      3.0  3.0
     v1 3      4.0  4.0
     v2 4      5.0  6.0
     v3 5      7.0  7.0