Group a DataFrame based on consecutive ordered values
I'm trying to group a DataFrame based on the order of its values. Here is my sample code:
import pandas as pd

df = pd.DataFrame([{'c1': 'v1', 'c2': 1},
                   {'c1': 'v1', 'c2': 2},
                   {'c1': 'v2', 'c2': 3},
                   {'c1': 'v1', 'c2': 4},
                   {'c1': 'v2', 'c2': 5},
                   {'c1': 'v2', 'c2': 6},
                   {'c1': 'v3', 'c2': 7}])
df['test'] = 'test'  # constant column, used as the outer grouping level
# min and max of c2 for every (test, c1) combination
df1 = df.groupby(['test', 'c1'])['c2'].describe()[['min', 'max']]
print(df1)
Here is the result:
         min  max
test c1
test v1  1.0  4.0
     v2  3.0  6.0
     v3  7.0  7.0
But I'm looking for a way to get the following result:
         min  max
test c1
test v1  1.0  2.0
     v2  3.0  3.0
     v1  4.0  4.0
     v2  5.0  6.0
     v3  7.0  7.0
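The desired output amounts to "one group per run of adjacent equal c1 values". As a plain-Python illustration of that definition only (not a pandas solution), the standard library's itertools.groupby behaves exactly this way, because it merges only adjacent items with equal keys. The rows list below is just the sample data restated as (c1, c2) pairs; the names rows and runs are illustrative.

```python
from itertools import groupby

# Sample data from the question as (c1, c2) pairs, in row order.
rows = [("v1", 1), ("v1", 2), ("v2", 3), ("v1", 4),
        ("v2", 5), ("v2", 6), ("v3", 7)]

# itertools.groupby only merges *adjacent* items with equal keys,
# so each consecutive run of the same c1 value becomes its own group.
runs = []
for key, items in groupby(rows, key=lambda r: r[0]):
    values = [c2 for _, c2 in items]
    runs.append((key, min(values), max(values)))

for key, lo, hi in runs:
    print(key, lo, hi)
```

Unlike DataFrame.groupby, which collects all rows with the same key regardless of position, this yields v1 twice and v2 twice, matching the layout the question asks for.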
Use:
df1 = (df.groupby(['test', 'c1', df.c1.ne(df.c1.shift()).cumsum()])  # 3rd key numbers each run of consecutive equal c1 values
         .c2.describe()[['min', 'max']]
         .droplevel(2))  # drop the run-number level again
Result:
         min  max
test c1
test v1  1.0  2.0
     v1  4.0  4.0
     v2  3.0  3.0
     v2  5.0  6.0
     v3  7.0  7.0
Note the droplevel call at the end of the chain (DataFrame.droplevel, operating on the result's pandas.MultiIndex): it removes the helper run-number level (position 2) from the index. The groups come out sorted by c1 rather than in the original row order.
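The third grouping key in the code above, df.c1.ne(df.c1.shift()).cumsum(), is what separates the runs. A small sketch of its intermediate steps on the question's c1 values (the variable names prev, starts_new_run and run_id are only illustrative):

```python
import pandas as pd

c1 = pd.Series(["v1", "v1", "v2", "v1", "v2", "v2", "v3"], name="c1")

prev = c1.shift()                 # previous row's value (NaN for the first row)
starts_new_run = c1.ne(prev)      # True where the value differs from the row above
run_id = starts_new_run.cumsum()  # running count of run starts = run number

print(pd.DataFrame({"c1": c1, "prev": prev,
                    "new_run": starts_new_run, "run_id": run_id}))
```

The first row always starts a run because comparing with NaN is never equal, so the run numbers are 1, 1, 2, 3, 4, 4, 5: rows that share a c1 value but are not adjacent get different numbers.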
IIUC, you need to group by consecutive values of c1:
df1 = (df.assign(group=df["c1"].ne(df["c1"].shift()).cumsum())
         .groupby(['test', 'c1', "group"])['c2'].describe()[['min', 'max']]
         .sort_index(level=2))
print(df1)
               min  max
test c1 group
test v1 1      1.0  2.0
     v2 2      3.0  3.0
     v1 3      4.0  4.0
     v2 4      5.0  6.0
     v3 5      7.0  7.0
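A possible variant (a sketch, not taken from either answer): instead of sorting afterwards with sort_index(level=2), pass sort=False to groupby so groups appear in order of first appearance, and use .agg(["min", "max"]) so only the two needed statistics are computed. The helper name run_id is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["v1", "v1", "v2", "v1", "v2", "v2", "v3"],
    "c2": [1, 2, 3, 4, 5, 6, 7],
})
df["test"] = "test"

# Label each consecutive run of identical c1 values (1, 1, 2, 3, 4, 4, 5).
run_id = df["c1"].ne(df["c1"].shift()).cumsum().rename("group")

out = (
    df.groupby(["test", "c1", run_id], sort=False)["c2"]  # sort=False keeps first-appearance order
      .agg(["min", "max"])                                # compute only the two statistics needed
      .droplevel("group")                                 # drop the helper run level
)
print(out)
```

Because agg(["min", "max"]) does not go through describe, min and max keep the integer dtype of c2 instead of becoming floats.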