How to assign a group based on consecutive sum in pandas
I have a dataframe like this:
A B code cumul_sum
group1 group1_1 A 1
group1 group1_1 A 2
group1 group1_1 B 1
group1 group1_1 A 1
group1 group1_1 A 2
group1 group1_1 A 3
group2 group2_1 A 1
group2 group2_1 A 2
group2 group2_1 A 3
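For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table, and the dtypes (strings and integers) are an assumption.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are assumed.
df = pd.DataFrame({
    "A": ["group1"] * 6 + ["group2"] * 3,
    "B": ["group1_1"] * 6 + ["group2_1"] * 3,
    "code": ["A", "A", "B", "A", "A", "A", "A", "A", "A"],
    "cumul_sum": [1, 2, 1, 1, 2, 3, 1, 2, 3],
})
print(df)
```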
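Please assume that columns A and B belong to the same category. I want to assign groups so that the expected result looks like this: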
A B code cumul_sum **groupcat**
group1 group1_1 A 1 group1
group1 group1_1 A 2 group1
group1 group1_1 B 1 group2
group1 group1_1 A 1 group3
group1 group1_1 A 2 group3
group1 group1_1 A 3 group3
group2 group2_1 A 1 group1
group2 group2_1 A 2 group1
group2 group2_1 A 3 group1
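Because a code B appears in between, the third row is not assigned group1, and the rows after it start a new group value.
Edit: groupcat should also restart for each A and B group.
Please advise.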
You can modify the previous solution:
# number the runs of consecutive equal values in the code column
s = df['code'].ne(df['code'].shift()).cumsum()
df['groupcat'] = 'group' + s.astype(str)
print(df)
A B code cumul_sum groupcat
0 group1 group1_1 A 1 group1
1 group1 group1_1 A 2 group1
2 group1 group1_1 B 1 group2
3 group1 group1_1 A 1 group3
4 group1 group1_1 A 2 group3
5 group1 group1_1 A 3 group3
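To see why the shift/ne/cumsum chain produces run numbers, it may help to look at the intermediate steps. The sketch below uses only the six code values from the output above; the names `changed` and `run_id` are made up for illustration.

```python
import pandas as pd

code = pd.Series(["A", "A", "B", "A", "A", "A"], name="code")

# True wherever a value differs from the one directly above it.
# The first row is compared against a missing value, so it is always True.
changed = code.ne(code.shift())

# Every True starts a new run, so the running total numbers the runs.
run_id = changed.cumsum()

print(pd.DataFrame({"code": code, "changed": changed, "run_id": run_id}))
```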
# number the runs of consecutive equal rows across A, B and code
s = df[['A','B','code']].ne(df[['A','B','code']].shift()).any(axis=1).cumsum()
df['groupcat'] = 'group' + s.astype(str)
print(df)
A B code cumul_sum groupcat
0 group1 group1_1 A 1 group1
1 group1 group1_1 A 2 group1
2 group1 group1_1 B 1 group2
3 group1 group1_1 A 1 group3
4 group1 group1_1 A 2 group3
5 group1 group1_1 A 3 group3
Edit:
# number the runs on code, then dense-rank the run numbers inside each (A, B) pair
s = df['code'].ne(df['code'].shift()).cumsum()
df['group'] = ('group' + df.assign(s=s)
                           .groupby(['A','B'])['s']
                           .rank(method='dense').astype(int).astype(str))
print(df)
A B code cumul_sum group
0 group1 group1_1 A 1 group1
1 group1 group1_1 A 2 group1
2 group1 group1_1 B 1 group2
3 group1 group1_1 A 1 group3
4 group1 group1_1 A 2 group3
5 group1 group1_1 A 3 group3
6 group2 group2_1 A 1 group1
7 group2 group2_1 A 2 group1
8 group2 group2_1 A 3 group1
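The dense rank is what restarts the numbering for each A/B pair. A small sketch of the intermediate values on the nine-row sample, assuming the frame is built as shown here (the name `dense` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["group1"] * 6 + ["group2"] * 3,
    "B": ["group1_1"] * 6 + ["group2_1"] * 3,
    "code": ["A", "A", "B", "A", "A", "A", "A", "A", "A"],
    "cumul_sum": [1, 2, 1, 1, 2, 3, 1, 2, 3],
})

# Global run numbers on code alone. The last run continues across the
# group1/group2 boundary because the code value does not change there.
s = df["code"].ne(df["code"].shift()).cumsum()

# Dense rank inside each (A, B) pair maps the smallest run number in the
# pair to 1, the next distinct one to 2, and so on, with no gaps.
dense = df.assign(s=s).groupby(["A", "B"])["s"].rank(method="dense").astype(int)

print(pd.DataFrame({"s": s, "dense": dense}))
```

Here the group2 rows carry the global run number 3, and the dense rank turns that into 1 because it is the only run number in that pair.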
Alternative solution:
# flag rows where A, B or code changes, then count the flags inside each (A, B) pair
s = (df[['A','B','code']].ne(df[['A','B','code']].shift()).any(axis=1)
       .groupby([df.A, df.B]).cumsum())
df['groupcat'] = 'group' + s.astype(str)
print(df)
A B code cumul_sum groupcat
0 group1 group1_1 A 1 group1
1 group1 group1_1 A 2 group1
2 group1 group1_1 B 1 group2
3 group1 group1_1 A 1 group3
4 group1 group1_1 A 2 group3
5 group1 group1_1 A 3 group3
6 group2 group2_1 A 1 group1
7 group2 group2_1 A 2 group1
8 group2 group2_1 A 3 group1
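The rank-based and cumsum-based variants agree on the sample data, but as far as I can tell they can differ when the same A/B pair shows up in separate, non-adjacent blocks. A rough comparison follows. The helper names `rank_variant` and `cumsum_variant` and the `interleaved` frame are made up for illustration.

```python
import pandas as pd

def rank_variant(df):
    # Runs of equal code values, dense-ranked inside each (A, B) pair.
    s = df["code"].ne(df["code"].shift()).cumsum()
    return df.assign(s=s).groupby(["A", "B"])["s"].rank(method="dense").astype(int)

def cumsum_variant(df):
    # A change in any of A, B, code starts a run; count runs inside each pair.
    cols = df[["A", "B", "code"]]
    changed = cols.ne(cols.shift()).any(axis=1)
    return changed.groupby([df["A"], df["B"]]).cumsum()

sample = pd.DataFrame({
    "A": ["group1"] * 6 + ["group2"] * 3,
    "B": ["group1_1"] * 6 + ["group2_1"] * 3,
    "code": ["A", "A", "B", "A", "A", "A", "A", "A", "A"],
})

# Hypothetical frame where the group1 pair reappears after group2.
interleaved = pd.DataFrame({
    "A": ["group1", "group2", "group1"],
    "B": ["group1_1", "group2_1", "group1_1"],
    "code": ["A", "A", "A"],
})

print(rank_variant(sample).tolist(), cumsum_variant(sample).tolist())
print(rank_variant(interleaved).tolist(), cumsum_variant(interleaved).tolist())
```

On the interleaved frame the rank variant keeps both group1 rows in the same group, while the cumsum variant starts a new group when group1 reappears. Which behaviour is wanted depends on the data.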
Since your groups do not depend on the code itself, you only need to start a new group at every cumul_sum value of 1:
df['groupcat'] = df['cumul_sum'].eq(1).cumsum()
As a string:
df['groupcat'] = df['cumul_sum'].eq(1).cumsum().astype(str).radd('group')
output:
A B code cumul_sum groupcat
0 group1 group1_1 A 1 group1
1 group1 group1_1 A 2 group1
2 group1 group1_1 B 1 group2
3 group1 group1_1 A 1 group3
4 group1 group1_1 A 2 group3
5 group1 group1_1 A 3 group3
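Note that the one-liner above counts across the whole frame, so on the full nine-row sample the group2 rows would end up as group4. To restart the numbering for each A/B pair, as the edit to the question asks, one possible adaptation is to take the cumulative sum per group. This is a sketch, not part of the original answer, and it assumes every run in cumul_sum starts at 1.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["group1"] * 6 + ["group2"] * 3,
    "B": ["group1_1"] * 6 + ["group2_1"] * 3,
    "code": ["A", "A", "B", "A", "A", "A", "A", "A", "A"],
    "cumul_sum": [1, 2, 1, 1, 2, 3, 1, 2, 3],
})

# A value of 1 in cumul_sum marks the first row of a new run.
starts = df["cumul_sum"].eq(1)

# Count the run starts separately inside each (A, B) pair.
counter = starts.groupby([df["A"], df["B"]]).cumsum()

df["groupcat"] = "group" + counter.astype(str)
print(df)
```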