
How to assign a group based on consecutive sum in pandas

I have a dataframe as follows:

A         B          code cumul_sum
group1    group1_1    A     1
group1    group1_1    A     2
group1    group1_1    B     1
group1    group1_1    A     1
group1    group1_1    A     2
group1    group1_1    A     3 
group2    group2_1    A     1
group2    group2_1    A     2
group2    group2_1    A     3 
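
For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: the values are copied from the table, and the variable name `df` is an assumption that happens to match the answers further down.

```python
import pandas as pd

# Sample data copied from the table above (9 rows, 4 columns)
df = pd.DataFrame({
    "A": ["group1"] * 6 + ["group2"] * 3,
    "B": ["group1_1"] * 6 + ["group2_1"] * 3,
    "code": ["A", "A", "B", "A", "A", "A", "A", "A", "A"],
    "cumul_sum": [1, 2, 1, 1, 2, 3, 1, 2, 3],
})
print(df)
```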

Assume that columns A and B belong to the same category. I want to assign groups, with the expected result shown below:

A         B          code cumul_sum   **groupcat**
group1    group1_1    A     1          group1
group1    group1_1    A     2          group1
group1    group1_1    B     1          group2
group1    group1_1    A     1          group3
group1    group1_1    A     2          group3
group1    group1_1    A     3          group3
group2    group2_1    A     1          group1
group2    group2_1    A     2          group1
group2    group2_1    A     3          group1

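Because a code B appears in between, the third row does not get group1, and the rows after it start a new category value.

Edit: groupcat should also reset for each A and B group.

Please advise.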

You can modify the previous solution:

# label runs of consecutive equal values in the code column
s = df['code'].ne(df['code'].shift()).cumsum()
df['groupcat'] = 'group' + s.astype(str)

print (df)
        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3
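
To see why the `ne` / `shift` / `cumsum` combination works, it may help to look at the intermediate values. The sketch below uses only the `code` column of the first six rows; the names `prev`, `changed`, `run_id` and `steps` are illustrative and not part of the answer.

```python
import pandas as pd

code = pd.Series(["A", "A", "B", "A", "A", "A"], name="code")

prev = code.shift()        # value from the previous row; missing for row 0
changed = code.ne(prev)    # True wherever a new run of equal values starts
run_id = changed.cumsum()  # counting the run starts gives each run a number

steps = pd.DataFrame(
    {"code": code, "prev": prev, "changed": changed, "run_id": run_id}
)
print(steps)
```

The first row is always marked as a change because it is compared against a missing value, so the numbering starts at 1.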

# label runs of consecutive rows where A, B and code all stay the same
s = df[['A','B','code']].ne(df[['A','B','code']].shift()).any(axis=1).cumsum()
df['groupcat'] = 'group' + s.astype(str)

print (df)
        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3

Edit:

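# global run number over the code column
s = df['code'].ne(df['code'].shift()).cumsum()

# dense rank within each (A, B) group restarts the numbering at 1
df['group'] = ('group' + df.assign(s=s)
                           .groupby(['A','B'])['s']
                           .rank(method='dense').astype(int).astype(str))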
print (df)
        A         B code  cumul_sum   group
0  group1  group1_1    A          1  group1
1  group1  group1_1    A          2  group1
2  group1  group1_1    B          1  group2
3  group1  group1_1    A          1  group3
4  group1  group1_1    A          2  group3
5  group1  group1_1    A          3  group3
6  group2  group2_1    A          1  group1
7  group2  group2_1    A          2  group1
8  group2  group2_1    A          3  group1
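
The effect of the dense rank is easier to see next to the global run number. The following sketch rebuilds the nine-row sample (values taken from the question) and prints both side by side; `local` and `comparison` are names chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["group1"] * 6 + ["group2"] * 3,
    "B": ["group1_1"] * 6 + ["group2_1"] * 3,
    "code": ["A", "A", "B", "A", "A", "A", "A", "A", "A"],
    "cumul_sum": [1, 2, 1, 1, 2, 3, 1, 2, 3],
})

# Global run number: does not restart at the group1/group2 boundary,
# because the code value (A) is the same on both sides of it.
s = df["code"].ne(df["code"].shift()).cumsum()

# Dense rank of the run number inside each (A, B) group: 1, 2, 3, ... per group
local = df.assign(s=s).groupby(["A", "B"])["s"].rank(method="dense").astype(int)

comparison = pd.DataFrame({"global_run": s, "rank_in_group": local})
print(comparison)
```

Rows 6 to 8 keep global run number 3, yet their rank within group2 is 1, which is what makes the label reset per group.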

Alternative solution:

s = (df[['A','B','code']].ne(df[['A','B','code']].shift()).any(axis=1)
        .groupby([df.A, df.B]).cumsum())
df['groupcat'] = 'group' + s.astype(str)

print (df)
        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3
6  group2  group2_1    A          1   group1
7  group2  group2_1    A          2   group1
8  group2  group2_1    A          3   group1

Since your groups ignore the code, you can simply start a new group at each value of 1:

df['groupcat'] = df['cumul_sum'].eq(1).cumsum()

As a string:

df['groupcat'] = df['cumul_sum'].eq(1).cumsum().astype(str).radd('group')

Output:

        A         B code  cumul_sum groupcat
0  group1  group1_1    A          1   group1
1  group1  group1_1    A          2   group1
2  group1  group1_1    B          1   group2
3  group1  group1_1    A          1   group3
4  group1  group1_1    A          2   group3
5  group1  group1_1    A          3   group3
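
If the numbering should also restart for each A and B group, as the edit to the question asks, the same idea can probably be combined with a grouped cumulative sum. This is a sketch rather than part of the answer above, and it assumes that `cumul_sum` restarts at 1 at the beginning of every run; the name `starts` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["group1"] * 6 + ["group2"] * 3,
    "B": ["group1_1"] * 6 + ["group2_1"] * 3,
    "code": ["A", "A", "B", "A", "A", "A", "A", "A", "A"],
    "cumul_sum": [1, 2, 1, 1, 2, 3, 1, 2, 3],
})

# 1 where a new run starts (assumption: every run begins with cumul_sum == 1)
starts = df["cumul_sum"].eq(1).astype(int)

# Count the run starts separately inside each (A, B) group
counter = starts.groupby([df["A"], df["B"]]).cumsum()
df["groupcat"] = "group" + counter.astype(str)
print(df)
```

Casting the boolean mask to integers before the grouped `cumsum` keeps the result an integer counter regardless of how a given pandas version treats boolean input.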
