
Pandas groupby custom groups

Let's say I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})
print(df)

   A  B
0  1  a
1  2  a
2  3  b
3  4  b
4  5  c
5  6  c

How can I group by column B so that the groups are a, a OR b, and a OR b OR c, rather than just a, b and c? For the sake of the example, let's say I want to aggregate the results with 'sum'. I would then end up with:

              A
a             3
a OR b        10 
a OR b OR c   21

I think it really depends on the function you want to use. For the sum, for example, I can think of a trick with DataFrame.expanding. The idea is to compute the expanding (running) sum and then keep only the last row of each group, where the running total covers whole groups, using DataFrame.where. This assumes that rows with the same value in B are adjacent, as they are in the example.

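# select the numeric column explicitly; recent pandas versions do not silently drop non-numeric columns such as B
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1)))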
      A
0   NaN
1   3.0
2   NaN
3  10.0
4   NaN
5  21.0

# same as above, then drop the rows that were masked out
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1))).dropna()

      A
1   3.0
3  10.0
5  21.0
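
To make the masking step easier to follow, here is a minimal sketch of the same idea split into named steps. The variable names (is_group_end, running, result) are only illustrative, and it assumes, as above, that rows with the same label in B are adjacent.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# True on the last row of each run of equal labels in B
# (assumption: rows with the same label are adjacent).
is_group_end = df['B'].ne(df['B'].shift(-1))

# Running total of A over all rows seen so far.
running = df['A'].cumsum()

# Keep the running total only where a group ends, labelled by that group.
result = running[is_group_end]
result.index = df.loc[is_group_end, 'B'].to_numpy()
print(result)
```

Using cumsum instead of expanding().sum() keeps the integer dtype of A, which may be closer to the output asked for in the question.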

UPDATED

We can also combine DataFrame.groupby with DataFrame.expanding:

df.groupby('B').sum().expanding().sum()
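
As a rough illustration of why this works, the two steps can be looked at separately: the per-group sums come first, and the expanding sum then accumulates them group by group. The names per_group and cumulative are just placeholders for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# Step 1: one total per group of B.
per_group = df.groupby('B')['A'].sum()

# Step 2: accumulate the group totals in index order.
cumulative = per_group.expanding().sum()

print(per_group)
print(cumulative)
```

Note that this only works because a sum of sums is again a sum; for an aggregation such as the mean, the per-group results cannot be combined this way.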

To get the expected output:

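new_df = (df.groupby('B').sum().expanding().sum()
            .reset_index()
            # build the labels 'a or ', 'a or b or ', ... and cut off the trailing ' or ';
            # str.rstrip(' or ') would strip any trailing ' ', 'o' or 'r' characters instead
            .assign(B = lambda x: x.B.add(' or ').cumsum().str[:-4])
            .set_index('B') )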
print(new_df)
                A
B                
a             3.0
a or b       10.0
a or b or c  21.0
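
Since the right approach depends on the aggregation, here is a hedged sketch of a more general (and slower) alternative that recomputes the aggregate over each growing union of groups. The helper cumulative_group_agg and its parameters are hypothetical names made up for this example and are not part of pandas.

```python
import pandas as pd

def cumulative_group_agg(df, by, col, func, sep=' or '):
    """Hypothetical helper: aggregate `col` over growing unions of the groups in `by`."""
    labels = list(df[by].unique())  # group labels in order of first appearance
    out = {}
    for i in range(1, len(labels) + 1):
        members = labels[:i]
        mask = df[by].isin(members)  # rows belonging to any group seen so far
        out[sep.join(map(str, members))] = df.loc[mask, col].agg(func)
    return pd.Series(out, name=col)

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})

sums = cumulative_group_agg(df, 'B', 'A', 'sum')
means = cumulative_group_agg(df, 'B', 'A', 'mean')
print(sums)
print(means)
```

This rescans the frame once per group, so it is likely to be slow when there are many groups, but it does not rely on the rows being sorted by B and it works for aggregations that cannot be built up from per-group results.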

