Let's say I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})
print(df)
   A  B
0  1  a
1  2  a
2  3  b
3  4  b
4  5  c
5  6  c
How can I group by column B such that the groups are a, a OR b, and a OR b OR c, rather than just a, b, and c? For the sake of the example, let's say that I want to aggregate the results by 'sum'. I would then end up with:
              A
a             3
a OR b       10
a OR b OR c  21
I think it really depends on the function you want to use. If you want to calculate the sum, for example, I can think of a trick with DataFrame.expanding. The idea is to take advantage of the expanding window and then keep only the rows where an entire group has been included, using DataFrame.where:
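# select the numeric column explicitly: recent pandas versions no longer silently drop the non-numeric column B
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1)))
      A
0   NaN
1   3.0
2   NaN
3  10.0
4   NaN
5  21.0
# keep only the rows that close a group
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1))).loc[lambda x: x.A.notna()]
      A
1   3.0
3  10.0
5  21.0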
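The mask in this trick marks the last row of each run of equal labels, so it only gives the intended result when the rows of each group sit next to each other. Here is a minimal self-contained sketch of the same idea that sorts first so it does not depend on row order. The sort step and the names df_sorted, is_group_end and running are my additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# Assumption: sort so that every group occupies one contiguous block of rows.
df_sorted = df.sort_values('B', kind='stable').reset_index(drop=True)

# True on the last row of each block, i.e. where the next row's label differs.
# The final row is compared against a missing value, so it also counts as an end.
is_group_end = df_sorted['B'].ne(df_sorted['B'].shift(-1))

# Running total of A, kept only at the rows where a whole group has been consumed.
running = df_sorted['A'].expanding().sum()
result = running[is_group_end]
print(result)
```

For a plain sum, df_sorted['A'].cumsum() would give the same running total; expanding() is kept here because it also offers other window aggregations such as mean or max.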
UPDATED
We can also use DataFrame.groupby + DataFrame.expanding: sum within each group first, then take the running total of those group sums.
df.groupby('B').sum().expanding().sum()
To get the expected output:
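new_df = (df.groupby('B').sum().expanding().sum()
            .reset_index()
            # build the cumulative labels: 'a or ', 'a or b or ', ...
            .assign(B=lambda x: x.B.add(' or ').cumsum()
                                   # removesuffix (pandas >= 1.4) drops exactly the trailing ' or ';
                                   # rstrip(' or ') would also eat labels ending in 'o' or 'r'
                                   .str.removesuffix(' or '))
            .set_index('B'))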
print(new_df)
                A
B
a             3.0
a or b       10.0
a or b or c  21.0
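Both approaches above work because a sum over a union of groups can be built from running totals. For an aggregation that does not decompose that way (a median, say), one fallback is to recompute over each growing set of labels. The sketch below is only an illustration of that idea: cumulative_group_agg is a hypothetical helper, not a pandas API, and it assumes groups should be added in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

def cumulative_group_agg(frame, key, value, func):
    """Aggregate `value` over growing unions of the groups in `key`.

    Hypothetical helper for illustration. `func` is anything Series.agg
    accepts, e.g. 'sum', 'median', or a callable.
    """
    labels = list(pd.unique(frame[key]))  # labels in order of first appearance
    out = {}
    for i in range(1, len(labels) + 1):
        seen = labels[:i]
        mask = frame[key].isin(seen)      # rows belonging to any label seen so far
        out[' OR '.join(map(str, seen))] = frame.loc[mask, value].agg(func)
    return pd.Series(out, name=value)

sums = cumulative_group_agg(df, 'B', 'A', 'sum')
medians = cumulative_group_agg(df, 'B', 'A', 'median')
print(sums)
print(medians)
```

This rescans the frame once per group, so it is slower than the expanding-based versions; it trades speed for working with any aggregation function.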