I want to group by the values that appear in either column 'A' or column 'B' and compute a cumsum over each group, taking the number from whichever column the value is located in.
Example of the grouped dataframe for the value 'a':
   A  B  ValueA  ValueB
0  b  a       1       3
1  c  a       2       2
2  a  b       2       4
Now, if the value is in column 'A', use ValueA for the cumsum; if the value is in column 'B', use ValueB.
EDIT: I would also like to apply shift().rolling() in the same way as cumsum. I tried putting it after the groupby, but the result is not correct.
Code
import pandas as pd
from numpy import nan as NaN  # public numpy.nan instead of the private numpy.core path
df = pd.DataFrame({
'A' : ['b','c','a','c','a','c','b','c'],
'B': ['a', 'a', 'b', 'b','c','a','a','b'],
'ValueA':[1,2,2,1,2,4,7,1],
'ValueB':[3,2,4,3,1,2,4,5]
})
print(df)
df[['sumA','sumB']] = (
df[['ValueA','ValueB']].stack(dropna=False)
.groupby(df[['A','B']].stack().tolist())
.cumsum()
.unstack()
)
print(df)
#groupby(...).shift().rolling(2, min_periods=2).sum()
df['Expected_Shift_RollingA'] = [NaN,NaN,5,NaN,4,2,7,5]
df['Expected_Shift_RollingB'] = [NaN,NaN,NaN,5,3,4,4,10]
print(df)
You can stack the value columns, group by the stacked 'A' and 'B' columns, then apply cumsum and unstack to get back to the original shape.
df[['sumA','sumB']] = (
df[['ValueA','ValueB']].stack()
.groupby(df[['A','B']].stack().tolist())
.cumsum()
.unstack()
)
print(df)
   A  B  ValueA  ValueB  sumA  sumB
0  b  a       1       3     1     3
1  c  a       2       2     2     5
2  a  b       2       4     7     5
3  c  b       1       3     3     8
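The shift().rolling() variant from the question's EDIT is not addressed by the cumsum code. One possible sketch, assuming the goal is "for each key, sum the two previous entries of that key", is to run the shift and rolling inside a per-group transform on the same stacked Series. The output column names Shift_RollingA and Shift_RollingB are chosen here for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['b', 'c', 'a', 'c', 'a', 'c', 'b', 'c'],
    'B': ['a', 'a', 'b', 'b', 'c', 'a', 'a', 'b'],
    'ValueA': [1, 2, 2, 1, 2, 4, 7, 1],
    'ValueB': [3, 2, 4, 3, 1, 2, 4, 5],
})

# One entry per (row, column) pair, in row-major order.
values = df[['ValueA', 'ValueB']].stack()
# Group keys in the same order as `values`.
keys = df[['A', 'B']].stack().tolist()

# Within each key: shift to exclude the current entry,
# then sum the two entries before it (NaN until two exist).
rolled = (
    values.groupby(keys)
    .transform(lambda s: s.shift().rolling(2, min_periods=2).sum())
    .unstack()
)
# Illustrative column names, assigned positionally.
df[['Shift_RollingA', 'Shift_RollingB']] = rolled.to_numpy()
print(df)
```

Using transform keeps the shift and the rolling window inside each group; chaining .shift().rolling() directly after the groupby would let the window run across group boundaries.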
EDIT: after reviewing the original data, some values are missing, so the method above needs adjusting, either by filling the missing values with 0:
df[['ValueA','ValueB']].fillna(0).stack()
.groupby(...
or by keeping the NaN while stacking:
df[['ValueA','ValueB']].stack(dropna=False)
.groupby(...
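As a minimal sketch of the fillna(0) option, the frame below is a hypothetical three-row example with gaps (not the asker's real data). Filling before stacking keeps the stacked values the same length as the stacked keys, so the positional groupby still lines up.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with missing values in both value columns.
small = pd.DataFrame({
    'A': ['b', 'c', 'a'],
    'B': ['a', 'a', 'b'],
    'ValueA': [1, np.nan, 2],
    'ValueB': [3, 2, np.nan],
})

# Group keys, one per (row, column) pair.
keys = small[['A', 'B']].stack().tolist()

# Fill gaps with 0 so they add nothing to the running total.
filled = (
    small[['ValueA', 'ValueB']].fillna(0).stack()
    .groupby(keys)
    .cumsum()
    .unstack()
)
small[['sumA', 'sumB']] = filled.to_numpy()
print(small)
```

With this choice a filled gap shows the running total carried so far for its key (0 if the key has not been seen before), rather than NaN.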