
Groupby pandas dataframe two columns with same value

I want to group by the values that appear in columns 'A' and 'B' (the same value can occur in either column) and compute a cumsum whose input depends on which column the value is found in.

Example of the rows grouped for the value 'a':

   A  B  ValueA  ValueB
0  b  a       1       3
1  c  a       2       2
2  a  b       2       4

If the value is in column 'A', ValueA should go into the cumsum; if the value is in column 'B', ValueB should go into the cumsum.

EDIT: I would also like to perform shift().rolling() in the same way as cumsum. I tried putting it after the groupby, but the result is not correct.

Code

import pandas as pd
from numpy import nan as NaN  # public import path; numpy.core is a private module

df = pd.DataFrame({
    'A' : ['b','c','a','c','a','c','b','c'],
    'B': ['a', 'a', 'b', 'b','c','a','a','b'],
    'ValueA':[1,2,2,1,2,4,7,1],
    'ValueB':[3,2,4,3,1,2,4,5]
})
print(df)

df[['sumA','sumB']] = (
    df[['ValueA','ValueB']].stack(dropna=False)
      .groupby(df[['A','B']].stack().tolist())
      .cumsum()
      .unstack()
)
print(df)

# Desired per-group operation: groupby(...).shift().rolling(2, min_periods=2).sum()

df['Expected_Shift_RollingA'] = [NaN,NaN,5,NaN,4,2,7,5]
df['Expected_Shift_RollingB'] = [NaN,NaN,NaN,5,3,4,4,10]
print(df)

You can stack the value columns, group the stacked values by the stacked 'A' and 'B' columns, take the cumsum, and then unstack to get back to the original shape.

df[['sumA','sumB']] = (
    df[['ValueA','ValueB']].stack()
      .groupby(df[['A','B']].stack().tolist())
      .cumsum()
      .unstack()
)
print(df)
   A  B  ValueA  ValueB  sumA  sumB
0  b  a       1       3     1     3
1  c  a       2       2     2     5
2  a  b       2       4     7     5
3  c  b       1       3     3     8
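
To see why the stacked keys and stacked values line up, here is a minimal sketch that spells out the intermediate long form using the full data from the question. The names `long`, `key`, `value`, `running`, and `wide` are only illustrative, and the final NumPy reshape stands in for `unstack`; it assumes every row contributes exactly two stacked entries (no missing values).

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['b', 'c', 'a', 'c', 'a', 'c', 'b', 'c'],
    'B': ['a', 'a', 'b', 'b', 'c', 'a', 'a', 'b'],
    'ValueA': [1, 2, 2, 1, 2, 4, 7, 1],
    'ValueB': [3, 2, 4, 3, 1, 2, 4, 5],
})

# Long form: one entry per (row, column) pair, ordered A0, B0, A1, B1, ...
# The key from 'A' pairs with ValueA and the key from 'B' pairs with ValueB.
long = pd.DataFrame({
    'key': df[['A', 'B']].stack().to_numpy(),
    'value': df[['ValueA', 'ValueB']].stack().to_numpy(),
})

# Running total per key, in the original row order.
long['running'] = long.groupby('key')['value'].cumsum()

# Back to wide form: two entries per original row (assumes no dropped NaN).
wide = long['running'].to_numpy().reshape(len(df), 2)
df['sumA'] = wide[:, 0]
df['sumB'] = wide[:, 1]
print(df)
```

The first four rows of this result should agree with the output printed above.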

EDIT: After reviewing the original data, some values are missing, so the method above needs to be adjusted.

Either fill the missing values with 0:

df[['ValueA','ValueB']].fillna(0).stack()
  .groupby(...

or keep the NaN values when stacking:

df[['ValueA','ValueB']].stack(dropna=False)
  .groupby(...
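
For the shift().rolling() request in the question's EDIT, one possible sketch is to run both steps inside `transform`, so that the shift and the rolling window are each confined to a single group. This is not part of the original answer. The variable names and the output columns `Shift_RollingA` and `Shift_RollingB` are assumptions, and NumPy's `ravel`/`reshape` are used here in place of `stack`/`unstack`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['b', 'c', 'a', 'c', 'a', 'c', 'b', 'c'],
    'B': ['a', 'a', 'b', 'b', 'c', 'a', 'a', 'b'],
    'ValueA': [1, 2, 2, 1, 2, 4, 7, 1],
    'ValueB': [3, 2, 4, 3, 1, 2, 4, 5],
})

# Flatten row by row: A0, B0, A1, B1, ... so keys and values stay paired.
keys = df[['A', 'B']].to_numpy().ravel()
values = pd.Series(df[['ValueA', 'ValueB']].to_numpy().ravel(), dtype=float)

# Within each key: drop the current entry (shift), then sum the two
# previous entries; fewer than two previous entries gives NaN.
rolled = values.groupby(keys).transform(
    lambda s: s.shift().rolling(2, min_periods=2).sum()
)

# Back to two columns per original row.
wide = rolled.to_numpy().reshape(len(df), 2)
df['Shift_RollingA'] = wide[:, 0]
df['Shift_RollingB'] = wide[:, 1]
print(df)
```

Chaining `.rolling()` directly after `groupby(...).shift()` probably fails because `shift()` returns an ordinary, ungrouped Series, so the window then runs across group boundaries. On the question's data, the sketch above reproduces the Expected_Shift_RollingA and Expected_Shift_RollingB columns.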
