
cumsum on subset of pandas df columns

I have a pandas dataframe as follows:

Date        Week  Value1  Value2  Value3
2022-01-01     1     -10      20      30
2022-01-02     1      -5      25      20
2022-01-03     1       0      15     NaN
2022-01-04     1       5       7      10
2022-01-05     1       7      10      15
2022-01-06     1      10       5     NaN

I am looking to perform a cumulative sum such that the resulting DF is as follows:

Date        Week  Value1  Value2  Value3
2022-01-03     1     -15      60      50
2022-01-06     1      22      22      25

Essentially, Value3 has NaN values; no other column has them. I want to total all values of the 3 Value columns between each NaN encountered in Value3. I also want to keep the Date and Week of the row where the NaN was encountered as is (so the cumulative sum applies only to the Value columns). So far I have tried some variations of the following, without success:

df.groupby(['Date', 'Week'])[['Value1', 'Value2', 'Value3']].apply(lambda x: x.isna().cumsum().reset_index(drop=True))

But I haven't got the desired result using this. Any ideas on how this can be achieved? Thanks!

Use a groupby on the cumulative count of NaNs in the shifted Value3 column, so that each NaN row closes its own group:

df.groupby(df['Value3'].shift().isna().cumsum()).agg(
    {'Date': 'last', 'Week': 'last', 'Value1': 'sum', 'Value2': 'sum', 'Value3': 'sum'}
).reset_index(drop=True)

Output:


    Date       Week Value1  Value2  Value3
0   2022-01-03  1   -15     60      50.0
1   2022-01-06  1   22      22      25.0
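
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample frame from the question and spells out the grouping key step by step. The names `key`, `value_cols`, `agg_spec` and `result` are illustrative choices rather than part of the original answer, and parsing `Date` with `pd.to_datetime` is an assumption about the column's dtype.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-01-01", "2022-01-02", "2022-01-03",
        "2022-01-04", "2022-01-05", "2022-01-06",
    ]),
    "Week": [1, 1, 1, 1, 1, 1],
    "Value1": [-10, -5, 0, 5, 7, 10],
    "Value2": [20, 25, 15, 7, 10, 5],
    "Value3": [30, 20, np.nan, 10, 15, np.nan],
})

# Shifting Value3 down by one row moves each NaN marker onto the row
# that follows it, so a new group starts right after every NaN row.
# The first row also becomes NaN after the shift and opens the first group.
key = df["Value3"].shift().isna().cumsum().rename("block")

# Sum only the value columns; keep Date and Week from the last row
# of each group, which is the row where the NaN occurred.
value_cols = ["Value1", "Value2", "Value3"]
agg_spec = {"Date": "last", "Week": "last", **{c: "sum" for c in value_cols}}

result = df.groupby(key).agg(agg_spec).reset_index(drop=True)
print(result)
```

Here `key` evaluates to 1, 1, 1, 2, 2, 2, so rows 0 to 2 and rows 3 to 5 are aggregated separately. Note that any rows after the final NaN would form one more group of their own, which may or may not be what you want.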
