Sum values on value change in a pandas dataframe

Question

I have a pandas data frame that looks like this:

                Count  Status
    Date
    2021-01-01     11       1
    2021-01-02     13       1
    2021-01-03     14       1
    2021-01-04      8       0
    2021-01-05      8       0
    2021-01-06      5       0
    2021-01-07      2       0
    2021-01-08      6       1
    2021-01-09      8       1
    2021-01-10     10       0

I want to calculate the difference between the initial and final value of the "Count" column before the "Status" column changes from 0 to 1 or vice versa (for every cycle), and make a new dataframe out of these values.

The output for this example would be:

    Cycle  Difference
        1           3
        2          -6
        3           2

Answer 1

You can use GroupBy.agg on the groups formed by runs of consecutive equal values, then take the last value minus the first (see below for variants):

    out = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum())
             ['Count'].agg(lambda x: x.iloc[-1] - x.iloc[0])
          )

Output:

    Status
    1    3
    2   -6
    3    2
    4    0
    Name: Count, dtype: int64
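To see why the index of that result runs from 1 to 4, it can help to build the grouping key on its own. The following is a minimal sketch, assuming the sample data from the question; the names `changed` and `cycle` are introduced here for illustration only.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    {
        "Count": [11, 13, 14, 8, 8, 5, 2, 6, 8, 10],
        "Status": [1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    },
    index=pd.date_range("2021-01-01", periods=10, name="Date"),
)

# True wherever Status differs from the previous row.
# The first row is compared with NaN, so it is always True.
changed = df["Status"].ne(df["Status"].shift())

# The running total of the change flags gives every run its own label.
cycle = changed.cumsum()

print(changed.tolist())
# [True, False, False, True, False, False, False, True, False, True]
print(cycle.tolist())
# [1, 1, 1, 2, 2, 2, 2, 3, 3, 4]
```

The last row forms a run of its own (label 4), which is why the result above contains a fourth group with a difference of 0.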

If you only want to do this for groups of more than one element:

    out = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum())
             ['Count'].agg(lambda x: x.iloc[-1] - x.iloc[0] if len(x) > 1 else pd.NA)
             .dropna()
          )

Output:

    Status
    1     3
    2    -6
    3     2
    Name: Count, dtype: object
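Note the `dtype: object` in that output: returning `pd.NA` from the lambda prevents the result from staying an integer column. If an integer result matters, one possible alternative (a sketch, not part of the original answer) is to compute all differences first and then mask by group size. The names `cycle`, `grouped` and `diff` are illustrative.

```python
import pandas as pd

# Sample data from the question (the Date index is omitted here;
# it does not affect the grouping).
df = pd.DataFrame(
    {
        "Count": [11, 13, 14, 8, 8, 5, 2, 6, 8, 10],
        "Status": [1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    }
)

cycle = df["Status"].ne(df["Status"].shift()).cumsum()
grouped = df.groupby(cycle)["Count"]

# GroupBy.first/last skip missing values; with no NaN in Count they
# match x.iloc[0] and x.iloc[-1] from the lambda version.
diff = grouped.last() - grouped.first()

# Keep only the groups that contain more than one row.
out = diff[grouped.size() > 1]

print(out.tolist())  # [3, -6, 2]
```

Because no missing value is ever introduced, `out` keeps an integer dtype and no `dropna()` is needed.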

To get the output as a DataFrame, add .rename_axis('Cycle').reset_index(name='Difference'):

    out = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum())
             ['Count'].agg(lambda x: x.iloc[-1] - x.iloc[0] if len(x) > 1 else pd.NA)
             .dropna()
             .rename_axis('Cycle')
             .reset_index(name='Difference')
          )

Output:

       Cycle Difference
    0      1          3
    1      2         -6
    2      3          2

Answer 2

Use GroupBy.agg on consecutive groups, created by comparing each value with the shifted value and taking the cumulative sum, then subtract the first value from the last:

    df = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum().rename('Cycle'))['Count']
            .agg(['first', 'last'])
            .eval('last - first')
            .reset_index(name='Difference'))
    print(df)
       Cycle  Difference
    0      1           3
    1      2          -6
    2      3           2
    3      4           0
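The chain above does several things at once, so here is a small sketch of the intermediate frame that `agg(['first', 'last'])` produces before the subtraction. It assumes the question's sample data; the variable names `cycle`, `agg` and `result` are illustrative, and the subtraction is written with plain column arithmetic instead of `eval`.

```python
import pandas as pd

# Sample data from the question (Date index omitted for brevity).
df = pd.DataFrame(
    {
        "Count": [11, 13, 14, 8, 8, 5, 2, 6, 8, 10],
        "Status": [1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    }
)

# Label each run of equal Status values and name the key "Cycle".
cycle = df["Status"].ne(df["Status"].shift()).cumsum().rename("Cycle")

# One row per cycle, with the first and last Count of that cycle.
agg = df.groupby(cycle)["Count"].agg(["first", "last"])
print(agg["first"].tolist())  # [11, 8, 6, 10]
print(agg["last"].tolist())   # [14, 2, 8, 10]

# Same arithmetic as eval('last - first'), written out explicitly.
result = (agg["last"] - agg["first"]).reset_index(name="Difference")
print(result["Difference"].tolist())  # [3, -6, 2, 0]
```

Naming the key with `rename('Cycle')` is what makes `reset_index` produce a `Cycle` column directly, without a separate `rename_axis` step.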

If you need to filter out groups that contain only 1 row, you can add the GroupBy.size aggregation and then filter the rows with DataFrame.loc:

    df = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum().rename('Cycle'))['Count']
            .agg(['first', 'last', 'size'])
            .loc[lambda x: x['size'] > 1]
            .eval('last - first')
            .reset_index(name='Difference'))
    print(df)
       Cycle  Difference
    0      1           3
    1      2          -6
    2      3           2
