
Summing values up to a column value change in pandas dataframe

I have a pandas data frame that looks like this:

                Count  Status
    Date
    2021-01-01     11       1
    2021-01-02     13       1
    2021-01-03     14       1
    2021-01-04      8       0
    2021-01-05      8       0
    2021-01-06      5       0
    2021-01-07      2       0
    2021-01-08      6       1
    2021-01-09      8       1
    2021-01-10     10       0
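For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame shown above. It assumes "Date" is a daily DatetimeIndex; the original data may store the dates as plain strings instead.

```python
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: "Date" is the index and holds consecutive daily timestamps.
df = pd.DataFrame(
    {
        "Count": [11, 13, 14, 8, 8, 5, 2, 6, 8, 10],
        "Status": [1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    },
    index=pd.date_range("2021-01-01", periods=10, freq="D", name="Date"),
)
print(df)
```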

I want to calculate the difference between the initial and final value of the "Count" column before the "Status" column changes from 0 to 1 or vice versa (for every cycle) and make a new dataframe out of these values.

The output for this example would be:

    Cycle  Difference
        1           3
        2          -6
        3           2

You can use GroupBy.agg on the groups formed by consecutive equal values, then take the last value minus the first (see below for variants):

    out = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum())
             ['Count'].agg(lambda x: x.iloc[-1] - x.iloc[0])
          )

Output:

    Status
    1    3
    2   -6
    3    2
    4    0
    Name: Count, dtype: int64
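To see why the grouping key works, it may help to look at the intermediate steps on the "Status" column alone. This is only an illustrative sketch; the names `changed` and `cycle` are not part of the answer.

```python
import pandas as pd

status = pd.Series([1, 1, 1, 0, 0, 0, 0, 1, 1, 0], name="Status")

# True wherever the value differs from the previous row. The first row is
# compared against NaN, so it is always flagged as a change.
changed = status.ne(status.shift())

# Running total of the change flags: every run of equal values gets one id.
cycle = changed.cumsum()

print(pd.DataFrame({"Status": status, "changed": changed, "cycle": cycle}))
```

The `cycle` column takes the values 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, which is exactly the key passed to `groupby`.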

If you only want to do this for groups of more than one element:

    out = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum())
             ['Count'].agg(lambda x: x.iloc[-1] - x.iloc[0] if len(x) > 1 else pd.NA)
             .dropna()
          )

Output:

    Status
    1     3
    2    -6
    3     2
    Name: Count, dtype: object
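Note that the result above has dtype object, because the lambda mixes integers with pd.NA. If an integer result is wanted, one possible follow-up (a sketch, not part of the original answer) is to cast back after dropping the missing entries:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Count": [11, 13, 14, 8, 8, 5, 2, 6, 8, 10],
        "Status": [1, 1, 1, 0, 0, 0, 0, 1, 1, 0],
    }
)

# Same grouping key as in the answer: one id per run of equal Status values.
key = df["Status"].ne(df["Status"].shift()).cumsum()

out = (
    df.groupby(key)["Count"]
      .agg(lambda x: x.iloc[-1] - x.iloc[0] if len(x) > 1 else pd.NA)
      .dropna()          # remove the single-row groups
      .astype("int64")   # safe now that no missing values remain
)
print(out)
```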

Output as a DataFrame:

Add .rename_axis('Cycle').reset_index(name='Difference'):

    out = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum())
             ['Count'].agg(lambda x: x.iloc[-1] - x.iloc[0] if len(x) > 1 else pd.NA)
             .dropna()
             .rename_axis('Cycle')
             .reset_index(name='Difference')
          )

Output:

       Cycle Difference
    0      1          3
    1      2         -6
    2      3          2

Use GroupBy.agg on consecutive groups, created by comparing the values with their shifted counterparts and taking the cumulative sum, then subtract the first value from the last:

    df = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum().rename('Cycle'))['Count']
            .agg(['first', 'last'])
            .eval('last - first')
            .reset_index(name='Difference'))
    print(df)

       Cycle  Difference
    0      1           3
    1      2          -6
    2      3           2
    3      4           0

If you need to filter out groups that contain only one row, you can add the size aggregation to GroupBy.agg and then filter the rows with DataFrame.loc:

    df = (df.groupby(df['Status'].ne(df['Status'].shift()).cumsum().rename('Cycle'))['Count']
            .agg(['first', 'last', 'size'])
            .loc[lambda x: x['size'] > 1]
            .eval('last - first')
            .reset_index(name='Difference'))
    print(df)

       Cycle  Difference
    0      1           3
    1      2          -6
    2      3           2
