Conditional Cumulative Sum of Multiple Rows in Dataframe

Question

I am trying to find the cumulative sum for four consecutive rows in a dataframe based on a condition.

The new column ('veh_time_TOT') is the sum of four consecutive 'veh-time(s)' values, and the condition is 'Day_type' (Weekend or Weekday): the four rows being summed must all have the same Day_type.
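Written as a formula, my reading of the rule implied by the desired output below is the following, with v_i the 'veh-time(s)' value and d_i the 'Day_type' of row i. Rows before the start of the table count as a mismatch, which is why the first three rows are 0:

```latex
\text{veh\_time\_TOT}_i =
\begin{cases}
  \displaystyle\sum_{k=0}^{3} v_{i-k} & \text{if } d_{i-3} = d_{i-2} = d_{i-1} = d_i, \\[4pt]
  0 & \text{otherwise.}
\end{cases}
```

For example, row 3 gives 72 + 70 + 50 + 60 = 252, and row 10 gives 35 + 30 + 30 + 20 = 115.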

Here is how the data is now set up:

    veh-time(s)  distance(m)  Day_type
0            72        379.0   Weekday
1            70        379.0   Weekday
2            50        379.0   Weekday
3            60        379.0   Weekday
4            70        379.0   Weekday
5            65        379.0   Weekday
6            30        379.0   Weekend
7            35        379.0   Weekend
8            30        379.0   Weekend
9            30        379.0   Weekend
10           20        379.0   Weekend

Here is the desired output:

    veh-time(s)  distance(m)  Day_type  veh_time_TOT
0            72        379.0   Weekday             0
1            70        379.0   Weekday             0
2            50        379.0   Weekday             0
3            60        379.0   Weekday           252
4            70        379.0   Weekday           250
5            65        379.0   Weekday           245
6            30        379.0   Weekend             0
7            35        379.0   Weekend             0
8            30        379.0   Weekend             0
9            30        379.0   Weekend           125
10           20        379.0   Weekend           115
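A common pandas pattern for this (not what the answer below does) is a per-group rolling window: group by 'Day_type', take a 4-row rolling sum inside each group, and turn the incomplete windows (NaN) into 0. This is a minimal sketch that assumes each Day_type value forms one contiguous block, as it does in this sample:

```python
import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame({
    "veh-time(s)": [72, 70, 50, 60, 70, 65, 30, 35, 30, 30, 20],
    "distance(m)": [379.0] * 11,
    "Day_type": ["Weekday"] * 6 + ["Weekend"] * 5,
})

# 4-row rolling sum computed separately within each Day_type group.
# transform() returns a Series aligned with df's original index;
# the first 3 rows of each group have an incomplete window and come back as NaN.
window_sum = df.groupby("Day_type")["veh-time(s)"].transform(
    lambda s: s.rolling(4).sum()
)

# Replace incomplete windows with 0 to match the desired output
df["veh_time_TOT"] = window_sum.fillna(0).astype(int)
print(df)
```

If the same Day_type can reappear later in the data (for example Weekday, Weekend, Weekday), grouping on the raw column would join the separate Weekday blocks, so you would group on a block label instead, such as (df["Day_type"] != df["Day_type"].shift()).cumsum().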

I've tried several things, but the only thing I could find is the .cumsum function, which only finds the sum for 2 consecutive rows. The zeros in 'veh_time_TOT' are there because there haven't been 4 rows yet to make up the sum.
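For comparison, .cumsum() is not limited to two rows: it returns a running total over every row up to and including the current one. The fixed four-row sum described here is what Series.rolling(4).sum() computes. A minimal side-by-side on the Weekday values from the table above:

```python
import pandas as pd

s = pd.Series([72, 70, 50, 60, 70, 65])  # Weekday values from the question

# cumsum: running total of all rows so far
running_total = s.cumsum()

# rolling(4).sum(): current row plus the 3 rows before it;
# NaN until a full window of 4 rows is available
window_total = s.rolling(4).sum()

print(pd.DataFrame({"value": s, "cumsum": running_total, "rolling4": window_total}))
```

Here the cumsum column keeps growing (72, 142, 192, 252, 322, 387), while the rolling column gives 252, 250, 245 from the fourth row on, which matches the desired output.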

My thinking is that this would be a combination of .cumsum and a conditional if statement inside a loop.
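The .cumsum idea can work without an explicit loop: the sum of the last four values equals the running total at the current row minus the running total four rows earlier. This sketch assumes that a new block starts whenever Day_type differs from the previous row; the names run_id, csum, csum_4_back and pos are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "veh-time(s)": [72, 70, 50, 60, 70, 65, 30, 35, 30, 30, 20],
    "Day_type": ["Weekday"] * 6 + ["Weekend"] * 5,
})

# Label each block of consecutive rows sharing the same Day_type: 1, 1, ..., 2, 2, ...
run_id = (df["Day_type"] != df["Day_type"].shift()).cumsum()

# Running total that restarts at the beginning of every block
csum = df.groupby(run_id)["veh-time(s)"].cumsum()

# Running total 4 rows earlier within the same block (0 if that row is outside the block)
csum_4_back = csum.groupby(run_id).shift(4).fillna(0)

# Position of each row inside its block: 0, 1, 2, ...
pos = df.groupby(run_id).cumcount()

# Last-4 sum, kept only once the block has at least 4 rows; 0 otherwise
df["veh_time_TOT"] = (csum - csum_4_back).where(pos >= 3, 0).astype(int)
print(df)
```

The condition lives in the grouping (run_id) and in the pos >= 3 check, so no if statement is needed per row.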

What do you guys think? Any help is appreciated.

Answer 1

Here are the steps I took to get the desired column:

First, I set up your example DataFrame.
Next, I defined the three columns of interest (the column whose values are the basis of the calculation, the column used for comparison, and the name of the new column for the calculated quantity).
After that, I found all the rows that are eligible for this calculation (the current row and the three rows before it have the same value for col_compare).
I then iterated over this slice of the original DataFrame, summing the four values of col_val that end at each eligible row.
Lastly, I created the new column named col_name_new:
- Initialize its values to zero
- Fill the eligible locations with the list generated in the previous step
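The iteration step can also be done without iterrows. Because the eligibility mask already guarantees that the four rows ending at each eligible row share a Day_type, a plain rolling(4).sum() over the whole column gives the right value on exactly those rows, and Series.where() writes 0 everywhere else. A minimal sketch that reuses the answer's mask and column variables:

```python
import pandas as pd

df = pd.DataFrame({
    "veh-time(s)": [72, 70, 50, 60, 70, 65, 30, 35, 30, 30, 20],
    "Day_type": ["Weekday"] * 6 + ["Weekend"] * 5,
})
col_val, col_compare, col_name_new = "veh-time(s)", "Day_type", "veh_time_TOT"

# Same eligibility mask as in the answer:
# the current row and the 3 rows before it all share the same col_compare value
cut_prev_four = (
    (df[col_compare].shift(1) == df[col_compare])
    & (df[col_compare].shift(2) == df[col_compare].shift(1))
    & (df[col_compare].shift(3) == df[col_compare].shift(2))
)

# Sum of the current row and the 3 before it, for every row (NaN for rows 0-2)
four_row_sum = df[col_val].rolling(4).sum()

# Keep the sum on eligible rows only; ineligible rows (including the NaN ones) get 0
df[col_name_new] = four_row_sum.where(cut_prev_four, 0).astype(int)
print(df)
```

Rows 6 to 8 still get a rolling value that spans the Weekday/Weekend boundary, but the mask is False there, so they end up as 0.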

Here is my code, feel free to ask Q's in the comments!

import pandas as pd

# Setup

cols = ['veh-time(s)', 'distance(m)', 'Day_type']

data = [[72, 379.0, 'Weekday'],
        [70, 379.0, 'Weekday'],
        [50, 379.0, 'Weekday'],
        [60, 379.0, 'Weekday'],
        [70, 379.0, 'Weekday'],
        [65, 379.0, 'Weekday'],
        [30, 379.0, 'Weekend'],
        [35, 379.0, 'Weekend'],
        [30, 379.0, 'Weekend'],
        [30, 379.0, 'Weekend'],
        [20, 379.0, 'Weekend']]

df = pd.DataFrame(data, columns=cols)

# Define columns for potential future generalization

col_val='veh-time(s)'
col_compare='Day_type'
col_name_new = 'veh_time_TOT'

# DataFrame slice of rows eligible for calculation

cut_prev_four =  (df[col_compare].shift(1)==df[col_compare]) \
                &(df[col_compare].shift(2)==df[col_compare].shift(1)) \
                &(df[col_compare].shift(3)==df[col_compare].shift(2))

df_consecutive = df[cut_prev_four]

# Perform calculation on eligible rows. Store in list

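prev_four_list = []
for i, row in df_consecutive.iterrows():
    # col_val for the current row and the three rows before it
    prev_four_vals = df[col_val].iloc[i-3:i+1].values
    print(i, prev_four_vals, prev_four_vals.sum())  # optional: show each window and its sum
    prev_four_list.append(prev_four_vals.sum())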

# Set new column to the calculated values

df[col_name_new] = 0
df.loc[cut_prev_four, col_name_new] = prev_four_list
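One caveat with the loop above: iterrows() yields index labels, and df.iloc[i-3:i+1] treats them as positions. That only lines up because the example DataFrame has the default 0..n-1 index; with a different index (timestamps, or rows left over after filtering) the slices can silently come back wrong or empty. A position-based variant of the same loop, sketched here with a deliberately non-default index (100 to 110) to show it still works:

```python
import numpy as np
import pandas as pd

# Same data as above, but with a non-default index
df = pd.DataFrame(
    {
        "veh-time(s)": [72, 70, 50, 60, 70, 65, 30, 35, 30, 30, 20],
        "Day_type": ["Weekday"] * 6 + ["Weekend"] * 5,
    },
    index=range(100, 111),
)
col_val, col_compare, col_name_new = "veh-time(s)", "Day_type", "veh_time_TOT"

# Eligibility mask from the answer (shift() is positional, so it is unaffected by the index)
cut_prev_four = (
    (df[col_compare].shift(1) == df[col_compare])
    & (df[col_compare].shift(2) == df[col_compare].shift(1))
    & (df[col_compare].shift(3) == df[col_compare].shift(2))
)

# Loop over integer positions of the eligible rows instead of index labels
prev_four_list = [
    df[col_val].iloc[pos - 3 : pos + 1].sum()
    for pos in np.flatnonzero(cut_prev_four.to_numpy())
]

# Set new column: 0 by default, window sums on the eligible rows
df[col_name_new] = 0
df.loc[cut_prev_four, col_name_new] = prev_four_list
print(df)
```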

solution1
0 2018-11-03 21:17:10