I am trying to compute the sum of four consecutive rows in a DataFrame, subject to a condition.
The new column ('veh_time_TOT') is the sum of four consecutive 'veh-time(s)' values, and the condition is that all four rows share the same 'Day_type' (Weekend or Weekday).
Here is how the data is now set up:
    veh-time(s)  distance(m)  Day_type
0            72        379.0   Weekday
1            70        379.0   Weekday
2            50        379.0   Weekday
3            60        379.0   Weekday
4            70        379.0   Weekday
5            65        379.0   Weekday
6            30        379.0   Weekend
7            35        379.0   Weekend
8            30        379.0   Weekend
9            30        379.0   Weekend
10           20        379.0   Weekend
Here is the desired output:
    veh-time(s)  distance(m)  Day_type  veh_time_TOT
0            72        379.0   Weekday             0
1            70        379.0   Weekday             0
2            50        379.0   Weekday             0
3            60        379.0   Weekday           252
4            70        379.0   Weekday           250
5            65        379.0   Weekday           245
6            30        379.0   Weekend             0
7            35        379.0   Weekend             0
8            30        379.0   Weekend             0
9            30        379.0   Weekend           125
10           20        379.0   Weekend           115
I've tried several things, but the only thing I could find is the .cumsum function, which only finds the sum for 2 consecutive rows. The zeros in 'veh_time_TOT' are there because there haven't yet been 4 rows of the same 'Day_type' to make up the sum.
My thinking is that this would be a combination of .cumsum and a conditional if statement inside a loop.
What do you guys think? Any help is appreciated.
Here are the steps I took to get the desired column:
First, I set up your example DataFrame.
Next, I defined the three columns of interest: the column whose values will be the basis of the calculation (col_val), the column used for comparison (col_compare), and the column name for the calculated quantity (col_name_new).
I then took a slice of the DataFrame containing only the rows whose three preceding rows have the same value of col_compare. I then iterate over this slice of the original DataFrame, summing the current value and the previous three values of col_val.
Lastly, I create the new column with the desired name, col_name_new.
Here is my code, feel free to ask Q's in the comments!
import pandas as pd
# Setup
cols = ['veh-time(s)', 'distance(m)', 'Day_type']
data= [[72, 379.0 , 'Weekday'],
[70, 379.0 , 'Weekday'],
[50, 379.0 , 'Weekday'],
[60, 379.0 , 'Weekday'],
[70, 379.0 , 'Weekday'],
[65, 379.0 , 'Weekday'],
[30, 379.0 , 'Weekend'],
[35, 379.0 , 'Weekend'],
[30, 379.0 , 'Weekend'],
[30, 379.0 , 'Weekend'],
[20, 379.0 , 'Weekend']]
df = pd.DataFrame(data,columns=cols )
# Define columns for potential future generalization
col_val='veh-time(s)'
col_compare='Day_type'
col_name_new = 'veh_time_TOT'
# DataFrame slice of rows eligible for calculation
cut_prev_four = (df[col_compare].shift(1)==df[col_compare]) \
&(df[col_compare].shift(2)==df[col_compare].shift(1)) \
&(df[col_compare].shift(3)==df[col_compare].shift(2))
df_consecutive = df[cut_prev_four]
# Perform calculation on eligible rows. Store in list
prev_four_list = []
for i, row in df_consecutive.iterrows():
    # i is the index label; using it with iloc assumes the default RangeIndex
    prev_four_vals = df.iloc[i-3:i+1][col_val].values
    print(i, prev_four_vals, sum(prev_four_vals))
    prev_four_list.append(sum(prev_four_vals))
# Set new column to the calculated values
df[col_name_new] = 0
df.loc[cut_prev_four, col_name_new] = prev_four_list
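For what it's worth, the same column can probably be built without the explicit loop. The sketch below is an alternative to the code above, not a rewrite of it: it labels each consecutive run of equal Day_type values and takes a 4-row rolling sum inside each run. The name run_id is a hypothetical helper introduced here, and the sketch assumes that rows with fewer than four same-type predecessors should get 0, as in the desired output.

```python
import pandas as pd

# Same example data as in the question
df = pd.DataFrame({
    "veh-time(s)": [72, 70, 50, 60, 70, 65, 30, 35, 30, 30, 20],
    "distance(m)": [379.0] * 11,
    "Day_type": ["Weekday"] * 6 + ["Weekend"] * 5,
})

# run_id (hypothetical helper) increases by one each time Day_type changes,
# so every block of consecutive equal Day_type values gets its own label
run_id = (df["Day_type"] != df["Day_type"].shift()).cumsum()

# Rolling sum over 4 rows, restarted at the beginning of each run.
# The result carries a (run_id, original index) MultiIndex, so the
# run_id level is dropped to align with df again.
rolled = (
    df.groupby(run_id)["veh-time(s)"]
      .rolling(window=4)
      .sum()
      .reset_index(level=0, drop=True)
)

# Rows without 4 values in their run come back as NaN; use 0 for those
df["veh_time_TOT"] = rolled.fillna(0).astype(int)
print(df)
```

Grouping on the run label rather than on Day_type itself matters if the data later contains a second Weekday block: a plain groupby on Day_type would sum across the Weekend gap.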