简体   繁体   中英

DF Rolling calculation with starting value and based on another column

Lets say I have a dataframe where I have banking information. I have present value, a list of transactions, and I want to work backwards to calculate the balance over time.

Here is the dataframe:


value   CN_running_balance
2020-08-07  -50.82  843.70
2020-08-06  893.77  NaN
2020-08-05  0.00    NaN
2020-08-04  -9.56   NaN
2020-08-03  -12.21  NaN
... ... ...
2020-05-14  1224.78 NaN
2020-05-13  0.00    NaN
2020-05-12  0.00    NaN
2020-05-11  -25.00  NaN
2020-05-10  -0.00   NaN

And I want to transform the running balance to use that rows value column for next row, by subtracting the value from the balance.

value   CN_running_balance
2020-08-07  -50.82  843.70
2020-08-06  893.77  894.52
2020-08-05  0.00    0.75
2020-08-04  -9.56   etc
2020-08-03  -12.21  etc
... ... ...
2020-05-14  1224.78 etc
2020-05-13  0.00    etc
2020-05-12  0.00    etc
2020-05-11  -25.00  etc
2020-05-10  -0.00   etc

This has been pretty tricky for me so I would appreciate any suggestions on how to solve the problem!

I would suggest simply iterating over your dataframe as you have to forward fill the value. Here is one approach:

from io import StringIO
import pandas as pd

# Create your data frame
input_string = """value  CN_running_balance
2020-08-07  -50.82  843.70
2020-08-06  893.77  NaN
2020-08-05  0.00  NaN
2020-08-04  -9.56  NaN
2020-08-03  -12.21  NaN
2020-05-14  1224.78  NaN
2020-05-13  0.00  NaN
2020-05-12  0.00  NaN
2020-05-11  -25.00  NaN
2020-05-10  -0.00  NaN"""

data = StringIO(input_string)
df = pd.read_csv(data, sep='  ', engine='python')

# Create iterable index
df = df.reset_index()

# Forward fill running balance
for i in range(1, len(df)):
    df.loc[i, 'CN_running_balance']  = df.loc[i-1, 'CN_running_balance'] - df.loc[i-1, 'value']

# Reset original index
df = df.set_index('index')       
print(df)

Output:

              value  CN_running_balance
index                                  
2020-08-07   -50.82              843.70
2020-08-06   893.77              894.52
2020-08-05     0.00                0.75
2020-08-04    -9.56                0.75
2020-08-03   -12.21               10.31
2020-05-14  1224.78               22.52
2020-05-13     0.00            -1202.26
2020-05-12     0.00            -1202.26
2020-05-11   -25.00            -1202.26
2020-05-10    -0.00            -1177.26

Step 1: Calculate the delta between the current and previous value

delta = value - CN_running_balance[,2]
df$CN_running_balance[1] = value[1] # set first row in running balance to be equal to first CN row 
for (i in 2:nrow(df)) {     # for every other row 
    df$CN_running_balance[i] = df$CN_running_balance[i-1] + delta[i]   # use previous value plus the difference of current and prev values  
}

Step 2: Adjust the final CN running balance column by subtracting the initial balance from that row

df$CN_running_balance = df$CN_running_balance - df$CN_running_balance[1]  # take the first row and make all other rows have their running balance minus the first one. 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM