Update pandas dataframe current row attribute based on its value in the previous row for each row

Question

I need to perform the following steps on a data-frame:

Assign a starting value to the "balance" attribute of the first row.
Calculate the "balance" value of each subsequent row from the previous row's balance using a formula, e.g. (previous row balance + 1).

I have tried the following steps:

Created the data-frame:

import numpy as np
import pandas as pd

df = pd.DataFrame(pd.date_range(start='2019-01-01', end='2019-12-31'), columns=['dt_id'])

Created an attribute called 'balance':

df["balance"] = 0

Tried to conditionally update the data-frame:

df["balance"] = np.where(df.index == 0, 100, df["balance"].shift(1) + 1)

Results:

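Row 0 gets 100, but every later row gets 1 instead of 101, 102, and so on. From what I can observe, shift(1) reads each previous row's value before that row has been updated in the original data-frame, so every row after the first sees the original 0 and becomes 0 + 1.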
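A minimal sketch that reproduces this behaviour, assuming a five-row frame instead of the full year so the output stays short. The shifted column is computed once, from the column of zeros, before np.where writes anything:

```python
import numpy as np
import pandas as pd

# Five days instead of a full year, only to keep the output readable.
df = pd.DataFrame(pd.date_range(start="2019-01-01", periods=5), columns=["dt_id"])
df["balance"] = 0

# shift(1) is evaluated once on the column as it is now (all zeros),
# so np.where never sees the 100 it places in row 0.
shifted = df["balance"].shift(1) + 1
df["balance"] = np.where(df.index == 0, 100, shifted)

print(df["balance"].tolist())  # [100.0, 1.0, 1.0, 1.0, 1.0]
```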

The desired output for the "balance" attribute:

Row 0: 100
Row 1: 101
Row 2: 102

and so on.

Answer 1

If I understand correctly, adding this line after your code gives you what you want:

df["balance"].cumsum()

0      100.0
1      101.0
2      102.0
3      103.0
4      104.0
       ...  
360    460.0
361    461.0
362    462.0
363    463.0
364    464.0

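cumsum() computes a cumulative sum: each element is added to the running total of all the elements before it. Since your np.where line leaves the starting value 100 in row 0 and 1 in every other row, the running total is 100, 101, 102, and so on. Note that cumsum() returns a new Series; assign it back with df["balance"] = df["balance"].cumsum() to keep the result in the DataFrame.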
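A minimal sketch of the complete sequence from the question plus this fix, with the cumulative sum written back into the column (the one-liner above only displays it). 2019 has 365 days, so the last row is 100 + 364:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(pd.date_range(start="2019-01-01", end="2019-12-31"), columns=["dt_id"])
df["balance"] = 0

# Row 0 gets the starting value, every other row gets 0 + 1 = 1.
df["balance"] = np.where(df.index == 0, 100, df["balance"].shift(1) + 1)

# cumsum() returns a new Series; assign it back so the DataFrame keeps it.
df["balance"] = df["balance"].cumsum()

print(df["balance"].iloc[[0, 1, 2, -1]].tolist())  # [100.0, 101.0, 102.0, 464.0]
```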

Answer 2

The problem you have is that you want to calculate an array whose elements depend on each other: element 2 depends on element 1, element 3 depends on element 2, and so on.

Whether there is a simple solution depends on the formula you use, i.e., on whether it can be vectorized. Here is a good explanation of that topic: Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one?
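As an illustration of when vectorization works (an example of my own, not taken from the linked question): a recurrence of the form balance[i] = balance[i-1] + deltas[i] unrolls to the start value plus a running sum of the increments, which is exactly np.cumsum. The increments in `deltas` are hypothetical values chosen for the example:

```python
import numpy as np

start = 100.0
# Hypothetical per-row increments; row 0 has no predecessor, so its increment is 0.
deltas = np.array([0.0, 5.0, -2.0, 3.0, 1.0])

# balance[i] = balance[i-1] + deltas[i] unrolls to start + deltas[1] + ... + deltas[i].
vectorized = start + np.cumsum(deltas)

# The same recurrence as an explicit loop, for comparison.
looped = np.empty(len(deltas))
looped[0] = start
for i in range(1, len(deltas)):
    looped[i] = looped[i - 1] + deltas[i]

print(vectorized.tolist())  # [100.0, 105.0, 103.0, 106.0, 107.0]
```

Formulas that cannot be rewritten as a cumulative operation like this generally still need an explicit loop.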

In your case a simple loop should do it:

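import numpy as np

balance = np.empty(len(df.index))
balance[0] = 100  # starting value
for i in range(1, len(df.index)):
    balance[i] = balance[i-1] + 1  # or whatever formula you want to use
df["balance"] = balance  # write the result back into the DataFrame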
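A slightly more general sketch of the same loop, with the formula passed in as a function so it can be swapped without touching the loop. The helper name `fill_recursive` and its parameters are hypothetical, introduced only for this example:

```python
import numpy as np
import pandas as pd


def fill_recursive(n, start, step):
    """Hypothetical helper: out[0] = start, out[i] = step(out[i - 1])."""
    out = np.empty(n)
    if n == 0:
        return out  # nothing to fill for an empty frame
    out[0] = start
    for i in range(1, n):
        out[i] = step(out[i - 1])  # each value depends only on the previous one
    return out


df = pd.DataFrame(pd.date_range(start="2019-01-01", end="2019-12-31"), columns=["dt_id"])
df["balance"] = fill_recursive(len(df.index), 100, lambda prev: prev + 1)

print(df["balance"].head(3).tolist())  # [100.0, 101.0, 102.0]
```

Replacing the lambda, for example with lambda prev: prev * 1.05, changes the formula while the loop stays the same.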

Note that the loop above is the general solution. Your particular formula can be vectorized, so the column can also be generated with:

balance = 100 + np.arange(0, len(df.index))
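A quick sketch, assuming the date range from the question, that checks the vectorized form against the loop and then writes it into the DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(pd.date_range(start="2019-01-01", end="2019-12-31"), columns=["dt_id"])
n = len(df.index)

# General approach: explicit loop over the recurrence.
looped = np.empty(n)
looped[0] = 100
for i in range(1, n):
    looped[i] = looped[i - 1] + 1

# Closed form of the same recurrence: balance[i] = 100 + i.
vectorized = 100 + np.arange(0, n)

assert np.array_equal(looped, vectorized)
df["balance"] = vectorized
print(df["balance"].iloc[-1])  # 464
```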

2 answers

solution1: 1, 2019-09-12 07:37:50

solution2: 1, accepted, 2019-09-12 08:10:59
