
Update pandas dataframe current row attribute based on its value in the previous row for each row

I need to perform the following steps on a data-frame:

  1. Assign a starting value to the "balance" attribute of the first row.
  2. Calculate the "balance" value of each subsequent row from the value in the previous row, for example with the formula: previous row balance + 1.

I have tried the following steps:

Created the data-frame:

import numpy as np
import pandas as pd
df = pd.DataFrame(pd.date_range(start='2019-01-01', end='2019-12-31'), columns=['dt_id'])

Created attribute called 'balance':

df["balance"] = 0

Tried to conditionally update the data-frame:

df["balance"] = np.where(df.index == 0, 100, df["balance"].shift(1) + 1)

Results: [image: enter image description here]

From what I can observe, df["balance"].shift(1) is read from the original column (all zeros) before any row has been updated, so every row after the first ends up as 0 + 1 = 1 instead of the previous row's new balance plus 1.
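The behaviour described above can be reproduced with a minimal sketch. It reuses the data-frame from the question; the names `shifted` and `result` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    pd.date_range(start="2019-01-01", end="2019-12-31"), columns=["dt_id"]
)
df["balance"] = 0

# shift(1) is evaluated once, on the original all-zero column:
# NaN for row 0, then 0.0 for every later row.
shifted = df["balance"].shift(1)

# np.where picks 100 for row 0 and shifted + 1 (that is, 1.0) everywhere else.
# No row ever sees the "updated" balance of the row before it.
result = np.where(df.index == 0, 100, shifted + 1)

print(result[:4])
```

Because the whole right-hand side is computed before the assignment, the expression cannot chain from one row to the next.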

The desired output for "balance" attribute :

  • Row 0 : 100

  • Row 1: 101

  • Row 2 : 102

And so on

If I understand correctly, adding this line of code after yours gives the result you want (assign it back to df["balance"] if you want to keep it):

df["balance"].cumsum()

0      100.0
1      101.0
2      102.0
3      103.0
4      104.0
       ...  
360    460.0
361    461.0
362    462.0
363    463.0
364    464.0

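It is a cumulative sum: each element is added to the running total of all elements before it. After your np.where step the column holds the starting value 100 followed by ones, so the cumulative sum gives 100, 101, 102, and so on, which is what you want.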
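For completeness, here is a self-contained sketch of the whole sequence, assuming the data-frame and the np.where step from the question are used unchanged and the cumulative sum is assigned back to the column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    pd.date_range(start="2019-01-01", end="2019-12-31"), columns=["dt_id"]
)
df["balance"] = 0

# Step from the question: 100 in row 0, 1.0 in every other row.
df["balance"] = np.where(df.index == 0, 100, df["balance"].shift(1) + 1)

# Running total of [100, 1, 1, ...]; assign it back so the column is updated.
df["balance"] = df["balance"].cumsum()

print(df.head())
```

Note that this only works because the formula is "previous balance plus a constant"; other formulas need a different approach.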

The problem is that you want to calculate an array whose elements depend on each other: element 2 depends on element 1, element 3 depends on element 2, and so on.

Whether there is a simple solution depends on the formula you use, i.e. on whether it can be vectorized. Here is a good explanation of that topic: Is it possible to vectorize recursive calculation of a NumPy array where each element depends on the previous one?

In your case a simple loop should do it:

balance = np.empty(len(df.index))
balance[0] = 100  # starting value
for i in range(1, len(df.index)):
    balance[i] = balance[i - 1] + 1  # or whatever formula you want to use
df["balance"] = balance  # write the result back to the data-frame

Please note that the loop above is the general solution. Your particular formula can be vectorized, so the same values can also be generated using:

balance = 100 + np.arange(0, len(df.index))
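As a quick check, the following sketch builds the balance both ways on the question's data-frame and confirms that they agree before writing the result back. The names `n`, `balance_loop` and `balance_vec` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    pd.date_range(start="2019-01-01", end="2019-12-31"), columns=["dt_id"]
)
n = len(df.index)

# General approach: explicit loop, works for any formula based on the previous row.
balance_loop = np.empty(n)
balance_loop[0] = 100
for i in range(1, n):
    balance_loop[i] = balance_loop[i - 1] + 1

# Vectorized approach: only valid because the formula is "previous + 1".
balance_vec = 100 + np.arange(n)

# Both approaches produce the same values.
assert np.array_equal(balance_loop, balance_vec)

df["balance"] = balance_vec
print(df.head())
```

For 365 rows the loop is fast enough; the vectorized form mainly pays off on much larger frames.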
