
Dataframe recursively calculate values with conditions [complicated logic]

I have tried hard but cannot solve this on my own, so I would like to ask for your help. Thanks in advance.

Input Pandas dataframe:

day   a     
1     1000  
2     0     
3     0     
4     -1200 
5     0     
6     0     
7     -50   
8     0     
9     0     
10    0     
11    -100  

Output:

day   a     b    c   d
1     1000  1000 100 100
2     0     1000 100 200
3     0     1000 100 300
4     -1200 100  10  10
5     0     100  10  20
6     0     100  10  30
7     -50   50   5   35
8     0     50   5   40
9     0     50   5   45
10    0     50   5   50
11    -100  -50  0   0 

Explanation:

  • a is daily amount.
  • b is accumulated sum of a, but with a condition, explained below.
  • c = b * 10%
  • d = accumulated sum of c
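
For reference, here is a minimal sketch of what the three columns would look like if b were a plain running total with no condition attached. It assumes the a values as they appear in the expected output table (day 11 taken as -100); the names `b_naive`, `c_naive` and `d_naive` are hypothetical and used only for illustration.

```python
import pandas as pd

# Column a as it appears in the expected output table (day 11 taken as -100).
df = pd.DataFrame({
    "day": range(1, 12),
    "a": [1000, 0, 0, -1200, 0, 0, -50, 0, 0, 0, -100],
})

# Unconditional version: b is a running total of a,
# c is 10% of b, and d is a running total of c.
b_naive = df["a"].cumsum()
c_naive = b_naive / 10
d_naive = c_naive.cumsum()

print(pd.DataFrame({"day": df["day"], "b": b_naive, "c": c_naive, "d": d_naive}))
```

On day 4 this unconditional version gives b = -200, while the expected output has b = 100, so a plain cumulative sum is not enough.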

The problem is the logic for column b. On a day when column a holds a negative amount:

  • if a + b (of the previous day) <= 0 but a + b (of the previous day) + d (of the previous day) > 0, then b = that result and d = c (the previous d has already been added to b, so it is dropped from d); e.g. day 4

  • if a + b (of the previous day) > 0, then b = previous b + a and d = previous d + c; e.g. day 7

  • if a + b (of the previous day) + d (of the previous day) <= 0, then c and d become 0; e.g. day 11
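
The three cases above can be sketched as a single per-day update. This is one reading of the rules, not a definitive one: it assumes `a + b > 0` is checked first (otherwise day 7 would also match the first bullet), and that in the last case b is still previous b + a, which is read off the expected table rather than stated in the text. The function name `step` is hypothetical.

```python
def step(a, b_prev, d_prev):
    """Return (b, c, d) for one day from that day's a and the previous b and d."""
    if a + b_prev > 0:
        # b alone absorbs the amount; d keeps accumulating
        b = b_prev + a
        c = b / 10
        d = d_prev + c
    elif a + b_prev + d_prev > 0:
        # the previous d is folded into b, so d restarts from c
        b = b_prev + a + d_prev
        c = b / 10
        d = c
    else:
        # nothing is left: c and d drop to 0 (b assumed to be previous b + a)
        b = b_prev + a
        c = d = 0
    return b, c, d

# Worked days, with previous b and d taken from the expected output table
day4 = step(-1200, 1000, 300)   # -1200 + 1000 <= 0, but adding d = 300 gives 100
day7 = step(-50, 100, 30)       # -50 + 100 = 50 > 0
day11 = step(-100, 50, 50)      # -100 + 50 + 50 = 0, not > 0
print(day4, day7, day11)
```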

I have been stuck for days, so your help is really appreciated. If there are any questions, please let me know.

Each row depends on the b and d of the previous row, so this is way too complex to be achieved with vectorized methods. IMHO, the best way is to set Pandas aside and simply process arrays in a loop. Since the results feed back into the DataFrame in the end, I would use numpy arrays instead of plain lists. The code could be:

import numpy as np

# prepare the numpy arrays for the existing column and the new ones
A = df['a'].to_numpy()
B = np.empty(A.shape)
C = np.empty(A.shape)
D = np.empty(A.shape)

# initialize initial values of b, c and d to 0
b = c = d = 0

# loop over A and compute b, c, and d according to the requirements
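for i, a in enumerate(A):
    if a + b > 0:
        # b alone stays positive: keep accumulating b and d
        b += a
        c = b / 10  # c = b * 10%; true division keeps fractional values
        d += c
    elif a + b + d > 0:
        # previous d is folded into b, so d restarts from c
        b += a + d
        c = b / 10
        d = c
    else:
        # nothing left: c and d drop to 0
        b += a
        c = d = 0
    # feed the arrays
    B[i], C[i], D[i] = b, c, d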

# add the new columns to the DataFrame
df['B'] = B
df['C'] = C
df['D'] = D

It gives the expected result:

    day     a       B      C      D
0     1  1000  1000.0  100.0  100.0
1     2     0  1000.0  100.0  200.0
2     3     0  1000.0  100.0  300.0
3     4 -1200   100.0   10.0   10.0
4     5     0   100.0   10.0   20.0
5     6     0   100.0   10.0   30.0
6     7   -50    50.0    5.0   35.0
7     8     0    50.0    5.0   40.0
8     9     0    50.0    5.0   45.0
9    10     0    50.0    5.0   50.0
10   11  -100   -50.0    0.0    0.0
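
As a sanity check, the loop can be wrapped in a function and compared against the expected table from the question. This is only a sketch: the helper name `add_bcd` is hypothetical, the input uses the a values shown in the output table (day 11 as -100), and true division is assumed for the 10%.

```python
import numpy as np
import pandas as pd

def add_bcd(df):
    """Return a copy of df with columns B, C, D computed row by row."""
    A = df["a"].to_numpy()
    out = np.empty((len(A), 3))  # one row of (b, c, d) per day
    b = c = d = 0
    for i, a in enumerate(A):
        if a + b > 0:
            b += a
            c = b / 10
            d += c
        elif a + b + d > 0:
            b += a + d
            c = b / 10
            d = c
        else:
            b += a
            c = d = 0
        out[i] = b, c, d
    return df.assign(B=out[:, 0], C=out[:, 1], D=out[:, 2])

df = pd.DataFrame({
    "day": range(1, 12),
    "a": [1000, 0, 0, -1200, 0, 0, -50, 0, 0, 0, -100],
})
result = add_bcd(df)

# Expected columns b, c, d copied from the question
expected_b = [1000, 1000, 1000, 100, 100, 100, 50, 50, 50, 50, -50]
expected_c = [100, 100, 100, 10, 10, 10, 5, 5, 5, 5, 0]
expected_d = [100, 200, 300, 10, 20, 30, 35, 40, 45, 50, 0]
print(np.allclose(result["B"], expected_b),
      np.allclose(result["C"], expected_c),
      np.allclose(result["D"], expected_d))
```

Returning a new frame with `assign` leaves the original DataFrame untouched, which makes the function easier to re-run while experimenting with the rules.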
