Fill the column using the previous value in the column and some calculations in pandas

My dataframe looks like this:

    scale  cons  hold  supply  add.supply  s_res  z_res
48     -5   NaN   NaN     NaN         NaN    NaN    NaN
49     -4   NaN   NaN     NaN         NaN    NaN    NaN
50     -3   NaN   NaN     NaN         NaN    NaN    NaN
51     -2   NaN   NaN     NaN         NaN    NaN    NaN
52     -1   NaN   NaN     NaN         NaN    NaN    NaN
53      0     0   300       0         NaN    100    200
54      1    20   NaN       0         NaN    200    322
55      2    30   NaN      70         NaN    100    100
56      3    25   NaN       0         NaN    400    110
57      4    15   NaN       0         NaN    100    300
58      5    10   NaN       0         NaN    100    180
59      6    40   NaN       0         NaN    100    100
...

I need to do the following:

Starting from the row where scale = 1, fill the hold column with values calculated as follows:

I take the previous value in the hold column, subtract the current row's cons value, and add the current row's supply value.

(For the hold cell at scale = 1 this is (300 - 20) + 0 = 280, for the next cell (280 - 30) + 70 = 320, for the one after that (320 - 25) + 0 = 295, and so on.)

If a value in the hold column is less than the s_res value in the same row, then the next row's hold must also have the difference between that next row's z_res and s_res added to it.

For example, hold is 295 where scale = 3, which is less than s_res = 400 in that row. The next value is therefore (295 - 15) + 0 + (300 - 100) = 480, and the difference z_res - s_res = 200 is written to add.supply in that next row.

Every newly calculated hold value must in turn be checked against the s_res value in its row.

The result should look like this:

    scale  cons  hold  supply  add.supply  s_res  z_res
48     -5   NaN   NaN     NaN         NaN    NaN    NaN
49     -4   NaN   NaN     NaN         NaN    NaN    NaN
50     -3   NaN   NaN     NaN         NaN    NaN    NaN
51     -2   NaN   NaN     NaN         NaN    NaN    NaN
52     -1   NaN   NaN     NaN         NaN    NaN    NaN
53      0     0   300       0         NaN    100    200
54      1    20   280       0         NaN    200    322
55      2    30   320      70         NaN    100    100
56      3    25   295       0         NaN    400    110
57      4    15   480       0         200    100    300
58      5    10   470       0         NaN    100    180
59      6    40   430       0         NaN    100    100
...

I would be grateful for any advice.

UPD: I tried applying this code

df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()

to a larger dataframe, and I ran into problems.

My new dataframe:

   scale   cons   hold  supply  add.supply   s_res   z_res
 0   0       0    300     0        NaN        100     200
 1   1      20    NaN     0        NaN        200     322
 2   2      30    NaN    70        NaN        100     100
 3   3      25    NaN     0        NaN        400     110
 4   4      15    NaN     0        NaN        100     300
 5   5      10    NaN     0        NaN        100     180
 6   6      40    NaN     0        NaN        100     100
 7   7      60    NaN     0        NaN        300     400
 8   8      50    NaN     0        NaN        245     300
 9   9      70    NaN     0        NaN        300     600
10  10      50    NaN     0        NaN        143     228
...
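For anyone who wants to reproduce this, here is a sketch of how the frame above could be built. The values are copied from the eleven rows shown; whatever the `...` stands for is left out, so this is only the visible part of the data.

```python
import numpy as np
import pandas as pd

# Values copied from the rows shown above; elided rows are not included.
df = pd.DataFrame({
    "scale": list(range(11)),
    "cons": [0, 20, 30, 25, 15, 10, 40, 60, 50, 70, 50],
    "hold": [300] + [np.nan] * 10,   # only the starting value is known
    "supply": [0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0],
    "add.supply": [np.nan] * 11,     # to be filled in
    "s_res": [100, 200, 100, 400, 100, 100, 100, 300, 245, 300, 143],
    "z_res": [200, 322, 100, 110, 300, 180, 100, 400, 300, 600, 228],
})
print(df)
```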

The result should be the following:

   scale   cons   hold  supply  add.supply   s_res   z_res
 0   0       0    300     0        NaN        100     200
 1   1      20    280     0        NaN        200     322
 2   2      30    320    70        NaN        100     100
 3   3      25    295     0        NaN        400     110
 4   4      15    480     0        200        100     300
 5   5      10    470     0        NaN        100     180
 6   6      40    430     0        NaN        100     100
 7   7      60    370     0        NaN        300     400
 8   8      50    320     0        NaN        245     300
 9   9      70    250     0        NaN        300     600
10  10      50    285     0         85        143     228
...

But the result of the code execution was not what it should be:

   scale   cons   hold  supply  add.supply   s_res   z_res
 0   0       0    300     0        NaN        100     200
 1   1      20    280     0        NaN        200     322
 2   2      30    320    70        NaN        100     100
 3   3      25    295     0        NaN        400     110
 4   4      15    480     0        200        100     300
 5   5      10    470     0        NaN        100     180
 6   6      40    430     0        NaN        100     100
 7   7      60    370     0        NaN        300     400
 8   8      50    375     0         55        245     300
 9   9      70    605     0        300        300     600
10  10      50    640     0         85        143     228
...

The error appears after hold = 370, but I don't understand why.

Instead of doing this row by row, you can use a combination of cumsum() and np.where to do this across the whole DataFrame:

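# Stage 1: running balance from the starting hold, ignoring add.supply
df['hold'] = df.hold.ffill() - df.cons.cumsum() + df.supply.cumsum()
# Stage 2: flag rows whose previous hold was below the previous s_res
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
# Carry each addition forward into all later rows
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()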

Think of the transformations you want to do in two stages. You have an initial stage where you're adding and subtracting from an initial value of df.hold . Then you're altering that new value of hold in some cases, according to some conditions.

cumsum() takes a Series or DataFrame and makes a new version where each row is the cumulative sum of the previous rows and the current row. You can do that for df.cons and df.supply to get the cumulative amounts that will be subtracted from and added to df.hold . Now you have the first stage of df.hold calculated.
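As a small illustration of that first stage, here is a sketch using the first seven cons and supply values from the question. The names `cons`, `supply`, `start` and `stage_one` are made up for this example.

```python
import pandas as pd

# First seven rows of the question's data (scale 0 to 6)
cons = pd.Series([0, 20, 30, 25, 15, 10, 40])
supply = pd.Series([0, 0, 70, 0, 0, 0, 0])
start = 300  # hold at scale = 0

# Everything consumed so far is subtracted, everything supplied so far is added
stage_one = start - cons.cumsum() + supply.cumsum()
print(stage_one.tolist())
# [300, 280, 320, 295, 280, 270, 230]
```

The first four values already match the expected hold column. The later ones are still missing the add.supply amount, which is what the second stage deals with.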

You can use np.where to find out when df.hold meets the conditions you're interested in. Where it does, you can set df['add.supply'] accordingly. Then you can add this new column to df.hold . Note that we're using fillna(0) to make sure each row has a value, and cumsum() again to preserve the added conditional values over time.
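To make the second stage concrete, here is a sketch on the same seven rows, starting from the first-stage hold values. The intermediate name `flagged` is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "hold": [300, 280, 320, 295, 280, 270, 230],  # first-stage values
    "s_res": [100, 200, 100, 400, 100, 100, 100],
    "z_res": [200, 322, 100, 110, 300, 180, 100],
})

# A row is flagged when the previous row's hold was below the previous s_res
flagged = df["hold"].shift() < df["s_res"].shift()
df["add.supply"] = np.where(flagged, df["z_res"] - df["s_res"], np.nan)

# cumsum() carries each addition forward into every later row
df["hold"] = df["hold"] + df["add.supply"].fillna(0).cumsum()
print(df["hold"].tolist())
# [300.0, 280.0, 320.0, 295.0, 480.0, 470.0, 430.0]
```

Only the row after hold = 295 is flagged here, so a single addition of 200 is applied from that row onward.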

UPDATE

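The original code above stops working once one add.supply value has been applied, because the condition is still checked against the first-stage df.hold, which does not include earlier additions. In the larger frame, the row at scale = 7 has a first-stage hold of 170, which is below s_res = 300, so scale = 8 gets flagged even though the corrected hold at scale = 7 is 370. There may be a way to do this non-iteratively, and there's certainly a cleaner way than what I've done below, but this at least will get the job done: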

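import numpy as np

# Stage 1: running balance from the starting hold, ignoring add.supply
df['hold'] = df.hold.ffill() - df.cons.cumsum() + df.supply.cumsum()

hold = df.hold.tolist()
s_res = df.s_res.tolist()
# Amount available to the *next* row: z_res - s_res, shifted up by one
add = (df.z_res - df.s_res).shift(-1).tolist()

newh = []
totala = 0  # sum of all additions applied so far
for h, s, a in zip(hold, s_res, add):
    newh.append(h + totala)
    # Compare the corrected hold with s_res; a shortfall adds to later rows
    if newh[-1] < s:
        totala += a

# Assign the list directly so values go in by position, not by index label
# (pd.Series(newh) would misalign on a frame whose index does not start at 0)
df['hold'] = newh
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)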
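Another way to write the same idea is as a direct recurrence that follows the rule in the question row by row. This is only a sketch: `fill_hold` is a name invented here, not a pandas function, and it assumes that rows before the first known hold value should stay NaN.

```python
import numpy as np
import pandas as pd


def fill_hold(df):
    """Fill hold and add.supply row by row (hypothetical helper)."""
    out = df.copy()
    cons = out["cons"].to_numpy(dtype=float)
    supply = out["supply"].to_numpy(dtype=float)
    s_res = out["s_res"].to_numpy(dtype=float)
    z_res = out["z_res"].to_numpy(dtype=float)
    hold = out["hold"].to_numpy(dtype=float).copy()
    add = np.full(len(out), np.nan)

    for i in range(1, len(out)):
        if np.isnan(hold[i - 1]):
            continue  # no previous value to carry forward yet
        hold[i] = hold[i - 1] - cons[i] + supply[i]
        # Previous hold below previous s_res: add this row's z_res - s_res
        if hold[i - 1] < s_res[i - 1]:
            add[i] = z_res[i] - s_res[i]
            hold[i] += add[i]

    out["hold"] = hold
    out["add.supply"] = add
    return out


# The eleven rows shown in the question
df = pd.DataFrame({
    "scale": list(range(11)),
    "cons": [0, 20, 30, 25, 15, 10, 40, 60, 50, 70, 50],
    "hold": [300] + [np.nan] * 10,
    "supply": [0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0],
    "add.supply": [np.nan] * 11,
    "s_res": [100, 200, 100, 400, 100, 100, 100, 300, 245, 300, 143],
    "z_res": [200, 322, 100, 110, 300, 180, 100, 400, 300, 600, 228],
})
result = fill_hold(df)
print(result[["scale", "hold", "add.supply"]])
```

Because each row is computed from the already corrected previous row, the comparison with s_res always sees the hold value that includes earlier additions. The loop is plain Python, so it will be slower than the cumsum() approach on very large frames.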
