My dataframe looks like this:
scale cons hold supply add.supply s_res z_res
48 -5 NaN NaN NaN NaN NaN NaN
49 -4 NaN NaN NaN NaN NaN NaN
50 -3 NaN NaN NaN NaN NaN NaN
51 -2 NaN NaN NaN NaN NaN NaN
52 -1 NaN NaN NaN NaN NaN NaN
53 0 0 300 0 NaN 100 200
54 1 20 NaN 0 NaN 200 322
55 2 30 NaN 70 NaN 100 100
56 3 25 NaN 0 NaN 400 110
57 4 15 NaN 0 NaN 100 300
58 5 10 NaN 0 NaN 100 180
59 6 40 NaN 0 NaN 100 100
...
I need to do the following:
Starting with the row where scale = 1, fill the column hold with values calculated as follows: take the previous value in the column hold, subtract the current row's value from the column cons, and add the current row's value from the column supply.
(For the cell in the column hold that corresponds to scale = 1 it will be (300 - 20) + 0 = 280, for the next cell (280 - 30) + 70 = 320, for the next cell (320 - 25) + 0 = 295, and so on.)
If the value in the column hold is less than the corresponding value in the column s_res, then I must add to the next cell the difference between the next row's values in the columns z_res and s_res.
For example, the value in the column hold is 295 where scale = 3. This value is less than the value in the column s_res (400). So the next value must be calculated as (295 - 15) + 0 + (300 - 100) = 480, and the difference between z_res and s_res (200) must be written to the column add.supply.
Every newly calculated value in the column hold must in turn be checked against the value in the column s_res.
The result should look like this:
scale cons hold supply add.supply s_res z_res
48 -5 NaN NaN NaN NaN NaN NaN
49 -4 NaN NaN NaN NaN NaN NaN
50 -3 NaN NaN NaN NaN NaN NaN
51 -2 NaN NaN NaN NaN NaN NaN
52 -1 NaN NaN NaN NaN NaN NaN
53 0 0 300 0 NaN 100 200
54 1 20 280 0 NaN 200 322
55 2 30 320 70 NaN 100 100
56 3 25 295 0 NaN 400 110
57 4 15 480 0 200 100 300
58 5 10 470 0 NaN 100 180
59 6 40 430 0 NaN 100 100
...
I would be grateful for any advice.
UPD: I tried to apply this code
df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()
to a larger dataframe and ran into problems.
My new dataframe:
scale cons hold supply add.supply s_res z_res
0 0 0 300 0 NaN 100 200
1 1 20 NaN 0 NaN 200 322
2 2 30 NaN 70 NaN 100 100
3 3 25 NaN 0 NaN 400 110
4 4 15 NaN 0 NaN 100 300
5 5 10 NaN 0 NaN 100 180
6 6 40 NaN 0 NaN 100 100
7 7 60 NaN 0 NaN 300 400
8 8 50 NaN 0 NaN 245 300
9 9 70 NaN 0 NaN 300 600
10 10 50 NaN 0 NaN 143 228
...
The result should be the following:
scale cons hold supply add.supply s_res z_res
0 0 0 300 0 NaN 100 200
1 1 20 280 0 NaN 200 322
2 2 30 320 70 NaN 100 100
3 3 25 295 0 NaN 400 110
4 4 15 480 0 200 100 300
5 5 10 470 0 NaN 100 180
6 6 40 430 0 NaN 100 100
7 7 60 370 0 NaN 300 400
8 8 50 320 0 NaN 245 300
9 9 70 250 0 NaN 300 600
10 10 50 285 0 85 143 228
...
But the result of the code execution was not what it should be:
scale cons hold supply add.supply s_res z_res
0 0 0 300 0 NaN 100 200
1 1 20 280 0 NaN 200 322
2 2 30 320 70 NaN 100 100
3 3 25 295 0 NaN 400 110
4 4 15 480 0 200 100 300
5 5 10 470 0 NaN 100 180
6 6 40 430 0 NaN 100 100
7 7 60 370 0 NaN 300 400
8 8 50 375 0 55 245 300
9 9 70 605 0 300 300 600
10 10 50 640 0 85 143 228
...
The error appears after hold = 370, but I don't understand why.
Instead of doing this row by row, you can use a combination of cumsum() and np.where to do this across the whole DataFrame:
df['hold'] = df.hold.fillna(method='ffill') - df.cons.cumsum() + df.supply.cumsum()
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)
df['hold'] = df.hold + df['add.supply'].fillna(0).cumsum()
Think of the transformations you want to do in two stages. In the first stage you add to and subtract from the initial value of df.hold. In the second stage you alter that new value of hold in some cases, according to some conditions.
cumsum() takes a Series or DataFrame and returns a new version where each row is the cumulative sum of the current row and all the rows before it. You can do that for df.cons and df.supply to get the cumulative amounts that will be subtracted from and added to df.hold. Now you have the first stage of df.hold calculated.
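As a rough illustration of that first stage only, here is a minimal sketch on the first seven rows of the sample data. It assumes the frame starts at the row that holds the initial value (300) and uses .ffill() in place of fillna(method='ffill'); the variable name stage_one is made up for the example.

```python
import numpy as np
import pandas as pd

# First seven rows of the sample, starting at scale = 0 (an assumption
# made to keep the example short).
df = pd.DataFrame({
    "cons":   [0, 20, 30, 25, 15, 10, 40],
    "hold":   [300] + [np.nan] * 6,
    "supply": [0, 0, 70, 0, 0, 0, 0],
})

# Carry the initial value forward, then subtract everything consumed so far
# and add everything supplied so far.
stage_one = df["hold"].ffill() - df["cons"].cumsum() + df["supply"].cumsum()
print(stage_one.tolist())
```

The values for scale = 1, 2 and 3 come out as 280, 320 and 295, the same as the hand calculation in the question. The rows after that do not yet include any conditional top-up.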
You can use np.where to find out when df.hold meets the conditions you're interested in. Where it does, you can set df['add.supply'] accordingly. Then you can add this new column to df.hold. Note that we're using fillna(0) to make sure each row has a value, and cumsum() again to preserve the added conditional values over time.
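A small sketch of this second stage, again on the first seven sample rows. The hold column here is assumed to already contain the first-stage values; short_before is a name invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "hold":  [300, 280, 320, 295, 280, 270, 230],  # first-stage values
    "s_res": [100, 200, 100, 400, 100, 100, 100],
    "z_res": [200, 322, 100, 110, 300, 180, 100],
})

# True on rows whose *previous* row had hold below s_res.
short_before = df["hold"].shift() < df["s_res"].shift()

# On those rows the top-up is the current row's z_res - s_res.
df["add.supply"] = np.where(short_before, df["z_res"] - df["s_res"], np.nan)

# Every top-up stays in the balance from its row onwards, hence the cumsum.
df["hold"] = df["hold"] + df["add.supply"].fillna(0).cumsum()
print(df[["hold", "add.supply"]])
```

On this sample there is a single top-up of 200 at scale = 4, and hold ends up as 300, 280, 320, 295, 480, 470, 430, which matches the expected result for the small frame.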
UPDATE
The original code above stopped working after the first value of add.supply was added, because the condition was checked against the first-stage values of df.hold, and the later ones didn't include that addition yet. There may be a way to do this non-iteratively, and there's certainly a better and cleaner way than what I've done below, but this at least will get the job done:
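import numpy as np
import pandas as pd

# Stage one: running balance without any conditional additions
df['hold'] = df.hold.ffill() - df.cons.cumsum() + df.supply.cumsum()
hold = df.hold.tolist()
s_res = df.s_res.tolist()
# Amount that lands on the *next* row if the current row falls short
add = (df.z_res - df.s_res).shift(-1).tolist()
newh = [hold[0]]
totala = 0  # running total of the additions applied so far
for h, s, a in zip(hold, s_res, add):
    newh.append(h + totala)
    if newh[-1] < s:
        totala += a
# Reuse df.index so the values line up even if the index doesn't start at 0
df['hold'] = pd.Series(newh[1:], index=df.index)
df['add.supply'] = np.where(df.hold.shift() < df.s_res.shift(), df.z_res - df.s_res, np.nan)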
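For comparison, here is a sketch of a plain row-by-row version of the same rule, run on the larger frame from the question. This is not the code above, just one possible way to write the recurrence directly. The helper name fill_hold is hypothetical, and it assumes hold contains exactly one starting value with NaN everywhere else.

```python
import numpy as np
import pandas as pd

def fill_hold(df):
    """Apply the hold rule from the question one row at a time (hypothetical helper)."""
    out = df.copy()
    cons = out["cons"].to_numpy(dtype=float)
    supply = out["supply"].to_numpy(dtype=float)
    s_res = out["s_res"].to_numpy(dtype=float)
    z_res = out["z_res"].to_numpy(dtype=float)
    hold = out["hold"].to_numpy(dtype=float).copy()
    add = np.full(len(out), np.nan)

    for i in range(1, len(out)):
        if np.isnan(hold[i - 1]):
            continue  # rows before the starting value: nothing to carry forward
        top_up = 0.0
        if hold[i - 1] < s_res[i - 1]:
            # Previous balance was below s_res: add this row's z_res - s_res.
            top_up = z_res[i] - s_res[i]
            add[i] = top_up
        hold[i] = hold[i - 1] - cons[i] + supply[i] + top_up

    out["hold"] = hold
    out["add.supply"] = add
    return out

df = pd.DataFrame({
    "scale":  range(11),
    "cons":   [0, 20, 30, 25, 15, 10, 40, 60, 50, 70, 50],
    "hold":   [300] + [np.nan] * 10,
    "supply": [0, 0, 70, 0, 0, 0, 0, 0, 0, 0, 0],
    "s_res":  [100, 200, 100, 400, 100, 100, 100, 300, 245, 300, 143],
    "z_res":  [200, 322, 100, 110, 300, 180, 100, 400, 300, 600, 228],
})
result = fill_hold(df)
print(result[["scale", "hold", "add.supply"]])
```

Because each check uses the already-updated previous value of hold, the additions of 200 (scale = 4) and 85 (scale = 10) are the only ones applied, and hold ends at 285 as in the expected table. A Python loop like this will be slow on very large frames, so treat it as a readable reference rather than an optimized solution.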