
How to carry forward the previous value of a newly calculated pandas column based on conditions?

import pandas as pd
import numpy as np

df = pd.DataFrame(
    (
        [6, 5, 10],
        [12, 6, 11],
        [7, 6, 10],
        [7, 5, 11],
        [4, 5, 10],
        [6, 5, 10],
        [7, 4, 9],
    ),
    columns=[
        "val", "lower", "upper"
    ]
)

# define conditions
conditions = [df['val'] > df['upper'],
              df['val'] < df['lower']]

# define choices
choices = [1, -1]

# create new column in DataFrame that displays results of comparisons
df['cond'] = np.select(conditions, choices, default=0)

print(df)

The result of the above is now this:

   val  lower  upper  cond
0    6      5     10     0
1   12      6     11     1
2    7      6     10     0
3    7      5     11     0
4    4      5     10    -1
5    6      5     10     0
6    7      4      9     0

What I want to achieve is the following:

  • row[0]: 'cond' should be NaN, because I don't know whether the last cross was at the upper or the lower bound
  • row[1]: 'val' crossed the upper bound, which results in cond = 1
  • row[2]: 'val' is between the upper and lower bounds, so there is no cross; 'cond' should take the previous 'cond' value from row[1], so cond = 1
  • row[3]: 'val' is between the upper and lower bounds, so there is no cross; 'cond' should take the previous 'cond' value from row[2], so cond = 1
  • row[4]: 'val' crossed the lower bound, which results in cond = -1
  • row[5]: 'val' is between the upper and lower bounds, so there is no cross; 'cond' should take the previous 'cond' value from row[4], so cond = -1
  • row[6]: 'val' is between the upper and lower bounds, so there is no cross; 'cond' should take the previous 'cond' value from row[5], so cond = -1
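The rule in the list above can be written as an explicit loop that carries the last cross forward. This is only a reference sketch to pin down the expected output; `last_cross` is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def last_cross(val, lower, upper):
    """Return 1 / -1 for the most recent upper / lower cross, NaN before any cross."""
    state = np.nan  # no cross seen yet
    out = []
    for v, lo, hi in zip(val, lower, upper):
        if v > hi:
            state = 1    # crossed the upper bound
        elif v < lo:
            state = -1   # crossed the lower bound
        out.append(state)  # otherwise keep the previous state
    return out


df = pd.DataFrame(
    [[6, 5, 10], [12, 6, 11], [7, 6, 10], [7, 5, 11],
     [4, 5, 10], [6, 5, 10], [7, 4, 9]],
    columns=["val", "lower", "upper"],
)
df["cond"] = last_cross(df["val"], df["lower"], df["upper"])
```

A loop like this is slow on large frames, so it is mainly useful for checking a vectorized solution against.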

The following does not work:

df['cond'] = np.select(conditions, choices, default=df["cond"].shift(1))
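One likely reason for the failure: `df["cond"].shift(1)` is evaluated once, from the existing 0/1/-1 column, so each row only sees the original value of the row directly before it, not a value that has already been carried forward. A small sketch on the same data (the name `attempt` is only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[6, 5, 10], [12, 6, 11], [7, 6, 10], [7, 5, 11],
     [4, 5, 10], [6, 5, 10], [7, 4, 9]],
    columns=["val", "lower", "upper"],
)
conditions = [df["val"] > df["upper"], df["val"] < df["lower"]]
choices = [1, -1]
df["cond"] = np.select(conditions, choices, default=0)

# shift(1) looks back exactly one row in the *old* column,
# so a zero that follows another zero stays zero
attempt = np.select(conditions, choices, default=df["cond"].shift(1))
```

Rows 3 and 6 come out as 0 here, because the row before each of them was itself 0 in the original column.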

So the result should be:

   val  lower  upper  cond
0    6      5     10   NaN
1   12      6     11     1
2    7      6     10     1
3    7      5     11     1
4    4      5     10    -1
5    6      5     10    -1
6    7      4      9    -1

What is the easiest way to get this done?

IIUC, you can replace each zero with the previous non-zero value. Any leading zeros (here, only the first row) have no previous value and become NaN.

df['cond'] = np.select(conditions, choices, default=0)

# replace(..., method='ffill') is deprecated in recent pandas versions;
# turn the zeros into NaN first, then forward-fill
df['cond'] = df['cond'].replace(0, np.nan).ffill()
print(df)

   val  lower  upper  cond
0    6      5     10   NaN
1   12      6     11   1.0
2    7      6     10   1.0
3    7      5     11   1.0
4    4      5     10  -1.0
5    6      5     10  -1.0
6    7      4      9  -1.0

As mozway suggests, rather than setting 0 as the default value in np.select, you can use NaN directly and then forward-fill:

df['cond'] = np.select(conditions, choices, default=np.nan)

df['cond'] = df['cond'].ffill()

# or in one line
# np.select returns an array,
# here we use pd.Series to chain ffill method
df['cond'] = pd.Series(np.select(conditions, choices, default=np.nan), index=df.index).ffill()
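Because NaN is a float, the resulting `cond` column is float64 (1.0, -1.0). If integer values matter, one option is pandas' nullable `Int64` dtype. This is an optional extra rather than part of the original answer, sketched here on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[6, 5, 10], [12, 6, 11], [7, 6, 10], [7, 5, 11],
     [4, 5, 10], [6, 5, 10], [7, 4, 9]],
    columns=["val", "lower", "upper"],
)
conditions = [df["val"] > df["upper"], df["val"] < df["lower"]]
choices = [1, -1]

# NaN marks "no cross on this row"; ffill carries the last cross forward
cond = pd.Series(np.select(conditions, choices, default=np.nan), index=df.index)

# Int64 (capital I) is the nullable integer dtype: missing values show as <NA>
df["cond"] = cond.ffill().astype("Int64")
```

The first row stays missing (`<NA>`), while the remaining rows hold the integers 1 and -1.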
