简体   繁体   中英

How can I create a new column on a Pandas Dataframe with conditions based on previous and next row?

I am trying to create a new Pandas column that indicates whether a number is the largest compared to the row above and the row below.

The column is going to be binary, where '1' indicates it is the highest number compared to last and next row, and '0' will indicate the condition is not satisfied.

This is how the data looks:

           Date      High
    0   2015-11-11  25.90
    1   2015-11-12  27.12
    2   2015-11-13  26.20
    3   2015-11-16  26.19
    4   2015-11-17  25.51
    5   2015-11-18  26.31
    6   2015-11-19  26.00
    7   2015-11-20  27.01
    8   2015-11-23  25.60
    9   2015-11-24  27.00
    10  2015-11-25  26.49

This is my desired outcome:

    Date        High    higher
0   2015-11-11  25.90   0.0
1   2015-11-12  27.12   1.0
2   2015-11-13  26.20   0.0
3   2015-11-16  26.19   0.0
4   2015-11-17  25.51   0.0
5   2015-11-18  26.31   1.0
6   2015-11-19  26.00   0.0
7   2015-11-20  27.01   1.0
8   2015-11-23  25.60   0.0
9   2015-11-24  27.00   1.0
10  2015-11-25  26.49   0.0

For further illustration, here is what I did on Microsoft Excel, which is exactly what I am trying to achieve with Python: ExcelExample

First Attempt:

for i in dftest['High']:
    if dftest['High'][i] > dftest['High'][i-1] and dftest['High'][i] > dftest['High'][i+1]:
        dftest['higher'] = 1
    else:
        dftest['higher'] = 0

This results in the error: "TypeError: cannot do label indexing on class 'pandas.core.indexes.range.RangeIndex' with these indexers [25.9] of class 'float'"

Second Attempt: Here I try to make use of '.shift()'.

for i in dftest['High']:
    if dftest['High'] > dftest['High'].shift(1) and dftest['High'] > dftest['High'].shift(-1):
        dftest['higher'] = 1
    else:
        dftest['higher'] = 0

This results in error: "TypeError: unsupported operand type(s) for &: 'float' and 'float'"

Third Attempt (similar to the second attempt): Used '&' operator instead of 'and'.

Results in error: "TypeError: cannot do label indexing on class 'pandas.core.indexes.range.RangeIndex' with these indexers [25.9] of class 'float'"

Would appreciate any sort of help!

Edit: I would really appreciate if the solution provided could easily be alternated if I wanted to find the highest of 5 or 7 or 9 consecutive numbers. Thank you again!

You can use.pct_change() to see if values are increasing or decreasing, and then use np.where() to replace values on condition.

df = pd.read_clipboard()
df['higher'] = df['High'].pct_change()
df['higher'] = np.where(df['higher'] > 0, 1.0, 0.0)

print(df)


    Date        High    higher
0   2015-11-11  25.90   0.0
1   2015-11-12  27.12   1.0
2   2015-11-13  26.20   0.0
3   2015-11-16  26.19   0.0
4   2015-11-17  25.51   0.0
5   2015-11-18  26.31   1.0
6   2015-11-19  26.00   0.0
7   2015-11-20  27.01   1.0
8   2015-11-23  25.60   0.0
9   2015-11-24  27.00   1.0
10  2015-11-25  26.49   0.0

For your comment, you can do rolling max.

df['rolling_max'] = df['High'].rolling(5, min_periods=1).max()

print(df)

Date    High    higher  rolling_max
0   2015-11-11  25.90   0.0 25.90
1   2015-11-12  27.12   0.0 27.12
2   2015-11-13  26.20   0.0 27.12
3   2015-11-16  26.19   0.0 27.12
4   2015-11-17  25.51   0.0 27.12
5   2015-11-18  26.31   1.0 27.12
6   2015-11-19  26.00   0.0 26.31
7   2015-11-20  27.01   1.0 27.01
8   2015-11-23  25.60   0.0 27.01
9   2015-11-24  27.00   1.0 27.01
10  2015-11-25  26.49   1.0 27.01

This so called local max

from scipy.signal import argrelextrema
ary=argrelextrema(df.High.values,np.greater)
df['local max']=np.where(df.index.isin(ary[0]),1,0)
df
          Date   High  local max
0   2015-11-11  25.90          0
1   2015-11-12  27.12          1
2   2015-11-13  26.20          0
3   2015-11-16  26.19          0
4   2015-11-17  25.51          0
5   2015-11-18  26.31          1
6   2015-11-19  26.00          0
7   2015-11-20  27.01          1
8   2015-11-23  25.60          0
9   2015-11-24  27.00          1
10  2015-11-25  26.49          0

In the end, I decided to use this:

hf_three = []

try:
    for i in df.index:   
        if df.loc[i]['High']>df.loc[i+1]['High'] and df.loc[i]['High']>df.loc[i-1]['High']:
            hf_three.append(1)
        else:
            hf_three.append(0)
except ValueError:              # To handle cases where 'i-1' doesn't exist.
    hf_three.append(0)
except KeyError:                # To handle cases where 'i+1' doesn't exist
    hf_three.append(0)

df['higher'] = hf_three

Though this works, if I replace '>' for '<', it no longer works and will not even append. Solutions to this will be greatly appreciated!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM