I have a data frame with 2 columns:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 2)), columns=list('AB'))
A B
0 11 10
1 61 30
2 24 54
3 47 52
4 72 42
... ... ...
95 61 2
96 67 41
97 95 30
98 29 66
99 49 22
100 rows × 2 columns
Now I want to create a third column that is a rolling-window max of column 'A', BUT the max has to be lower than the corresponding value in column 'B'. In other words, using a window size of 4, I want the value among the last 4 values of column 'A' that is closest to the value in column 'B' while still being smaller than B.
So, for example, in row 3 (A = 47, B = 52) the new value I am looking for is not 61 but 47, because it is the highest value of the 4 that is not higher than 52.
Pseudocode:
df['C'] = df['A'].rolling(window=4).max() where < df['B']
You can use concat + shift to create a wide DataFrame with the previous values, which makes complicated rolling calculations a bit easier.
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 2)), columns=list('AB'))
N = 4
# The end slice drops the first N-1 rows, matching the default min_periods behavior of `.rolling`
df1 = pd.concat([df['A'].shift(i).rename(i) for i in range(N)], axis=1).iloc[N-1:]
# Remove values that are not smaller than B, then find the max of the remaining ones.
df['C'] = df1.where(df1.lt(df.B, axis=0)).max(axis=1)
print(df.head(15))
A B C
0 51 92 NaN # Missing b/c min_periods
1 14 71 NaN # Missing b/c min_periods
2 60 20 NaN # Missing b/c min_periods
3 82 86 82.0
4 74 74 60.0
5 87 99 87.0
6 23 2 NaN # Missing b/c 82, 74, 87, 23 all > 2
7 21 52 23.0 # Max of 21, 23, 87, 74 which is < 52
8 1 87 23.0
9 29 37 29.0
10 1 63 29.0
11 59 20 1.0
12 32 75 59.0
13 57 21 1.0
14 88 48 32.0
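To make the wide DataFrame easier to picture, here is a minimal sketch of the same concat + shift idea on a tiny hand-made frame. The values, the name `small`, and the window size of 3 are assumptions chosen only so each row can be checked by eye; they are not from the question.

```python
import numpy as np
import pandas as pd

# Small hand-made frame (hypothetical values) so every step is easy to check.
small = pd.DataFrame({"A": [5, 9, 3, 7, 8, 2], "B": [6, 4, 10, 8, 1, 9]})
N = 3  # window size used for this illustration only

# Column i holds A shifted down by i rows, so each row lists its own window.
wide = pd.concat(
    [small["A"].shift(i).rename(i) for i in range(N)], axis=1
).iloc[N - 1:]

# Keep only window values strictly below that row's B, then take the row max.
masked = wide.where(wide.lt(small["B"], axis=0))
small["C"] = masked.max(axis=1)

print(wide)
print(small)
```

Row 4 ends up as NaN because none of 8, 7, 3 is below its B of 1, and rows 0 and 1 are NaN because the window is not yet full.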
You can use a custom function to .apply to the rolling window. In this case, you can use a default argument to pass in the B column.
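import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 2)), columns=list('AB'))
def rollup(a, B=df.B):
    # With raw=False, `a` is a Series; its largest index label is the current row.
    ix = a.index.max()
    b = B[ix]
    # Largest value in the window that is strictly below the current B.
    return a[a < b].max()
df['C'] = df.A.rolling(4).apply(rollup, raw=False)
df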
# returns:
A B C
0 8 17 NaN
1 23 84 NaN
2 75 84 NaN
3 86 24 23.0
4 52 83 75.0
.. .. .. ...
95 38 22 NaN
96 53 48 38.0
97 45 4 NaN
98 3 92 53.0
99 91 86 53.0
The NaN values occur when no number in the window of A is less than B, or at the start of the series, where the first few rows do not yet fill a complete window.
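One thing to keep in mind with the default-argument approach is that `B=df.B` is bound when the function is defined, so rebuilding `df` afterwards leaves the function pointing at the old column. A possible variant, sketched below, wraps the logic in a helper that captures B at call time. The name `rolling_max_below` is hypothetical, and the five rows are taken from the head of the output above.

```python
import numpy as np
import pandas as pd

def rolling_max_below(frame, window):
    """Rolling max of frame['A'] limited to values below the current row's B.

    Hypothetical helper; assumes the columns are named 'A' and 'B'.
    """
    b = frame["B"]

    def rollup(a):
        # With raw=False, `a` is a Series, so its last label identifies
        # the row that closes the window.
        limit = b.loc[a.index[-1]]
        return a[a < limit].max()

    return frame["A"].rolling(window).apply(rollup, raw=False)

df = pd.DataFrame({"A": [8, 23, 75, 86, 52], "B": [17, 84, 84, 24, 83]})
df["C"] = rolling_max_below(df, 4)
print(df)
```

Because the window callback runs in Python once per row, this tends to be slower than a vectorized approach on large frames.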
You can use where to replace values that don't fulfill the condition with np.nan and then use rolling(window=4, min_periods=1):
In [37]: df['C'] = df['A'].where(df['A'] < df['B'], np.nan).rolling(window=4, min_periods=1).max()
In [38]: df
Out[38]:
A B C
0 0 1 0.0
1 1 2 1.0
2 2 3 2.0
3 10 4 2.0
4 4 5 4.0
5 5 6 5.0
6 10 7 5.0
7 10 8 5.0
8 10 9 5.0
9 10 10 NaN
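Note that masking with where before rolling compares each value of A with the B in its own row, whereas the question compares every A in the window with the B of the row that closes the window. The two agree on the data above but can differ in general. A minimal sketch with hypothetical values chosen so that they disagree:

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so the two definitions disagree on the last row.
df = pd.DataFrame({"A": [5, 1, 1, 1], "B": [9, 9, 9, 3]})

# Row-wise mask: each A is compared with the B in its own row.
rowwise = (
    df["A"].where(df["A"] < df["B"]).rolling(window=4, min_periods=1).max()
)

# Window-wise: every A in the window is compared with the closing row's B.
wide = pd.concat([df["A"].shift(i).rename(i) for i in range(4)], axis=1)
windowwise = wide.where(wide.lt(df["B"], axis=0)).max(axis=1)

print(pd.DataFrame({"rowwise": rowwise, "windowwise": windowwise}))
```

In the last row the row-wise version returns 5, since 5 passed the test against its own B of 9, while the window-wise version returns 1, since 5 is not below the current B of 3.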