Pandas - variable rolling window

I'm looking to create an iterative rolling process to be used on a pandas DataFrame that stops when certain criteria are met. Specifically, I want the function to check the sum of values over the window and stop when the absolute value of the sum exceeds some amount.

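import numpy as np
import pandas as pd

x = np.random.randint(0, 5, (100,))
df = pd.DataFrame(x, columns=["value"])
df_iter = pd.DataFrame(index=df.index)
max_iter = 5
threshold = 10

# one column of rolling sums per window length 2..max_iter
for i in range(2, max_iter + 1):
    df_iter[i] = df["value"].rolling(i).sum()

# column position of the first window whose |sum| exceeds the threshold (0 if none does)
match_indices = np.argmax(df_iter.abs().values > threshold, axis=1)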

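The above gets part of the way there, but it is clunky, and it needs extra handling for rows where the threshold is never met: np.argmax returns 0 for those rows, which is the same value it returns when the first window matches.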
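One way to handle the rows where the threshold is never met is to keep a separate "any window matched" mask next to the argmax result. The following is only a sketch under assumptions: it uses signed random input (so that negative sums can occur), a seeded generator, and the illustrative names `exceeded`, `any_hit`, `picked` and `signal`, none of which come from the code above.

```python
import numpy as np
import pandas as pd

# Assumed sample data: signed integers so both signs of the threshold can trigger.
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.integers(-10, 10, 100)})
max_iter = 5
threshold = 10

# One column of rolling sums per window length 2..max_iter.
df_iter = pd.DataFrame(
    {i: df["value"].rolling(i).sum() for i in range(2, max_iter + 1)}
)

sums = df_iter.to_numpy()
exceeded = np.abs(sums) > threshold        # NaN compares as False
first = exceeded.argmax(axis=1)            # 0 when nothing matched
any_hit = exceeded.any(axis=1)             # distinguishes "no match" from "first column"

# Rolling sum of the shortest window that exceeded the threshold, per row.
picked = sums[np.arange(len(sums)), first]

# Sign of that sum where a window matched, 0 elsewhere.
signal = np.where(any_hit, np.sign(picked), 0).astype(int)
print(signal)
```

Because the columns are ordered from the shortest window to the longest, the first match per row is the shortest window ending at that row whose sum exceeds the threshold.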

Ultimately, I want a series of values in [-1,0,1], where each item is 1 if the positive threshold is exceeded within the maximum window, -1 if the negative threshold is exceeded, and 0 otherwise. The output would look something like the list below. Note that matches tend to occur in clusters because of the rolling windows. The most important requirement is to find the most recent occurrence of the threshold being exceeded.

[0,1,1,1,0,0,-1,-1,-1,0,-1,-1,-1,-1,0,0,0,1,1,1,1]

So is there a way to do a rolling find in pandas?

Turns out this is fairly easy with numpy's cumsum function.

import numpy as np
import pandas as pd

data = np.random.randint(-10, 10, (100,))
df = pd.DataFrame(data, columns=["value"])
max_n = 10
threshold = 10

def get_last_threshold(x):
    # reverse the window so the cumulative sum runs from the most recent value backwards
    x = x.values[::-1].cumsum()
    # index of the first cumulative sum whose absolute value exceeds the threshold
    match = np.argmax(np.abs(x) > threshold)
    # map to [-1, 0, 1]: take the sign of the matching sum, zeroed when it does not
    # actually exceed the threshold (np.argmax returns index 0 if nothing matches)
    signal = np.sign(x[match]) * (np.abs(x[match]) > threshold).astype(int)
    return signal

signals = df["value"].rolling(max_n, min_periods=1).apply(get_last_threshold).values
print(signals)

Example output for signals:

array([ 0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1.,  1.,  1., -1.,
   -1.,  0.,  1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,
   -1., -1., -1.,  1., -1., -1., -1., -1., -1., -1.,  0.,  1.,  1.,
    1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0., -1.,  0.,  1.,  0.,
   -1., -1., -1., -1., -1., -1., -1., -1.,  0.,  1.,  0.,  1.,  0.,
   -1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
    1.,  1.,  1., -1., -1., -1., -1., -1., -1., -1.,  1.,  1.,  1.,
   -1., -1., -1.,  0.,  1.,  1.,  1.,  1.,  1.])
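Since the output above depends on random input, a small fixed series makes the behaviour easier to check by hand. The sketch below is a variant of the function, not the original: it assumes `raw=True` so the window arrives as a NumPy array, passes the threshold through `args`, and uses made-up input values and a window of 3.

```python
import numpy as np
import pandas as pd

def get_last_threshold(window, threshold):
    # window is a NumPy array because raw=True is passed to apply() below
    csum = window[::-1].cumsum()           # sums of the last 1, 2, ..., n values
    exceeded = np.abs(csum) > threshold
    if not exceeded.any():
        return 0.0                         # threshold never exceeded in this window
    # sign of the shortest trailing sum that exceeds the threshold
    return float(np.sign(csum[exceeded.argmax()]))

values = pd.Series([3, 4, 5, -20, 2, 1])   # hypothetical input
signals = values.rolling(3, min_periods=1).apply(
    get_last_threshold, raw=True, args=(10,)
)
print(signals.astype(int).tolist())
# [0, 0, 1, -1, -1, -1]
```

At index 2 the reversed window is [5, 4, 3] with cumulative sums [5, 9, 12], so the threshold of 10 is first exceeded by the sum over all three values, giving 1. At index 4 the reversed window is [2, -20, 5] with cumulative sums [2, -18, -13], so the sum over the last two values gives -1.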
