简体   繁体   中英

How to find index of the window which has certain number of values greater than threshold?

在此处输入图片说明

I have just started learning python and struggling with this code. I have a dataframe which looks like shown in graph.

I want to find the first occurrence of the window in the dataframe which has certain number of values greater than threshold.

For Exmaple:

Let's say dataframe is 1000000 values in dimension. I want divide this in sliding window of 1000 and need to know whether this 1000 values alteast have 10 values which are greater than certain threshold. If first window (point 0-999) does not have the atleast 10 values greater than certain threshold, window will slide and consider values 1-1000. I have to find the index of the first occurrence of window which has atleast 10 values greater than threshold.

Also as I am dealing here streaming data, I need to stop the search when such window in dataframe occurs.

I tried this code but getting key error and not able to solve the problem.

for i in np.arange(0,len(data)-999):
    for j in np.arange(0,1000):
        if data[i+j]>threshold:
            var_count=var_count+1
        if var_count>10:
            print("Anomaly has occurred")

Sample data looks like this which has around 1.8 million rows.

在此处输入图片说明

Sample data could looke like this

data_sample=[1,1,0,0,0,2,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,2,1,0,0,2,2,2,2,1,1,1]            
data_sample=pd.DataFrame(data_sample)

threshold=1
window=5

in that I need at least 2 values which are greater than 1 which would return the index 18 as at that index my window of length 5 has at least 2 values which are greater than 1.

You can do it with convolution:

threshold = 10
window_size = 5
count_threshold = 3

kernel = np.ones(window_size)
over_threshold = (data['relevant_column'] > threshold).values
running_count = np.convolve(kernel, over_threshold)
np.nonzero(running_count >= count_threshold)[0]

Or a similar idea using pandas rolling :

np.where(((data['relevant_column'] > threshold).rolling(window_size).sum() >= count_threshold))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM