Sliding window approach in Python

I have a dataframe DF, with two columns A and B shown below:

A    B
1    0
3    0
4    0
2    1
6    0
4    1
7    1
8    1
1    0

First part: A sliding window approach should be performed as shown below. I need to calculate the mean of column B in a sliding window of size 3, sliding by 1 position. The mean value for each window is calculated manually and shown on the right side.

    A:         1    3    4    2    6    4    7    8    1
    B:         0    0    0    1    0    1    1    1    0
              [0    0    0]                                   0
                   [0    0    1]                              0.33
                        [0    1    0]                         0.33
                             [1    0    1]                    0.66
                                  [0    1    1]               0.66
                                       [1    1    1]          1
                                            [1    1    0]     0.66
    output:    0    0.33 0.33 0.66 0.66 1    1    1    0.66

Second part: Now, for each row/coordinate in column A, all windows containing that coordinate are considered, and the highest mean among them is retained. This gives the results shown as 'output'.

Detailed explanation for second part: The first part calculates the mean in a sliding window of size 3, sliding by 1 position. The second step is: for each coordinate 'i' in column A, all windows containing coordinate 'i' are evaluated and the highest mean score is retained. For example, the first coordinate (A = 1) is present only in the first window, so its score is 0 (the mean of the first window). Similarly, the second coordinate (A = 3) is present in the first and second windows, so its score is the higher of those two window scores, i.e. max(0, 0.333333). Likewise, the third coordinate (A = 4) is present in the first, second and third windows, so its score is max(0, 0.333333, 0.333333). The fourth coordinate (A = 2) is present in the second, third and fourth windows, so its score is max(0.333333, 0.333333, 0.666667), and so on.
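To make the definition above concrete, here is a brute-force sketch that restates it literally. It assumes windows are identified by their start position and that only full windows count; the names `window_means`, `first` and `last` are illustrative and not from the original post.

```python
import numpy as np

B = np.array([0, 0, 0, 1, 0, 1, 1, 1, 0])
WS = 3  # window size

# Step 1: mean of every full window of WS consecutive values.
window_means = [B[s:s + WS].mean() for s in range(len(B) - WS + 1)]

# Step 2: for position i, keep the highest mean among the windows covering i.
# A window starting at s covers positions s .. s + WS - 1.
output = []
for i in range(len(B)):
    first = max(0, i - WS + 1)     # earliest window that still reaches i
    last = min(i, len(B) - WS)     # latest window that starts at or before i
    output.append(max(window_means[first:last + 1]))

print(np.round(output, 2))
```

Note that rounding 2/3 to two decimals prints 0.67, whereas the tables in this post truncate it to 0.66.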

I need to obtain the output as shown above. The output should look like:

A    B    Output
1    0    0
3    0    0.33
4    0    0.33
2    1    0.66
6    0    0.66
4    1    1
7    1    1
8    1    1
1    0    0.66

Any help in Python would be highly appreciated.

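For the first part, using numpy:

import numpy

WS = 3  # window size
B = numpy.array([0,0,0,1,0,1,1,1,0])
filt = numpy.ones(WS) / WS  # uniform weights, so the convolution is a moving average
mean = numpy.convolve(B, filt, 'valid')  # 'valid' keeps only full windows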
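As a quick sanity check (a sketch, not part of the original answer), the convolution with a uniform filter in 'valid' mode should return one mean per full window, i.e. 9 - 3 + 1 = 7 values, matching means computed slice by slice. The name `explicit` is introduced here only for the comparison.

```python
import numpy

WS = 3
B = numpy.array([0, 0, 0, 1, 0, 1, 1, 1, 0])

# Moving average via convolution, as in the answer.
mean = numpy.convolve(B, numpy.ones(WS) / WS, 'valid')

# The same means computed directly from each slice of WS values.
explicit = numpy.array([B[s:s + WS].mean() for s in range(B.size - WS + 1)])

assert mean.shape == (B.size - WS + 1,)
assert numpy.allclose(mean, explicit)
```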

For the second part:

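# Pad with WS - 1 zeros on each side so every position has WS candidate windows.
# Zero padding is safe here only because the window means are never negative.
paddedmean = numpy.zeros(mean.size + 2 * (WS - 1))
paddedmean[WS-1:-(WS-1)] = mean
output = [numpy.max(paddedmean[i:i+WS]) for i in range(mean.size+WS-1)]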

But what is A used for???
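Since the question starts from a dataframe, the same result can probably be obtained with pandas directly. The following is a sketch under the assumption that A only labels the rows and does not enter the calculation; the name `win_mean` is illustrative, and the reversed rolling max is one possible way to look forward, not the only one.

```python
import pandas as pd

DF = pd.DataFrame({"A": [1, 3, 4, 2, 6, 4, 7, 8, 1],
                   "B": [0, 0, 0, 1, 0, 1, 1, 1, 0]})
WS = 3  # window size

# Mean of each full window, stored at the window's last row
# (the first WS - 1 rows are NaN because no full window ends there).
win_mean = DF["B"].rolling(WS).mean()

# The windows covering row i end at rows i .. i + WS - 1. Reversing the series
# turns that forward-looking max into an ordinary trailing rolling max;
# min_periods=1 lets the edge rows use however many windows exist.
DF["Output"] = win_mean[::-1].rolling(WS, min_periods=1).max()[::-1]

print(DF.round(2))
```

Unlike zero padding, this variant ignores missing windows at the edges instead of treating them as 0, so it should also behave sensibly if B contains negative values.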
