
Sliding window approach in Python

I have a dataframe DF with two columns, A and B, shown below:

A                    B                  
1                    0             
3                    0               
4                    0                   
2                    1                    
6                    0                    
4                    1                     
7                    1                 
8                    1                     
1                    0   

First part: a sliding window approach should be performed as shown below. I need to calculate the mean of column B in a sliding window of size 3 that slides by 1 position. The mean value for each window is calculated manually and shown on the right side.

    A:         1    3    4    2    6    4    7    8    1
    B:         0    0    0    1    0    1    1    1    0
              [0    0    0]                                   0
                   [0    0    1]                              0.33
                        [0    1    0]                         0.33
                             [1    0    1]                    0.66
                                  [0    1    1]               0.66
                                       [1    1    1]          1
                                            [1    1    0]     0.66
    output:    0   0.33 0.33 0.66 0.66  1    1    1   0.66

Second part: now, for each row/coordinate in column A, all windows containing that coordinate are considered and the highest mean value among them is retained. This gives the results shown as 'output'.

Detailed explanation for the second part: the first part calculates the mean in a sliding window of size 3 that slides by 1 position. The second step is: for each coordinate 'i' in column A, all windows containing coordinate 'i' should be evaluated and the highest mean score retained. Here the coordinates 1, 2, 3, ... are row positions (counted from 1), not the values stored in column A.

For example, row 1 is present only in the first window, so the score for row 1 is 0 (the mean of the first window). Similarly, row 2 is present in the first and second windows, so its score is the highest of the scores of window 1 and window 2, i.e. max(0, 0.33). Likewise, row 3 is present in the first, second and third windows, so its score is max(0, 0.33, 0.33). Row 4 is present in the second, third and fourth windows, so its score is max(0.33, 0.33, 0.66), and so on.

I need to obtain the output shown above. The output should look like:

A                   B                  Output   
1                   0                      0
3                   0                      0.33
4                   0                      0.33
2                   1                      0.66
6                   0                      0.66
4                   1                      1
7                   1                      1
8                   1                      1
1                   0                      0.66

Any help in Python would be highly appreciated.

For the first part, using numpy:

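import numpy

WS = 3  # window size
B = numpy.array([0, 0, 0, 1, 0, 1, 1, 1, 0])
filt = numpy.ones(WS) / WS  # uniform weights, so the convolution is a mean
mean = numpy.convolve(B, filt, 'valid')  # one mean per full window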
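
As a sanity check on the convolution, here is a minimal sketch that recomputes the same window means with an explicit loop and compares the two results. The helper name `window_means` is made up for illustration and is not part of numpy.

```python
import numpy

def window_means(values, ws):
    """Mean of every full window of length ws, sliding by one position."""
    values = numpy.asarray(values, dtype=float)
    return numpy.array([values[i:i + ws].mean()
                        for i in range(len(values) - ws + 1)])

B = [0, 0, 0, 1, 0, 1, 1, 1, 0]
loop_means = window_means(B, 3)
conv_means = numpy.convolve(B, numpy.ones(3) / 3, 'valid')

# Both approaches should yield one mean per full window (7 windows here).
print(numpy.allclose(loop_means, conv_means))
```

The loop is slower than the convolution on long arrays, but it makes the window boundaries explicit.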

For the second part:

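# Pad with WS-1 zeros on each side so edge rows only see the windows that contain them.
# Zero padding is safe here because the means of a 0/1 column are never negative.
paddedmean = numpy.zeros(mean.size + 2 * (WS - 1))
paddedmean[WS-1:-(WS-1)] = mean
# Row i lies in windows i-WS+1 .. i, which sit at paddedmean[i:i+WS].
output = [numpy.max(paddedmean[i:i+WS]) for i in range(mean.size+WS-1)]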
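
To check this against the definition in the question, here is a brute-force sketch in plain Python. The function name `max_window_mean` is hypothetical, and the code assumes the input has at least `ws` elements. It only looks at windows that really exist, so it does not rely on zero padding.

```python
def max_window_mean(values, ws):
    """For each position, the highest mean among all full windows covering it."""
    n = len(values)
    means = [sum(values[s:s + ws]) / ws for s in range(n - ws + 1)]
    result = []
    for i in range(n):
        first = max(0, i - ws + 1)  # earliest window start that still covers i
        last = min(i, n - ws)       # latest valid window start that covers i
        result.append(max(means[first:last + 1]))
    return result

B = [0, 0, 0, 1, 0, 1, 1, 1, 0]
scores = max_window_mean(B, 3)
print([round(s, 2) for s in scores])
# → [0.0, 0.33, 0.33, 0.67, 0.67, 1.0, 1.0, 1.0, 0.67]
```

The values agree with the expected output in the question, except that 2/3 rounds to 0.67 where the question truncates it to 0.66.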

But what is A used for?
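
The calculation never reads A, so it appears to serve only as a label for each row. If the data really lives in a pandas DataFrame, as the question says, the result can be attached as a new column next to A. The following is a sketch under that assumption; the DataFrame is rebuilt from the values in the question, and the column name `Output` is taken from the expected table.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 4, 2, 6, 4, 7, 8, 1],
                   "B": [0, 0, 0, 1, 0, 1, 1, 1, 0]})
WS = 3

# Mean of each full window, stored at the window's last row
# (NaN for the first WS-1 rows, where no full window ends yet).
win_mean = df["B"].rolling(WS).mean()

# Row i is covered by the windows ending at rows i .. i+WS-1, so take a
# forward-looking max: reverse, apply a trailing rolling max, reverse back.
# min_periods=1 lets the edge rows use however many windows they have.
df["Output"] = win_mean[::-1].rolling(WS, min_periods=1).max()[::-1]

print(df.round(2))
```

Reversing the series is one way to get a forward-looking window out of `rolling`, which looks backwards by default.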
