Sliding window approach in Python
I have a dataframe DF with two columns, A and B, shown below:
A B
1 0
3 0
4 0
2 1
6 0
4 1
7 1
8 1
1 0
First part: a sliding-window computation should be performed as shown below. I need to calculate the mean of column B in a sliding window of size 3 that moves by 1 position at a time. The mean of each window was calculated manually and is shown next to the window.
A: 1 3 4 2 6 4 7 8 1
B: 0 0 0 1 0 1 1 1 0
[0 0 0] 0
[0 0 1] 0.33
[0 1 0] 0.33
[1 0 1] 0.66
[0 1 1] 0.66
[1 1 1] 1
[1 1 0] 0.66
output: 0 0.33 0.33 0.66 0.66 1 1 1 0.66
Second part: now, for each row/coordinate in column A, all windows containing that coordinate are considered, and the highest mean value among them should be retained, which gives the results shown in column 'output'.
Detailed explanation of the second part: the first part calculates the mean in a sliding window of size 3 sliding by 1 position. The second step is: for each coordinate 'i' in column A, all windows containing coordinate 'i' should be evaluated and the highest mean score retained. For example, in column A (counting rows from 1), 1 is present only in the first window, so the score for 1 is 0 (the mean of the first window). Similarly, 2 is present in the first and second windows, so the score for 2 should be the highest among the scores of window 1 and window 2, i.e. max(0, 0.33333). Likewise, 3 is present in the first, second and third windows, so the score for 3 is the max of the scores of the first three windows, i.e. max(0, 0.333333, 0.3333333). 4 is present in the second, third and fourth windows, so the score for 4 is the max of the scores of those windows, i.e. max(0.333333, 0.3333333, 0.666667), and so on.
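The two steps described above can be sketched by brute force in plain Python, which may help pin down the intended definition before optimizing it. This is only an illustrative sketch: the names `means`, `first` and `last` are not from the question, and rows are indexed from 0 here rather than from 1.

```python
# Brute-force sketch of the two steps (no libraries needed).
B = [0, 0, 0, 1, 0, 1, 1, 1, 0]
WS = 3  # window size

# Step 1: mean of every full window; window k covers rows k .. k+WS-1.
means = [sum(B[k:k + WS]) / WS for k in range(len(B) - WS + 1)]

# Step 2: for row i, keep the largest mean among the windows containing row i.
output = []
for i in range(len(B)):
    first = max(0, i - WS + 1)      # earliest window that contains row i
    last = min(i, len(means) - 1)   # latest window that contains row i
    output.append(max(means[first:last + 1]))

print([round(v, 2) for v in output])
# prints [0.0, 0.33, 0.33, 0.67, 0.67, 1.0, 1.0, 1.0, 0.67]
```

With rounding, 2/3 prints as 0.67, where the hand-calculated values in the question are truncated to 0.66.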
I need to obtain the output as shown above. The output should look like:
A B Output
1 0 0
3 0 0.33
4 0 0.33
2 1 0.66
6 0 0.66
4 1 1
7 1 1
8 1 1
1 0 0.66
Any help in Python would be highly appreciated.
For the first part, using numpy:
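import numpy

WS = 3  # window size
B = numpy.array([0,0,0,1,0,1,1,1,0])
filt = numpy.ones(WS) / WS  # averaging kernel
# 'valid' keeps only the windows that fit entirely inside B
mean = numpy.convolve(B, filt, 'valid')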
For the second part:
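# pad with WS-1 zeros on each side so edge rows see fewer windows (assumes means >= 0)
paddedmean = numpy.zeros(mean.size + 2 * (WS - 1))
paddedmean[WS-1:-(WS-1)] = mean
# the means of the windows containing row i sit at paddedmean[i:i+WS]
output = [numpy.max(paddedmean[i:i+WS]) for i in range(mean.size+WS-1)]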
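Since the question starts from a dataframe, the same result can also be sketched with pandas rolling windows. This is an alternative sketch, not the method used in the answer above; the column name `Output` and the variable `win_mean` are chosen here for illustration. Column A is only carried along as a label and does not enter the computation.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 4, 2, 6, 4, 7, 8, 1],
                   "B": [0, 0, 0, 1, 0, 1, 1, 1, 0]})
WS = 3  # window size

# Trailing mean: the value at row i is the mean of rows i-WS+1 .. i
# (NaN for the first WS-1 rows, where no full window ends yet).
win_mean = df["B"].rolling(WS).mean()

# Row i lies inside the windows ending at rows i .. i+WS-1, so take a
# forward-looking max: reverse, apply a trailing max, reverse back.
# min_periods=1 lets the edge rows use however many windows exist.
df["Output"] = win_mean[::-1].rolling(WS, min_periods=1).max()[::-1]

print(df.round(2))
```

The reverse-and-roll trick is one way to get a forward-looking window with the plain `rolling` API; the assignment aligns on the index, so the rows end up in their original order.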
But what is A used for?