
Variable sliding window on a Python list

I have a dataset that looks like this (a 1D Python list):

[0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]

I'm trying to find cutoff points for variations, based on the previous window.

I'm looking for an output of:

[4, 9, 19, 23]

Assuming my window needs to hold at least 3 elements, a variation must last for at least 3 consecutive elements, and the data contains some noise, I came up with:

  • Fill up the window with at least 2 elements.
  • Calculate the standard deviation and add all subsequent points that are within stddev to that window. Recalculate every time you add a new point.
  • When a point is outside of stddev (for example here, the first occurrence of 4), make sure the next point is also outside of stddev (the first occurrence of 5), and if so, append a new index for the first deviant point (4 here). If not, keep adding to the current window.
  • The new 'deviant' values become the window to compare against; repeat.
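The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution: the function name `find_cutoffs` and the `min_tol` parameter are made up here. `min_tol` is a tolerance floor added because a window of identical values has a standard deviation of 0 and a freshly seeded two-point window has a very small one, so the literal rule would flag almost every change.

```python
import statistics

def find_cutoffs(values, min_tol=2.0, seed=2):
    """Return indices where the data leaves the current window's band.

    min_tol is a hypothetical floor on the tolerance (an assumption,
    not part of the steps described above).
    """
    cutoffs = []
    window = list(values[:seed])          # seed the window
    i = seed
    while i < len(values):
        mean = statistics.mean(window)
        tol = max(statistics.pstdev(window), min_tol)
        outside = abs(values[i] - mean) > tol
        next_outside = (i + 1 < len(values)
                        and abs(values[i + 1] - mean) > tol)
        if outside and next_outside:
            # Two deviant points in a row: record the first one and
            # restart the window from the deviant values.
            cutoffs.append(i)
            window = [values[i], values[i + 1]]
            i += 2
        else:
            # Within the band, or a single noisy point: keep growing.
            window.append(values[i])
            i += 1
    return cutoffs

x = [0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]
print(find_cutoffs(x))
```

With `min_tol=2.0` the lone 2 at index 15 is absorbed into the window as noise; a smaller floor would make the sketch more sensitive.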

Is there a better way to do this, or a built-in numpy function to help out?

Thanks.

Edit

The solution proposed by @qwwqwwq works well, but I have another small constraint: I realized that my list values don't all have the same weight. Assuming this new dataset:

[(10, 0), (20, 0), (15, 0), (20, 0), (8, 4), (10, 5), (15, 6), (15, 6), (10, 4), (5, 0),(5, 0), (20, 0), (10, 0), (8, 0),(5, 0), (10, 2), (5, 0), (5, 0), (5,0), (10,6) ,(5, 4), (5,5), (10, 6), (10, 0),(10,0) ,(10,0) ,(10,0) ,(10,0)]
  • pos 0 is a time duration in seconds
  • pos 1 is my value
  • the minimum time for a peak to be considered is 30 seconds

How could I replace widths = np.array([2]) with my minimum time?

I'm aware I could take slope_down_begin_points, check the closest slope_down_begin_points, and see if the sum of the points' durations between the two is > the minimum time. I'm not very familiar with signal, so hopefully there's something better?
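A minimal sketch of that duration check, under a few assumptions: each rising boundary is paired with the nearest falling boundary after it, the function name `long_enough_peaks` is hypothetical, and the two index lists are hard-coded by hand (split from the [4, 9, 16, 19, 23] result) rather than computed with signal. It uses >= because 30 seconds is described as the minimum.

```python
from bisect import bisect_right

data = [(10, 0), (20, 0), (15, 0), (20, 0), (8, 4), (10, 5), (15, 6),
        (15, 6), (10, 4), (5, 0), (5, 0), (20, 0), (10, 0), (8, 0),
        (5, 0), (10, 2), (5, 0), (5, 0), (5, 0), (10, 6), (5, 4),
        (5, 5), (10, 6), (10, 0), (10, 0), (10, 0), (10, 0), (10, 0)]

def long_enough_peaks(samples, up_points, down_points, min_seconds=30):
    """Pair each rising index with the next falling index and keep the
    pair only if the durations in between add up to min_seconds."""
    durations = [d for d, _ in samples]
    down_points = sorted(down_points)
    kept = []
    for up in sorted(up_points):
        pos = bisect_right(down_points, up)   # first falling index > up
        if pos == len(down_points):
            continue                          # no falling edge follows
        down = down_points[pos]
        if sum(durations[up:down]) >= min_seconds:
            kept.append((up, down))
    return kept

# Assumed inputs: boundaries split by hand into rising and falling.
up_points = [4, 19]
down_points = [9, 16, 23]
print(long_enough_peaks(data, up_points, down_points))
```

The falling boundary at 16 is never paired with a rising one, so the micro-peak drops out without touching the find_peaks_cwt parameters.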

Edit 2

Another, simpler and more naive way of doing this is to group the >0 values together and slice out the [0] and [-1] values as the edges.

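from itertools import groupby

some_minimum_time = 30  # seconds

# x is the list of (duration, value) tuples shown above
for k, g in groupby(x, key=lambda v: v[1] == 0):
    group = list(g)
    print(k, group)
    # only consider if long enough
    if sum(z[0] for z in group) > some_minimum_time:
        pass  # do stuff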
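To get actual edge indices out of the grouping idea, the runs need a running position counter. The following is a sketch only: `peak_edges` is a made-up name, and it uses >= rather than > so that a peak lasting exactly the minimum time still counts, which is an assumption about what "minimum" should mean.

```python
from itertools import groupby

data = [(10, 0), (20, 0), (15, 0), (20, 0), (8, 4), (10, 5), (15, 6),
        (15, 6), (10, 4), (5, 0), (5, 0), (20, 0), (10, 0), (8, 0),
        (5, 0), (10, 2), (5, 0), (5, 0), (5, 0), (10, 6), (5, 4),
        (5, 5), (10, 6), (10, 0), (10, 0), (10, 0), (10, 0), (10, 0)]

def peak_edges(samples, min_seconds=30):
    """Return [start, end, start, end, ...] for runs of non-zero values
    whose durations add up to at least min_seconds."""
    edges = []
    start = 0
    for is_zero, run in groupby(samples, key=lambda s: s[1] == 0):
        run = list(run)
        end = start + len(run)            # index just past this run
        if not is_zero and sum(d for d, _ in run) >= min_seconds:
            edges.extend([start, end])
        start = end
    return edges

print(peak_edges(data))
```

Note that a peak running to the very end of the list would report an end index equal to len(samples), which is one past the last valid position.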

The best approach I can think of for this problem is to fit a spline to the array, take the derivative, and then find all local maxima. These local maxima should represent the boundaries of peaks, which I think is what you are after. My approach:

from scipy import signal
from scipy import interpolate
import numpy as np
from numpy import linspace

x = [0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]
s = interpolate.UnivariateSpline( linspace(0,len(x)-1,len(x)), np.array(x) )
ds = s.derivative()

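# Negated derivative: its peaks mark where a downward slope begins.
slope_down_begin_points = [
    p for p in signal.find_peaks_cwt(vector=[-ds(v) for v in range(len(x))], widths=np.array([2]))
    if x[p-1] >= 1
]

# Derivative: its peaks mark where an upward slope begins.
slope_up_begin_points = [
    p for p in signal.find_peaks_cwt(vector=[ds(v) for v in range(len(x))], widths=np.array([2]))
    if x[p+1] >= 1
]

# Sort so that both kinds of boundary come out in index order.
sorted(slope_up_begin_points + slope_down_begin_points)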
>> [4, 9, 16, 19, 23]

16 is included in this approach because it is a little micro-peak of its own. If you fiddle with the find_peaks_cwt / UnivariateSpline parameters, you should be able to filter it out.


 