Variable sliding window on a Python list
I have a dataset that looks like this (a 1D Python list):
[0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]
I'm trying to find the cutoff points where the values change, based on the previous window.
I'm looking for an output of:
[4, 9, 19, 23]
Assuming my window needs to be at least 3 elements wide, that a variation must last for at least 3 consecutive elements, and that there is some noise in the data, I came up with my own approach.
Is there a better way to do this, or a built-in numpy function to help out?
Thanks.
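One way to read the requirement above, as a minimal numpy sketch: treat every run of at least 3 consecutive non-zero values as a variation and report where it starts and where it ends. The helper name run_edges and its min_len parameter are made up for illustration, and the sketch assumes "noise" means short non-zero blips such as the lone 2; a zero in the middle of a run would split that run in two.

```python
import numpy as np

def run_edges(values, min_len=3):
    """Return [start, end, start, end, ...] for non-zero runs of at least min_len elements.

    Each end is exclusive: it is the index of the first zero after the run.
    """
    mask = np.asarray(values) != 0
    # Pad with False so runs touching either end of the list still produce an edge.
    padded = np.concatenate(([False], mask, [False]))
    # Indices where the mask flips; they alternate start, end, start, end, ...
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[::2], changes[1::2]
    keep = (ends - starts) >= min_len
    return np.column_stack((starts[keep], ends[keep])).ravel().tolist()

x = [0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]
print(run_edges(x))  # [4, 9, 19, 23]
```

The single 2 at index 15 forms a run of length 1, so it is dropped by the min_len check.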
The solution proposed by @qwwqwwq works well, but I have another small constraint: I realized that my list values don't all have the same weight. Assuming this new dataset of (duration, value) pairs:
[(10, 0), (20, 0), (15, 0), (20, 0), (8, 4), (10, 5), (15, 6), (15, 6), (10, 4), (5, 0), (5, 0), (20, 0), (10, 0), (8, 0), (5, 0), (10, 2), (5, 0), (5, 0), (5, 0), (10, 6), (5, 4), (5, 5), (10, 6), (10, 0), (10, 0), (10, 0), (10, 0), (10, 0)]
How could I replace widths = np.array([2]) with my minimum time?
I'm aware I could take each point in slope_up_begin_points, find the closest point in slope_down_begin_points, and check whether the sum of the durations of the points between the two is greater than the minimum time. I'm not very familiar with signal, so hopefully there's something better?
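A rough sketch of that pairing idea, under some assumptions: the function name long_enough_segments and the min_time value are hypothetical, and the up/down index lists are written out by hand here rather than taken from find_peaks_cwt. Each rising edge is paired with the closest falling edge after it, and the pair is kept only if the durations in between add up to more than min_time.

```python
def long_enough_segments(up_points, down_points, durations, min_time):
    """Pair each rising edge with the closest following falling edge and keep
    the pairs whose summed duration exceeds min_time."""
    kept = []
    downs = sorted(down_points)
    for up in sorted(up_points):
        later = [d for d in downs if d > up]
        if not later:
            continue  # rising edge with no falling edge after it
        down = later[0]
        # durations[up:down] covers the elements from the rise up to (not including) the fall
        if sum(durations[up:down]) > min_time:
            kept.append((up, down))
    return kept

data = [(10, 0), (20, 0), (15, 0), (20, 0), (8, 4), (10, 5), (15, 6), (15, 6),
        (10, 4), (5, 0), (5, 0), (20, 0), (10, 0), (8, 0), (5, 0), (10, 2),
        (5, 0), (5, 0), (5, 0), (10, 6), (5, 4), (5, 5), (10, 6), (10, 0),
        (10, 0), (10, 0), (10, 0), (10, 0)]
durations = [d for d, _ in data]

# Hand-written edge lists for illustration only (assumed, not computed here).
up = [4, 15, 19]
down = [9, 16, 23]
print(long_enough_segments(up, down, durations, min_time=20))  # [(4, 9), (19, 23)]
```

The pair (15, 16) only lasts 10 time units, so it is discarded.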
Another simpler and more naive way of doing this is to group the > 0 values together and take the [0] and [-1] elements of each group as the edges.
from itertools import groupby

# x is the list of (duration, value) tuples shown above
for k, g in groupby(x, key=lambda v: v[1] == 0):
    group = list(g)
    print(k, group)
    if k:
        continue  # k is True for runs of zeros; only the > 0 runs matter
    # only consider if long enough
    if sum(z[0] for z in group) > some_minimum_time:
        pass  # do stuff with group[0] and group[-1]
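Fleshed out a little, the grouping idea could look like the sketch below. The function name weighted_edges and the threshold of 20 are assumptions made for the example. It tracks the running index so that the edges come back as list positions, with each end being the index of the first zero after the run.

```python
from itertools import groupby

def weighted_edges(data, min_time):
    """Return [start, end, ...] indices of non-zero runs whose total duration exceeds min_time."""
    edges = []
    index = 0
    for is_zero, run in groupby(data, key=lambda pair: pair[1] == 0):
        run = list(run)
        start, end = index, index + len(run)  # end is exclusive
        index = end
        if not is_zero and sum(duration for duration, _ in run) > min_time:
            edges.extend([start, end])
    return edges

data = [(10, 0), (20, 0), (15, 0), (20, 0), (8, 4), (10, 5), (15, 6), (15, 6),
        (10, 4), (5, 0), (5, 0), (20, 0), (10, 0), (8, 0), (5, 0), (10, 2),
        (5, 0), (5, 0), (5, 0), (10, 6), (5, 4), (5, 5), (10, 6), (10, 0),
        (10, 0), (10, 0), (10, 0), (10, 0)]
print(weighted_edges(data, min_time=20))  # [4, 9, 19, 23]
```

With this data the three non-zero runs last 58, 10 and 30 time units, so only the first and last survive a threshold of 20.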
The best approach I can think of for this problem is to fit a spline to the array, take its derivative, and then find all local maxima of that derivative (and of its negation, for the falling edges). These local maxima should represent the boundaries of the peaks, which I think is what you are after.
My approach:
from scipy import signal
from scipy import interpolate
import numpy as np

x = [0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]

# Fit a smoothing spline to the data and evaluate its first derivative at each index.
s = interpolate.UnivariateSpline(np.linspace(0, len(x) - 1, len(x)), np.array(x))
ds = s.derivative()
slopes = np.array([float(ds(v)) for v in range(len(x))])

# Peaks of -ds are falling edges; keep those preceded by a non-zero value.
slope_down_begin_points = [p for p in signal.find_peaks_cwt(vector=-slopes, widths=np.array([2])) if x[p-1] >= 1]
# Peaks of ds are rising edges; keep those followed by a non-zero value.
slope_up_begin_points = [p for p in signal.find_peaks_cwt(vector=slopes, widths=np.array([2])) if x[p+1] >= 1]

sorted(slope_up_begin_points + slope_down_begin_points)
>> [4, 9, 16, 19, 23]
16 is included in this approach because it is a little micro-peak of its own; if you fiddle with the find_peaks_cwt / UnivariateSpline parameters you should be able to filter it out.
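If tuning those parameters turns out to be fiddly, a different and much cruder option is to post-filter the candidate boundaries. The sketch below is not part of the spline method: the helper drop_short_boundaries is a made-up name, and it simply keeps a boundary only when the 3 elements just before it or the 3 elements starting at it are all non-zero.

```python
def drop_short_boundaries(values, candidates, min_len=3):
    """Keep candidate boundaries that touch a run of at least min_len non-zero values."""
    kept = []
    for p in candidates:
        before = values[max(p - min_len, 0):p]   # elements just before the boundary
        after = values[p:p + min_len]            # elements starting at the boundary
        run_before = len(before) == min_len and all(v != 0 for v in before)
        run_after = len(after) == min_len and all(v != 0 for v in after)
        if run_before or run_after:
            kept.append(p)
    return kept

x = [0,0,0,0,4,5,6,6,4,0,0,0,0,0,0,2,0,0,0,6,4,5,6,0,0,0,0,0]
candidates = [4, 9, 16, 19, 23]  # the output shown above
print(drop_short_boundaries(x, candidates))  # [4, 9, 19, 23]
```

The boundary at 16 has [0, 0, 2] before it and [0, 0, 0] after it, so it is the only one removed.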