Smoothing online data in Python with savgol_filter from the scipy.signal library

Question

I'd like to filter online data with savgol_filter from the scipy.signal library. But when I tried to use it on online data (where new elements arrive one by one), I realized that savgol_filter produces its output with a delay of window_length // 2 samples compared with offline data (where all elements are available for calculation at once). I use code similar to the following:

from queue import Queue, Empty
import numpy as np
from scipy.signal import savgol_filter

window_size = 5
data = list()
q = Queue()
d = [2.22, 2.22, 5.55, 2.22, 1.11, 0.01, 1.11, 4.44, 9.99, 1.11, 3.33]
for i in d:
    q.put(i)

res = list()
while not q.empty():
    element = q.get()
    data.append(element)
    length = len(data)
    npd = np.array(data[length - window_size:])
    if length >= window_size:
        res.append(savgol_filter(npd , window_size, 2)[window_size // 2])

npd = np.array(data)
res2 = savgol_filter(npd , window_size, 2)

np.set_printoptions(precision=2)
print('source data ', npd)
print('online res  ', np.array(res))
print('offline res ', res2)

Am I right in my assumption? Can it be corrected somehow?
If I am right, could you please advise a similar filter that has no such issue in its calculations?

Answer 1

Thanks for updating your question!

The problem is that your online res approach is missing parts of your data. scipy's savgol_filter takes care of the edge values, but your hand-coded version does not.

For your example, have a look at the two results:

'online res': array([ 3.93, 3.17, 0.73, 0.2 , 1.11, 5.87, 6.37])

'offline res': array([ 1.84, 3.52, 3.93, 3.17, 0.73, 0.2 , 1.11, 5.87, 6.37, 5.3, 1.84])

The interior values are identical, but offline res also took care of the edge values at data[0:2] and data[-2:]. In your case, where no mode is specified, it defaults to 'interp':

When the 'interp' mode is selected (the default), no extension is used. Instead, a degree polyorder polynomial is fit to the last window_length values of the edges, and this polynomial is used to evaluate the last window_length // 2 output values.

And this is what you did not do for your online res.
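As a quick check of the alignment described above, the following sketch recomputes both results for the data in the question and compares the online values with the interior of the offline result. The names half, offline, online and interior_matches are introduced here only for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

window_size = 5
half = window_size // 2
d = np.array([2.22, 2.22, 5.55, 2.22, 1.11, 0.01, 1.11, 4.44, 9.99, 1.11, 3.33])

# Offline: filter the whole series at once (default mode='interp').
offline = savgol_filter(d, window_size, 2)

# Online: once sample k has arrived, the newest full window is
# d[k - window_size + 1 : k + 1]; its centre value belongs to index k - half.
online = np.array([
    savgol_filter(d[k - window_size + 1:k + 1], window_size, 2)[half]
    for k in range(window_size - 1, len(d))
])

# Compare the online values with the offline result minus its two edges.
interior_matches = np.allclose(online, offline[half:len(d) - half])
print(interior_matches)
```

If the check passes, the only difference between the two results is the window_size // 2 values at each edge, which the offline call fills in with a polynomial fit.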

I implemented a simple polynomial fit for both edges and then get exactly the same results:

from queue import Queue, Empty
import numpy as np
from scipy.signal import savgol_filter

window_size = 5
data = list()
q = Queue()
d = [2.22, 2.22, 5.55, 2.22, 1.11, 0.01, 1.11, 4.44, 9.99, 1.11, 3.33]
for i in d:
    q.put(i)

res = list()
while not q.empty():
    element = q.get()
    data.append(element)
    length = len(data)
    npd = np.array(data[length - window_size:])
    if length >= window_size:
        res.append(savgol_filter(npd, window_size, 2)[window_size//2])

# calculate the polynomial fit for elements 0,1,2,3,4
poly = np.polyfit(range(window_size), d[0:window_size], deg=2)
p = np.poly1d(poly)
res.insert(0, p(0)) # insert the polynomial fits at index 0 and 1
res.insert(1, p(1))

# calculate the polynomial fit for the 5 last elements (range runs like [4,3,2,1,0])
poly = np.polyfit(range(window_size-1, -1, -1), d[-window_size:], deg=2)
p = np.poly1d(poly)
res.append(p(1))
res.append(p(0))

npd = np.array(data)
res2 = savgol_filter(npd, window_size, 2)


diff = res - res2 # in your example you were calculating the wrong diff btw
np.set_printoptions(precision=2)
print('source data ', npd)
print('online res  ', np.array(res))
print('offline res ', res2)
print('error       ', diff.sum())

This results in:

>>> Out: ('error       ', -7.9936057773011271e-15)

Edit: This version is independent of the d list, meaning that it can digest whatever data it receives from your source.

from queue import Queue
import numpy as np
from scipy.signal import savgol_filter

window_size = 5
half_window_size = window_size // 2 # this variable is used often
data = list()
q = Queue()
d = [2.22, 2.22, 5.55, 2.22, 1.11, 0.01, 1.11, 4.44, 9.99, 1.11, 3.33]
for i in d:
    q.put(i)

res = [None]*window_size # create list of correct size instead of appending

while not q.empty():
    element = q.get()
    data.append(element)
    length = len(data)
    npd = np.array(data[length - window_size:])

    if length == window_size: # this is called only once, when reaching the filter-center
        # calculate the polynomial fit for elements 0,1,2,3,4
        poly = np.polyfit(range(window_size), data, deg=2)
        p = np.poly1d(poly)

        for poly_i in range(half_window_size): # independent from window_size
            res[poly_i] = p(poly_i) 

        # insert the sav_gol-value at index 2
        res[(length-1)-half_window_size] = savgol_filter(npd, window_size, 2)[half_window_size] 

        poly = np.polyfit(range(window_size - 1, -1, -1), data[-window_size:], deg=2)
        p = np.poly1d(poly)
        for poly_i_end in range(half_window_size):
            res[(window_size-1)-poly_i_end] = p(poly_i_end)

    elif length > window_size:
        res.append(None) # add another slot in the res-list
        # overwrite poly-value with savgol
        res[(length-1)-half_window_size] = savgol_filter(npd, window_size, 2)[half_window_size] 

        # extrapolate again into the future
        poly = np.polyfit(range(window_size - 1, -1, -1), data[-window_size:], deg=2)
        p = np.poly1d(poly)
        for poly_i_end in range(half_window_size):
            res[-poly_i_end-1] = p(poly_i_end)
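The trailing-edge extrapolation above refits a polynomial for every new sample. As a possible alternative, scipy.signal.savgol_coeffs accepts a pos argument that sets the evaluation position inside the window, so the value at the newest sample can be obtained with a single dot product. This is only a sketch: causal_coeffs and causal_savgol are names made up for this example, and an estimate taken at the end of the window is generally noisier than one taken at the centre.

```python
import numpy as np
from scipy.signal import savgol_coeffs

window_size = 5
polyorder = 2

# Coefficients that evaluate the fitted polynomial at the LAST sample of the
# window (pos=window_size - 1). use='dot' orders them for a plain dot product.
causal_coeffs = savgol_coeffs(window_size, polyorder,
                              pos=window_size - 1, use='dot')

def causal_savgol(window):
    """Smoothed value for the newest sample of a full window (hypothetical helper)."""
    return float(np.dot(causal_coeffs, window))

d = [2.22, 2.22, 5.55, 2.22, 1.11, 0.01, 1.11, 4.44, 9.99, 1.11, 3.33]

# One zero-delay estimate per incoming sample, once a full window exists.
newest = [causal_savgol(d[k - window_size + 1:k + 1])
          for k in range(window_size - 1, len(d))]

# Cross-check against an explicit quadratic fit of the last window.
fit = np.polyfit(range(window_size), d[-window_size:], deg=polyorder)
reference = np.polyval(fit, window_size - 1)
print(abs(newest[-1] - reference) < 1e-9)
```

Each entry of newest is available as soon as its sample arrives. It can later be overwritten by the centred value once window_size // 2 further samples are in, which mirrors the overwrite step in the loop above.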

Answer 2

It looks like the Kalman filter family does what I expect. This is because these filters are optimal in the mean-square-error sense (for linear systems with Gaussian noise). An implementation can be found here, for example.
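To give an idea of how such a filter processes samples one by one, here is a minimal sketch of a scalar Kalman filter. It assumes a random-walk model for the signal; kalman_1d, process_var, measurement_var and the initial error variance are all illustrative choices, not values taken from the linked implementation, and they would need tuning for real data.

```python
def kalman_1d(measurements, process_var=1e-2, measurement_var=1.0):
    """Scalar Kalman filter with a random-walk state model (illustrative only)."""
    estimate = measurements[0]  # initialise with the first sample
    error_var = 1.0             # assumed initial uncertainty of the estimate
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state is unchanged, its uncertainty grows.
        error_var += process_var
        # Update: blend prediction and new measurement via the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered

d = [2.22, 2.22, 5.55, 2.22, 1.11, 0.01, 1.11, 4.44, 9.99, 1.11, 3.33]
smoothed = kalman_1d(d)
print([round(v, 2) for v in smoothed])
```

Each output value depends only on samples that have already arrived, so the result has the same length as the input and needs no look-ahead window. A larger process_var makes the filter follow the data more closely; a larger measurement_var makes it smoother.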

2 answers

Answer 1
2 2017-11-03 13:15:34

Answer 2
1 Accepted 2018-05-29 12:21:04
