
Continuous Piecewise-Linear Fit in Python

I have a number of short time series (maybe 30 to 100 time points each), and they share a general shape: they start high, come down quickly, may or may not plateau near zero, and then go back up. If they don't plateau, they look something like a simple quadratic; if they do plateau, there may be a long run of zeros.

I'm trying to use the lmfit module to fit a piecewise-linear curve that is continuous. I'd like to infer where the line changes gradient, that is, where the curve "qualitatively" changes gradient. In general terms, I'd like to know when the gradient stops going down and when it starts increasing again. I'm having a few issues with it:

  • lmfit seems to require at least two parameters, so I'm having to pass a dummy _.
  • I'm unsure how to constrain one parameter to be greater than another.
  • I'm getting "could not broadcast input array from shape (something) into shape (something)" errors.

Here's some code. First, my objective function, to be minimised:

def piecewiselinear(params, data, _) :

    t1 = params["t1"].value
    t2 = params["t2"].value
    m1 = params["m1"].value
    m2 = params["m2"].value
    m3 = params["m3"].value
    c = params["c"].value

    # Construct continuous, piecewise-linear fit
    model = np.zeros_like(data)
    model[:t1] = c + m1 * np.arange(t1)
    model[t1:t2] = model[t1-1] + m2 * np.arange(t2 - t1)
    model[t2:] = model[t2-1] + m3 * np.arange(len(data) - t2)

    return model - data

I then call:

p = lmfit.Parameters()
p.add("t1", value = len(data)/4, min = 1, max = len(data))
p.add("t2", value = len(data)/4*3, min = 2, max = len(data))
p.add("m1", value = -100., max=0)
p.add("m2", value = 0.)
p.add("m3", value = 20., min = 1.)
p.add("c", min=0, value = 800.)

result = lmfit.minimize(piecewiselinear, p, args = (data, _) )

The model is that, at some time t1, the gradient of the line changes, and the same happens at t2. Both of these parameters, as well as the gradients of the three line segments (and one intercept), need to be inferred.
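One way to write the model described above without slice assignment is to use hinge terms, max(x - t, 0), so that the curve is continuous by construction and the breakpoints need not be integers. The sketch below is only an illustration under stated assumptions: the names piecewise_linear and residual are hypothetical, the residual takes a plain parameter tuple rather than an lmfit Parameters object, and t2 is written as t1 plus a non-negative gap, which is one common way to keep t2 >= t1.

```python
import numpy as np

def piecewise_linear(x, c, m1, m2, m3, t1, t2):
    """Continuous three-segment line; slopes m1, m2, m3, breakpoints t1 <= t2."""
    x = np.asarray(x, dtype=float)
    # Each hinge is zero before its breakpoint and grows linearly after it,
    # so adding it changes the slope without introducing a jump.
    hinge1 = np.maximum(x - t1, 0.0)
    hinge2 = np.maximum(x - t2, 0.0)
    return c + m1 * x + (m2 - m1) * hinge1 + (m3 - m2) * hinge2

def residual(theta, x, data):
    """Model minus data, with t2 expressed as t1 + |gap| (assumed reparametrisation)."""
    c, m1, m2, m3, t1, gap = theta
    t2 = t1 + abs(gap)  # guarantees t2 >= t1 whatever value the optimiser tries
    return piecewise_linear(x, c, m1, m2, m3, t1, t2) - np.asarray(data, dtype=float)

# Usage: a curve that falls, stays flat between x = 3 and x = 6, then rises.
x = np.arange(10)
y = piecewise_linear(x, c=10.0, m1=-2.0, m2=0.0, m3=1.0, t1=3.0, t2=6.0)
```

Because the output always has the same shape as x, the length mismatch that slice assignment can produce does not arise here. A residual of this form could be passed to a generic least-squares routine, although the fit remains non-smooth in t1 and t2, so a reasonable starting guess matters.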

I could do this using MCMC methods, but I have too many of these series, and it would take too long.

Part of the traceback:

     15     model = np.zeros_like(data)
     16     model[:t1] = c + m1 * np.arange(t1)
---> 17     model[t1:t2] = model[t1-1] + m2 * np.arange(t2-t1)
     18     model[t2:] = model[t2-1] + m3 * np.arange(len(data) - t2)
     19 

ValueError: could not broadcast input array from shape (151) into shape (28)

A couple of examples of the time series: [image: no plateau, steady decline] [image: long and steep plateau]

Any and all suggestions welcome. Thank you very much.

Here's a plot from a rather brute-force 3-pwlin fitter; will trade rough code for test cases. [image]

Also, a couple of links:
Fit-piecewise-linear-data on dsp.stack might give you some ideas; added a bit on dynamic programming.
github.com/NickFoubert/simple-segment has Python for segmenting e.g. ECGs with max_error (not number of pieces), from a nice paper by Keogh et al., An online algorithm for segmenting time series, 2001, 8p.

And a possible alternative: could you just fit the power p in y ~ x^p, log y ~ p log x^2 (after shifting x to [-1 .. 1] and y > 1e-6 or so)?
This would be robust, fast, and easy to plot and understand.
One should probably weight the ends so that the errors are roughly flat and normal.
Also one could fit separate p, p' to the left and right halves.
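One reading of this suggestion can be sketched as an ordinary straight-line fit in log space. The code below is an assumption-laden illustration, not the answerer's code: the name fit_power is hypothetical, it follows the second form (log y against log x^2, so p is the exponent on x^2), it adds a scale factor as the intercept, it drops any point with x = 0 because the logarithm is undefined there, and it leaves out the end weighting mentioned above.

```python
import numpy as np

def fit_power(y, floor=1e-6):
    """Fit log y = log(scale) + p * log(x^2), with x rescaled to [-1, 1]."""
    y = np.asarray(y, dtype=float)
    x = np.linspace(-1.0, 1.0, len(y))   # shift the time axis to [-1, 1]
    keep = x**2 > 1e-12                  # log(x^2) is undefined at x = 0
    log_x2 = np.log(x[keep] ** 2)
    log_y = np.log(np.maximum(y[keep], floor))  # keep y above the floor
    p, log_scale = np.polyfit(log_x2, log_y, 1)  # slope first, then intercept
    return p, np.exp(log_scale)

# Usage on synthetic data generated from the same model.
x_demo = np.linspace(-1.0, 1.0, 40)
p_hat, scale_hat = fit_power(3.0 * (x_demo**2) ** 1.5)
```

A larger p corresponds to a flatter, wider bottom, so p by itself gives a one-number summary of how plateau-like a series is. Fitting the left and right halves separately would amount to calling the same routine on each half.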

Going down the brute-force route seems to do the trick. I'm just testing all combinations of switchpoints and picking the best fit. It's very quick and can be reasonably robust. Here's the result of one particular fit.

[image]

I'm forcing the gradient of the second line to be zero. This ensures that we don't get an OK fit for two lines and a perfect fit for one, which may grab a higher score (I'm using the sum of R^2 values here). The switchpoints are marked in green, and these should work very well for my application.
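The approach described above can be sketched as an exhaustive search over integer switchpoint pairs. This is a minimal illustration and not the code used for the plot: the name fit_flat_middle is hypothetical, the three segments are fitted jointly as one continuous curve with a flat middle, and the score is the total squared error rather than the sum of R^2 values mentioned above.

```python
import itertools
import numpy as np

def fit_flat_middle(y):
    """Try every switchpoint pair t1 < t2; return (sse, t1, t2, coef) of the best."""
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    best = None
    for t1, t2 in itertools.combinations(range(1, len(y) - 1), 2):
        # For fixed switchpoints the model is linear in its coefficients:
        # plateau level, slope of the falling part, slope of the rising part.
        design = np.column_stack([
            np.ones_like(x),
            np.minimum(x - t1, 0.0),  # non-zero only before t1
            np.maximum(x - t2, 0.0),  # non-zero only after t2
        ])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, t1, t2, coef)
    return best

# Usage on a synthetic series: falls until x = 8, flat until x = 20, then rises.
x_demo = np.arange(30, dtype=float)
y_demo = -5.0 * np.minimum(x_demo - 8, 0.0) + 2.0 * np.maximum(x_demo - 20, 0.0)
sse, t1, t2, coef = fit_flat_middle(y_demo)
```

For a series of n points this solves roughly n^2 / 2 small least-squares problems, which is cheap at 30 to 100 points. Requiring t1 < t2 also sidesteps the question of constraining one parameter to be greater than another.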

I'd love to learn a more elegant way to do this, but in the meantime, this is an option...
