
Fitting Parametric Curves in Python

I have experimental data of the form (X,Y) and a theoretical model of the form (x(t;*params),y(t;*params)), where t is a physical (but unobservable) variable and *params are the parameters that I want to determine. t is a continuous variable, and in the model there is a 1:1 relationship between x and t and between y and t.

In a perfect world, I would know the value of T (the real-world value of the parameter t) and would be able to do an extremely basic least-squares fit to find the values of *params. (Note that I am not trying to "connect" the values of x and y in my plot, like in 31243002 or 31464345.) I cannot guarantee that the latent value T is monotonic in my real data, as my data is collected across multiple cycles.

I'm not very experienced at doing curve fitting manually, and without a basic scipy function to lean on I have to use extremely crude methods. My basic approach involves:

  1. Choose some value of *params and apply it to the model
  2. Take an array of t values and put it into the model to create an array of model(*params) = (x(*params),y(*params))
  3. Interpolate X (the data values) into model to get Y_predicted
  4. Run a least-squares (or other) comparison between Y and Y_predicted
  5. Do it again for a new set of *params
  6. Eventually, choose the best values for *params

There are several obvious problems with this approach.

1) I'm not experienced enough with coding to develop a very good "do it again" other than "try everything in the solution space," or maybe "try everything in a coarse grid" and then "try everything again in a slightly finer grid in the hotspots of the coarse grid." I tried MCMC methods, but I never found any optimum values, largely because of problem 2.

2) Steps 2-4 are super inefficient in their own right.

I've tried something like the code below (resembling pseudo-code; the actual functions are made up). There are many minor quibbles that could be made about using broadcasting on A,B, but those are less significant than the problem of needing to interpolate at every single step.

People I know have recommended using some sort of Expectation Maximization algorithm, but I don't know enough about that to code one up from scratch. I'm really hoping there's some awesome scipy (or otherwise open-source) algorithm I haven't been able to find that covers my whole problem, but at this point I am not hopeful.

import numpy as np
from scipy import interpolate

# Experimental data: 1-D arrays of equal length (values not shown here)
X_data = ...
Y_data = ...

def x(t, A, B):
    return A**t + B**t

def y(t, A, B):
    return A*t + B

def interp(A, B):
    # Tabulate the model on a grid of t and return y as a function of x
    ts = np.arange(-10, 10, 0.1)
    xs = x(ts, A, B)
    ys = y(ts, A, B)
    f = interpolate.interp1d(xs, ys)
    return f

N = 101
lsqs = np.empty((N**2, 3))   # one row per grid point: (A, B, sum of squares)

count = 0
for i in range(N):
    A = 0.1*i            # checks A between 0 and 10
    for j in range(N):
        B = 10 + 0.1*j   # checks B between 10 and 20

        f = interp(A, B)
        y_fit = f(X_data)
        squares = np.sum((y_fit - Y_data)**2)

        lsqs[count] = (A, B, squares)  # store the values for comparison later
        count += 1                     # move to the next row

best = np.argmin(lsqs[:, 2])  # row with the smallest sum of squares

A_optimal = lsqs[best][0]
B_optimal = lsqs[best][1]
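As a possible alternative to the double loop, the same sum-of-squares objective could be handed to a general-purpose optimizer. The following is only a sketch under several assumptions that are not in the question: the data are synthetic stand-ins, the names x_model, y_model, sse and TS are made up, A and B are restricted to values above 1 so that x increases with t (which np.interp requires), and Nelder-Mead with bounds needs SciPy 1.7 or later. A local optimizer still needs a sensible starting point and may stop in a local minimum.

```python
import numpy as np
from scipy.optimize import minimize

def x_model(t, A, B):
    return A**t + B**t

def y_model(t, A, B):
    return A*t + B

# Grid of the latent variable used to tabulate the model curve (hypothetical range)
TS = np.linspace(-3.0, 3.0, 601)

def sse(params, X_data, Y_data):
    """Sum of squared differences between Y_data and the model's y at X_data."""
    A, B = params
    xs = x_model(TS, A, B)   # strictly increasing in t when A > 1 and B > 1
    ys = y_model(TS, A, B)
    y_fit = np.interp(X_data, xs, ys)  # values outside the table are clamped
    return np.sum((y_fit - Y_data)**2)

# Synthetic stand-in data (assumption): generated with A = 2, B = 12 plus noise
rng = np.random.default_rng(0)
t_true = rng.uniform(-1.5, 1.5, 200)
X_data = x_model(t_true, 2.0, 12.0)
Y_data = y_model(t_true, 2.0, 12.0) + rng.normal(0.0, 0.05, t_true.size)

start = np.array([2.5, 13.0])  # starting guess, e.g. the best point of a coarse grid
result = minimize(
    sse,
    start,
    args=(X_data, Y_data),
    method="Nelder-Mead",
    bounds=[(1.1, 10.0), (10.0, 20.0)],
)
A_optimal, B_optimal = result.x
print(A_optimal, B_optimal, result.fun)
```

Each evaluation still re-tabulates and interpolates the model, but the optimizer chooses where to evaluate, so far fewer evaluations are needed than the 101 x 101 grid.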

If I understand the question correctly, the params are constants that are the same in every sample, but t varies from sample to sample. So, for example, maybe you have a whole bunch of points which you believe have been sampled from a circle

x = a+r cos(t)   
y = b+r sin(t)

at different values of t.

In this case, what I would do is eliminate the variable t to get a relation between x and y; here, (x-a)^2 + (y-b)^2 = r^2. If your data fit the model perfectly, you would have (x-a)^2 + (y-b)^2 = r^2 at each of your data points. With some error, you could still find (a,b,r) to minimize

sum_i ((x_i-a)^2 + (y_i-b)^2 - r^2)^2.
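A minimal sketch of that minimization in Python, assuming scipy.optimize.least_squares as the solver (the answer does not name one). The data are synthetic, and the names residuals, x_obs and y_obs are made up for illustration; the solver squares and sums the residuals internally, so the function returns the unsquared terms.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(p, x, y):
    """One residual per data point: (x_i-a)^2 + (y_i-b)^2 - r^2."""
    a, b, r = p
    return (x - a)**2 + (y - b)**2 - r**2

# Synthetic noisy samples from a circle with a=3, b=-1, r=2 (assumption);
# the angles t are drawn at random and never shown to the fit
rng = np.random.default_rng(1)
t = rng.uniform(0.0, 2.0*np.pi, 100)
x_obs = 3.0 + 2.0*np.cos(t) + rng.normal(0.0, 0.02, t.size)
y_obs = -1.0 + 2.0*np.sin(t) + rng.normal(0.0, 0.02, t.size)

# Start from the centroid of the points and a rough radius
start = [x_obs.mean(), y_obs.mean(), 1.0]
fit = least_squares(residuals, start, args=(x_obs, y_obs))

a_fit, b_fit, r_fit = fit.x
r_fit = abs(r_fit)  # only r^2 enters the residual, so the sign of r is arbitrary
print(a_fit, b_fit, r_fit)
```

Note that t never appears in the fit: once it is eliminated, only the constants (a,b,r) remain as unknowns.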

Mathematica's Eliminate command can automate the procedure of eliminating t in some cases.
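For readers without Mathematica, a similar elimination can be sketched in SymPy for the circle example. This swaps in a different tool and is an assumption, not something the answer proposes. The symbols c and s are hypothetical stand-ins for cos(t) and sin(t): the two model equations are solved for them, and the identity c^2 + s^2 = 1 then removes t.

```python
import sympy as sp

x, y, a, b, c, s = sp.symbols("x y a b c s", real=True)
r = sp.symbols("r", positive=True)

# Model equations with c = cos(t) and s = sin(t)
sol = sp.solve([sp.Eq(x, a + r*c), sp.Eq(y, b + r*s)], [c, s], dict=True)[0]

# Substitute into c^2 + s^2 - 1 = 0 and clear the denominator r^2
relation = sp.expand(r**2 * (sol[c]**2 + sol[s]**2 - 1))
print(relation)  # an expression that must equal zero on the curve
```

This trick is specific to models built from sin and cos; other models need their own identity or a general elimination method, and some cannot be eliminated in closed form at all.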

P.S. You might do better at stats.stackexchange, math.stackexchange or mathoverflow.net. I know the last one has a scary reputation, but we don't bite, really!

