
Fitting a curve in R to an equation

I've been trying to fit a curve in R, but have run into some issues. I am working with several large data sets of x and y coordinates. When plotted with ggplot's geom_point or any other plotting function, the points follow a trend that resembles the graph of a square root function.

This is the code I used to make the fit with geom_smooth:

plt = ggplot(data = data2, aes(x = x, y = y)) + geom_point() +geom_smooth()

And that basically gets me this:

[Image: curve plot]

Is there a way to make the curve more like the red square root curve (y = x^0.5), that is, to make it smoother and fit it to a specific formula? This is the smallest of the data sets, used here as an example.

Example data set (CSV format)

I've also tried fitting with the method set to loess, which gives a curve close to what I want. However, loess does not seem to work as well for data sets that are much larger (around 500,000-700,000 points) or that have points very densely packed in a certain region. The mean tends to be a bit skewed there, which makes sense, since the large number of points in that region pushes it up. But I need to fit the curve and force it to be close to the square root curve. I've also tried changing the span values, but that didn't really affect the smoothness of the curve.

One thing that came to my mind is the following. Your best graph is probably obtained by minimizing a chi-square. You can add a further criterion to that, i.e. how much this fit deviates from square root behaviour. This can be done by fitting the solution with sqrt() and adding a weighted chi-square term to the total measure of the quality of your fit. I'm not sure how to do that in R, but in Python you get something like this:

[Image: increasing the sqrt weight]

The blue graph is the best sqrt() fit. The yellow one is the best quadratic spline with knots at [0,0,.1,.2,.3,.4,.6,.9,.9,.9], i.e. weight=0 (you could additionally optimize the knot positions; I didn't do that here). The remaining curves put increasing weight on how well the fit can itself be fitted by sqrt(), with weights = 0.5, 1, 2, respectively.

Code is as follows:

import matplotlib
# matplotlib.use('Qt4Agg')  # original backend choice; uncomment or adapt only if your matplotlib build provides it

from matplotlib import pyplot as plt
import numpy as np
from scipy.optimize import leastsq,curve_fit

###taken from the scipy doc page, since I have scipy 0.16, which has no built-in BSpline yet
def B(x, k, i, t):
    if k == 0:
        return 1.0 if t[i] <= x < t[i+1] else 0.0
    if t[i+k] == t[i]:
        c1 = 0.0
    else:
        c1 = (x - t[i])/(t[i+k] - t[i]) * B(x, k-1, i, t)
    if t[i+k+1] == t[i+1]:
        c2 = 0.0
    else:
        c2 = (t[i+k+1] - x)/(t[i+k+1] - t[i+1]) * B(x, k-1, i+1, t)
    return c1 + c2


def bspline(x, t, c, k):
    n = len(t) - k - 1
    assert (n >= k+1) and (len(c) >= n)
    return sum(c[i] * B(x, k, i, t) for i in range(n))


def mixed_res(params,points,weight):
    [xList,yList] = zip(*points)
    bSplList=[bspline(x,[0,0,.1,.2,.3,.4,.6,.9,.9,.9],params,2) for x in xList]
    ###standard chisq
    diffTrue=[y-b for y,b in zip(yList,bSplList)]
    ###how good can the spline be fitted with sqrt
    locfit,_=curve_fit(sqrtfunc,xList,bSplList)
    sqrtList=[sqrtfunc(x,locfit[0]) for x in xList]
    diffWeight=[ weight*(s-b) for s,b in zip(sqrtList,bSplList)]
    return diffTrue+diffWeight

def sqrtfunc(x,a):
    return a*np.sqrt(x)


xList,yList=np.loadtxt("PHOQSTACK.csv", unpack=True, delimiter=',')
xListSorted=sorted(xList)
zipData=list(zip(xList,yList))  # a list, so it can be iterated repeatedly under Python 3

fig=plt.figure(1)
ax=fig.add_subplot(1,1,1)

knotList=[0,0,.1,.2,.3,.4,.6,.9,.9,.9]
order=2

sqrtvalues,_=curve_fit(sqrtfunc,xList,yList)
th_sqrt_y=[sqrtfunc(x,sqrtvalues[0]) for x in xListSorted]

ax.scatter(xList,yList,s=1)
ax.plot(xListSorted,th_sqrt_y)

fitVals=[.2,.3,.4,.2,.3,.4,.2]
for s in [0,.5,1,2]:
    print(s)
    fitVals,ier=leastsq(mixed_res,fitVals,args=( zipData, s ) )
    th_b_y=[bspline(x,knotList,fitVals,order) for x in xListSorted]
    ax.plot(xListSorted,th_b_y)

plt.show()

The problem is that for large weights, the fit is busier forcing the shape toward sqrt than fitting the actual data, and you might run into convergence issues.
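To see why large weights take over, it may help to write out what the first mixed_res effectively minimizes. This is only a transcription of the residual vector above into a formula; the symbols S_c (the spline with coefficients c), \hat a and w (the weight) are notation introduced here, not part of the original code.

```latex
\[
\chi^2(c) \;=\; \sum_i \bigl(y_i - S_c(x_i)\bigr)^2
\;+\; w^2 \sum_i \bigl(\hat a(c)\,\sqrt{x_i} - S_c(x_i)\bigr)^2,
\qquad
\hat a(c) \;=\; \arg\min_a \sum_i \bigl(a\,\sqrt{x_i} - S_c(x_i)\bigr)^2 .
\]
```

The second sum grows with w^2, so for large w the optimizer mostly reduces the distance between the spline and its own best sqrt fit, and the data term carries comparatively little influence.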

A second option would be to make the sqrt directly part of the fit function and include its relative contribution as part of the chi-square.

[Image: including sqrt]

The blue and yellow graphs are as before. The others are weighted fits with the same weights as above.

For this I changed the residual function to

def mixed_res(params,points,weight):
    a=params[0]
    coffs=params[1:]
    [xList,yList] = zip(*points)
    sqrtList=[a*np.sqrt(x) for x in xList]
    bSplList=[bspline(x,[0,0,.1,.2,.3,.4,.6,.9,.9,.9],coffs,2) for x in xList]
    diffTrue=[y-s-b for y,s,b in zip(yList,sqrtList,bSplList)]
    diffWeight=[ weight*(s-b)/(s+.001) for s,b in zip(sqrtList,bSplList)]

    return diffTrue+diffWeight

and the fitting call to

fitVals=[.4]+[.2,.3,.4,.2,.3,.4,.4]
for s in [0,.5,1,2]:
    print(s)
    fitVals,ier=leastsq(mixed_res,fitVals,args=( zipData, s ) )
    th_b_y=[fitVals[0]*np.sqrt(x)+bspline(x,knotList,fitVals[1:],order) for x in xListSorted]
    ax.plot(xListSorted,th_b_y)
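Read literally, this second residual function corresponds to the objective below. As before, this is just the code rewritten as a formula: S_c denotes the spline with coefficients c, a is the sqrt amplitude params[0], w is the weight, and \varepsilon = 0.001 is the small constant from the code; the symbol names themselves are introduced here.

```latex
\[
f(x) \;=\; a\,\sqrt{x} + S_c(x),
\qquad
\chi^2(a, c) \;=\; \sum_i \bigl(y_i - f(x_i)\bigr)^2
\;+\; w^2 \sum_i \left( \frac{a\,\sqrt{x_i} - S_c(x_i)}{a\,\sqrt{x_i} + \varepsilon} \right)^{2} .
\]
```

Note that the penalty term, as coded, compares the sqrt part with the spline part relative to the size of the sqrt part, so it is a relative rather than an absolute measure.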

The remaining big question is: how do you decide which weighting to take? What do you mean by "more like square root"?
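One possible way to make "more like square root" measurable, offered only as a sketch and not as part of the fits above: for any candidate curve, compute the best one-parameter a*sqrt(x) fit (which has a closed-form least-squares solution) and report the RMS deviation of the curve from it. The function names and the synthetic curves below are made up for illustration; they do not come from the data set in the question.

```python
import math


def fit_sqrt_scale(xs, ys):
    """Least-squares amplitude a for y ~ a*sqrt(x).

    Minimizing sum((y - a*sqrt(x))**2) gives a = sum(y*sqrt(x)) / sum(x).
    Assumes all x >= 0 and sum(x) > 0.
    """
    numerator = sum(y * math.sqrt(x) for x, y in zip(xs, ys))
    return numerator / sum(xs)


def rms_deviation_from_sqrt(xs, curve):
    """Return (a, rms), where rms is the RMS gap between curve and a*sqrt(x)."""
    a = fit_sqrt_scale(xs, curve)
    squared = [(c - a * math.sqrt(x)) ** 2 for x, c in zip(xs, curve)]
    return a, math.sqrt(sum(squared) / len(squared))


# Synthetic illustration (hypothetical curves, not the real data):
xs = [i / 20 for i in range(1, 21)]
exact = [0.8 * math.sqrt(x) for x in xs]
bumpy = [0.8 * math.sqrt(x) + 0.05 * math.sin(12 * x) for x in xs]

a_exact, dev_exact = rms_deviation_from_sqrt(xs, exact)
a_bumpy, dev_bumpy = rms_deviation_from_sqrt(xs, bumpy)
print(a_exact, dev_exact)
print(a_bumpy, dev_bumpy)
```

Evaluating this number for the fitted curve at each weight, next to the ordinary chi-square against the data, would give two figures to trade off when choosing a weight. Which trade-off is acceptable still depends on what the curve is needed for.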
