改善曲线拟合R中的数据

Question

Having trouble fitting an appropriate curve to this data. 无法为此数据拟合适当的曲线。

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 31, 41, 51, 61, 71, 
81, 91, 110, 210, 310, 410, 510, 610, 710, 810, 910, 1100, 2100, 
3100, 4100, 5100, 6100, 7100, 8100, 9100)

y <- c(75, 84, 85, 89, 88, 91, 92, 92, 93, 92, 94, 95, 95, 96, 95, 
95, 94, 97, 97, 97, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99, 
99, 99, 99, 99, 99, 99)

Tried so far: 到目前为止尝试过：

fit1 <- lm(y~log(x)+I(1/x))
fit2 <- lm(y~log(x)+I(1/x)+x)

plot(x,y, log="x")
lines(0.01:10000, predict(fit1, newdata = data.frame(x=0.01:10000)))
lines(0.01:10000, predict(fit2, newdata = data.frame(x=0.01:10000)), col='red')

The fits are ok, but arrived at entirely empirically and there is room for improvement. 这种情况很合适，但完全凭经验得出，还有改进的余地。 I did not fit loess or splines to be any better. 我不适合黄土或花键更好。

The concrete goal is to increase the R^2 of the fit and improve regression diagnostics (eg QQ plots of residuals). 具体目标是增加拟合的R ^ 2并改进回归诊断（例如残差的QQ图）。

Edit : Expected Model: this is sampling data, where more samples (x) improve the accuracy of the estimate (y); 编辑： 预期模型：这是抽样数据，其中更多样本（x）提高估计的准确性（y）; it would saturate at 100%. 它会以100％饱和。

Answer 1

This would be my function guess and according fit in python 这将是我的函数猜测，并在python中适合

# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as so


def f( x, a, b , s, p ):
    return a + b * s * ( x - 1 ) / (  1 + ( s * ( x - 1 ) )**( abs( 1 / p ) ) )**abs( p )


def g( x, a , s, p ):
    return a * s * x / (  1 + ( s * x )**( abs( 1 / p ) ) )**abs( p )


def h( x, s, p ):
    return 100 * s * x / (  1 + ( s * x )**( abs( 1 / p ) ) )**abs( p )


xData = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 21, 31, 41, 51, 61, 71, 
        81, 91, 110, 210, 310, 410, 510, 610, 710, 810, 910, 1100, 2100, 
        3100, 4100, 5100, 6100, 7100, 8100, 9100 ]

yData = [ 75, 84, 85, 89, 88, 91, 92, 92, 93, 92, 94, 95, 95, 96, 95, 
        95, 94, 97, 97, 97, 98, 98, 98, 99, 99, 99, 99, 99, 99, 99, 99, 
        99, 99, 99, 99, 99, 99 ]

xList = np.logspace( 0, 5, 100 )

bestFitF, err = so.curve_fit( f , xData, yData, p0=[ 75, 25, 1, 1])
bestFitG, err = so.curve_fit( g , xData, yData)
bestFitH, err = so.curve_fit( h , xData, yData)

fList = np.fromiter( ( f(x, *bestFitF ) for x in xList ), np.float)
gList = np.fromiter( ( g(x, *bestFitG ) for x in xList ), np.float)
hList = np.fromiter( ( h(x, *bestFitH ) for x in xList ), np.float)

fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )

ax.plot( xData, yData, marker='o', linestyle='')
ax.plot( xList, fList, linestyle='-.', label='f')
ax.plot( xList, gList, linestyle='-.', label='g')
ax.plot( xList, hList, linestyle='-.', label='h')

ax.set_xscale( 'log' )
ax.legend( loc=0 )
plt.show()

Function f requires start values, g and h don't. 函数f需要起始值， g和h不需要。 It should be possible to write some code to guess the parameters, basically the first one is yData[0] , the second is yData[-1] - yData[0] and the others don't matter and are just set to 1 , but I did it manually here. 应该可以编写一些代码来猜测参数，基本上第一个是yData[0] ，第二个是yData[-1] - yData[0] ，其他无关紧要，只是设置为1 ，但是我在这里手动完成了。

Both, g and h have the property that they pass ( 0, 0 ) . g和h都具有它们传递的属性( 0, 0 ) 。 Additionally, h will saturate at 100 . 另外， h将在100饱和。

Note: Sure the more parameters the better the fit, but if it is, eg, a CDF you probably want a fixed saturation value and maybe the pass through ( 0, 0 ) as well. 注意：确定参数越多，拟合越好，但如果是，例如，CDF，您可能需要固定的饱和度值，也可能需要通过( 0, 0 ) 。

Answer 2

This might be an acceptable fit to the Gunary equation, with an R-squared value of 0.976: 这可能是Gunary方程的可接受拟合，R平方值为0.976：

y = x / (a + bx + cx^0.5) y = x /（a + bx + cx ^ 0.5）

Fitting target of lowest sum of squared absolute error = 2.4509677507601545E+01

a =  1.2327255760994933E-03
b =  1.0083740273268828E-02
c =  1.9179200839782879E-03

Answer 3

R package drc has many options. R package drc有很多选择。

Here is a 5-parameter log-logistic model, which yields residuals lower than the fits in the question. 这是一个5参数的对数逻辑模型，它产生的残差低于问题中的拟合。

BONUS: It has a self-starter function, so you avoid the challenge of finding initial values for non-linear regression. 奖励：它具有自启动功能，因此您可以避免找到非线性回归的初始值。

library(drc)
dosefit <- drm(y ~ x, fct = LL2.5())

改善曲线拟合R中的数据

问题描述

3 个解决方案

解决方案1
3 2018-01-23 15:15:21

解决方案2
3 2018-01-24 12:34:52

解决方案3
1 2018-01-23 20:42:56

改善曲线拟合R中的数据

问题描述

3 个解决方案

解决方案1 3 2018-01-23 15:15:21

解决方案2 3 2018-01-24 12:34:52

解决方案3 1 2018-01-23 20:42:56

解决方案1
3 2018-01-23 15:15:21

解决方案2
3 2018-01-24 12:34:52

解决方案3
1 2018-01-23 20:42:56