SciPy + Numpy: Finding the slope of a sigmoid curve

I have some data that follow a sigmoid distribution, as you can see in the following image:

[image: sigmoid data for the year 2003]

After normalizing and scaling my data, I fitted the curve at the bottom using scipy.optimize.curve_fit and some initial parameters:

popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0 = [0.05, 0.05, 0.05])
>>> print popt
[  2.82019932e+02  -1.90996563e-01   5.00000000e-02]

So popt, according to the documentation, returns "Optimal values for the parameters so that the sum of the squared error of f(xdata, popt) - ydata is minimized". From this I understand that curve_fit does not calculate the slope, because I do not think the slope of this gentle curve is 282, nor is it negative.
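For reference, this is all those optimal values are: plugging popt back into the model reproduces the fitted curve, and the sum of squared errors is the quantity curve_fit minimizes. A minimal sketch, assuming xdata, ydata and the three-parameter sigmoid_function shown in EDIT #1 below are already defined:

import numpy as np

# Evaluate the fitted model; *popt fills in the model's free parameters (x0, k, p0).
y_fit = sigmoid_function(xdata, *popt)
sse = np.sum((y_fit - ydata) ** 2)   # the sum of squared errors that curve_fit minimized
print(sse)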

Then I tried scipy.optimize.leastsq, because the documentation says it returns "The solution (or the result of the last iteration for an unsuccessful call).", so I thought the slope would be returned. Like this:

p, cov, infodict, mesg, ier = leastsq(residuals, p_guess, args = (nxdata, nydata), full_output=True)
>>> print p
Param(x0=281.73193626250207, y0=-0.012731420027056234, c=1.0069006606656596, k=0.18836680131910222)

But again, I did not get what I expected. curve_fit and leastsq returned almost the same values, which is not surprising I guess, as curve_fit uses an implementation of the least squares method internally to find the curve. But no slope came back... unless I overlooked something.

So, how can I calculate the slope at a point, say, where X = 285 and Y = 0.5?

I am trying to avoid manual methods, like calculating the derivative from, say, (285.5, 0.55) and (284.5, 0.45), subtracting and dividing the results, and so on. I would like to know if there is a more automatic method for this.
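For concreteness, that manual route could at least be scripted instead of done by hand. Here is a rough numerical sketch on the fitted curve (it assumes the sigmoid_function defined in EDIT #1 below and the popt from the curve_fit call above; numerical_slope is just an illustrative name), which is exactly the kind of thing I would like to avoid:

def numerical_slope(x, params, h=0.5):
    # Central difference on the fitted curve around x; h is the half-step width.
    return (sigmoid_function(x + h, *params) - sigmoid_function(x - h, *params)) / (2.0 * h)

print(numerical_slope(285.0, popt))   # approximate slope near x = 285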

Thank you all!

EDIT #1

This is my sigmoid_function, used by both the curve_fit and leastsq methods:

def sigmoid_function(xdata, x0, k, p0): # p0 not used anymore, only its components (x0, k)
    # This function is called by two different methods, curve_fit and leastsq
    # (the latter through the residuals function). I don't know if it makes sense
    # to use a single function for two (somewhat similar) methods, but there
    # it goes.

    # p0:
    #   + The initial parameter guess for scipy.optimize.curve_fit.
    #   + For the residuals calculation it is left empty.
    #   + It is initialized to [0.05, 0.05, 0.05].
    # x0:
    #   + The convergence parameter on the X-axis and also the shift.
    #   + It starts at 0.05 and ends up being around ~282 (days in a year).
    # k:
    #   + Set up either by curve_fit or leastsq.
    #   + In leastsq it is initially fixed at 0.5 and in curve_fit to 0.05.
    #   + Why? I just tried this approach in two different ways and it seems
    #   + to be working.
    #   + But honestly, I have no clue what it represents.
    # xdata:
    #   + Positions on the X-axis. In this case from 240 to 365.

    # Finally I changed these parameters as suggested in the answer.
    # The sigmoid curve has 2 degrees of freedom, therefore the initial
    # guess only needs to be this size. In this case, p0 = [282, 0.5]


    y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
    return y

def residuals(p_guess, xdata, ydata):
    # For the residuals calculation there is no need to set up the initial parameters.
    # After fixing the initial guess and the sigmoid_function header, the trailing [] can be removed:
    # return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1])
    return ydata - sigmoid_function(xdata, p_guess[0], p_guess[1], [])

I am sorry if I made mistakes while describing the parameters or confused any technical terms. I am very new to numpy and I have not studied maths for years, so I am catching up again.

So, again, what is your advice for calculating the slope at X = 285, Y = 0.5 (more or less the midpoint) for this dataset? Thanks!!

EDIT #2

Thanks to Oliver W., I updated my code as he suggested and understood the problem a bit better.

There is a final detail I do not fully get. Apparently, curve_fit returns a popt array (x0, k) with the optimum parameters for the fit:

  • x0 seems to indicate how shifted the curve is, by giving the central point of the curve
  • k seems to be the slope when y = 0.5, i.e. at the center of the curve (I think!)

Why, if the sigmoid function is an increasing one, is the derivative/slope in popt negative? Does that make sense?

I used sigmoid_derivative to calculate the slope and, yes, I obtained the same results as popt but with a positive sign.

# Year 2003, 2005, 2007. Slope in midpoint.
k = [-0.1910, -0.2545, -0.2259] # Values coming from popt
slope = [0.1910, 0.2545, 0.2259] # Values coming from sigmoid_derivative function

I know this is being a bit picky, because I could use either. The relevant data is in there, just with a negative sign, but I was wondering why this happens.

So, the calculation of the derivative function as you suggested is only required if I need to know the slope at points other than y = 0.5. For the midpoint alone, I can use popt.
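In code, reading the midpoint slope straight from popt looks like this (a small sketch based on the two-parameter fit from the answer below; the sign convention is the one discussed above):

x0_fit, k_fit = popt        # two-parameter fit: (x0, k)
midpoint_slope = -k_fit     # at x == x0 (y == 0.5) the slope is -k, per the discussion above
print(midpoint_slope)       # about 0.1910 for the 2003 data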

Thanks for your help, it saved me a lot of time. :-)

You're never using the parameter p0 that you pass to your sigmoid function. Hence, the curve fitting has no good measure to find convergence, because it can take any value for this parameter. You should first rewrite your sigmoid function like this:

def sigmoid_function(xdata, x0, k):

    y = np.exp(-k*(xdata-x0)) / (1 + np.exp(-k*(xdata-x0)))
    return y

This means your model (the sigmoid) has only two degrees of freedom. This will be returned in popt:

initial_guess = [282, 1]  # (x0, k): at x0, the sigmoid reaches 50%, k is slope related
popt, pcov = curve_fit(sigmoid_function, xdata, ydata, p0=initial_guess)

Now popt will be a tuple (or array of 2 values), containing the best possible x0 and k.
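For example, a quick sanity check of the fit (variable names here are just illustrative):

x0_fit, k_fit = popt
# At x == x0 the exponent is zero, so the model returns exp(0) / (1 + exp(0)) == 0.5
print(sigmoid_function(x0_fit, x0_fit, k_fit))   # -> 0.5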

To get the slope of this function at any point, to be honest, I would just calculate the derivative symbolically, as the sigmoid is not such a hard function. You will end up with:

def sigmoid_derivative(x, x0, k):
    f = np.exp(-k*(x-x0))
    return -k / f

If you have the results from your curve fitting stored in popt, you can easily pass them to this function:

print(sigmoid_derivative(285, *popt))

which will return the derivative at x=285 for you. But because you asked specifically about the midpoint, that is, where x==x0 and y==.5, you'll see (from sigmoid_derivative) that the derivative there is just -k, which can be read directly from the curve_fit output you've already obtained. In the output you've shown, that's about 0.19.
