
How to fit double exponential distribution using MLE in python?

I am trying to fit double-exponential (i.e., a mixture of two exponentials, or bi-exp) data using MLE. Though there is no direct example of such a problem, I found some hints on using MLE for fitting linear (Maximum Likelihood Estimate pseudocode), sigmoidal (https://stats.stackexchange.com/questions/66199/maximum-likelihood-curve-model-fitting-in-python) and normal (Scipy MLE fit of a normal distribution) distributions. Using these examples I have tested the following code:

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
import scipy.stats as stats

size = 300

def simu_dt():
    ## simulate Exp2 data
    np.random.seed(0)
    ## generate random values between 0 to 1
    x = np.random.rand(size)
    data = []
    for n in x:
        if n < 0.6:
            # generating 1st exp data
            data.append(np.random.exponential(scale=20)) # t1
        else:
            # generating 2nd exp data
            data.append(np.random.exponential(scale=500)) # t2
    return np.array(data)

ydata2 = simu_dt() # call to generate simulated data
## trimming the data at the beginning and the end a bit
ydata2 = ydata2[np.where(2 < ydata2)]
ydata2 = ydata2[np.where(ydata2 < 3000)]

## creating the normalized log binned histogram data
bins = 10 ** np.linspace(np.log10(np.min(ydata2)), np.log10(np.max(ydata2)), 10)
counts, bin_edges = np.histogram(ydata2, bins=bins)
bin_centres = (bin_edges[:-1] + bin_edges[1:]) / 2
bin_width = (bin_edges[1:] - bin_edges[:-1])
counts = counts / bin_width / np.sum(counts)

## generating arbitrary x value
x1 = np.linspace(bin_centres.min(), bin_centres.max(), len(ydata2))

def MLE(params):
    """ find the max likelihood """
    a1, k1, k2, sd = params
    yPred = (1-a1)*k1*np.exp(-k1*x1) + a1*k2*np.exp(-k2*x1)
    negLL = -np.sum(stats.norm.pdf(ydata2, loc=yPred, scale=sd))
    return negLL

guess = np.array([0.4, 1/30, 1/320, 0.2])
bnds = ((0, 0.9), (1/200, 1/2), (1/1000, 1/100), (0, 1))
## best function used for global fitting

results = optimize.minimize(MLE, guess, method='SLSQP', bounds=bnds)

print(results)
A1, K1, K2, _ = results.x
y_fitted = (1-A1)*K1*np.exp(-K1*x1) + A1*K2*np.exp(-K2*x1)

## plot actual data
plt.plot(bin_centres, counts, 'ko', label=" actual data")
plt.xlabel("Dwell Times (s)")
plt.ylabel("Probability")

## plot fitted data on original data
plt.plot(x1, y_fitted, c='r', linestyle='dashed', label="fit")
plt.legend()
plt.xscale('log')
plt.yscale('log')

plt.show()

The fit summary shows:

Out:
 fun: -1.7494005752178573e-16
     jac: array([-3.24161825e-18,  0.00000000e+00,  4.07105635e-16, -6.38053319e-14])
 message: 'Optimization terminated successfully.'
    nfev: 6
     nit: 1
    njev: 1
  status: 0
 success: True
       x: array([0.4       , 0.03333333, 0.003125  , 0.2       ])

This is the plot showing the fit. Though the fit seems to work, the result just returns the guesses I provided! Also, if I change the guesses the fit changes too, meaning it is probably not converging at all. I am not sure what I am doing wrong. I should say I am not an expert in Python or math, so any help is highly appreciated. Thanks in advance.

There are several places where I would say you made mistakes. For example, you are evaluating the model at x1 (equidistant x-values) instead of at the observed data ydata2 directly. Then, you are using an inappropriate negative log-likelihood: you are supposed to take the negative log of the probabilities of your own data under the assumed parameters, not a normal density around a predicted curve. Consequently, your fourth parameter (sd) is unnecessary. Your function should be:

def MLE(params):
    """ find the max likelihood """
    a1, k1, k2 = params
    yPred = (1-a1)*k1*np.exp(-k1*ydata2) + a1*k2*np.exp(-k2*ydata2)
    negLL = -np.sum(np.log(yPred))
    return negLL

The code may still fail to converge for numerical reasons (the parameters scale badly), and some suggestions of linearization might help. What you can easily do is change your optimization method to e.g. L-BFGS-B, and the method should converge properly.
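As a side note (a sketch of mine, not part of the original answer): the numerical issues can also be reduced by evaluating the likelihood entirely in log space with `scipy.special.logsumexp`, which avoids underflow when `k*t` is large. The sample sizes and mixture weights below mirror the simulation in this thread:

```python
import numpy as np
from scipy.special import logsumexp
from scipy import optimize

rng = np.random.default_rng(0)
n = 100_000
# simulated mixture: 60% with scale 20, 40% with scale 500 (same setup as above)
t = np.where(rng.random(n) < 0.6,
             rng.exponential(20, n),
             rng.exponential(500, n))

def neg_log_lik(params):
    k1, k2 = params
    # log of 0.6*k1*exp(-k1*t) + 0.4*k2*exp(-k2*t), computed in log space
    terms = np.stack([np.log(0.6) + np.log(k1) - k1 * t,
                      np.log(0.4) + np.log(k2) - k2 * t])
    return -np.sum(logsumexp(terms, axis=0))

res = optimize.minimize(neg_log_lik, [1/30, 1/200],
                        bounds=((1e-4, 1.0), (1e-5, 1.0)),
                        method='L-BFGS-B')
print(1 / res.x)  # estimated scales, which should land near the true 20 and 500
```

This keeps the objective well-behaved even for rate/time combinations where `np.exp(-k*t)` would underflow to zero and `np.log` would then blow up.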

[Plot of the resulting fit]

Full code:

import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
import scipy.stats as stats

size = 300000
nbins = 30

def simu_dt():
    ## simulate Exp2 data
    np.random.seed(20)
    ## generate random values between 0 to 1
    x = np.random.rand(size)
    data = []
    for n in x:
        if n < 0.6:
            # generating 1st exp data
            data.append(np.random.exponential(scale=20)) # t1
        else:
            # generating 2nd exp data
            data.append(np.random.exponential(scale=500)) # t2
    return np.array(data)

ydata2 = simu_dt() # call to generate simulated data
## trimming the data at the beginning and the end a bit
ydata2 = ydata2[np.where(2 < ydata2)]
ydata2 = ydata2[np.where(ydata2 < 3000)]

## creating the normalized log binned histogram data
bins = 10 ** np.linspace(np.log10(np.min(ydata2)), np.log10(np.max(ydata2)), nbins)
counts, bin_edges = np.histogram(ydata2, bins=bins)
bin_centres = (bin_edges[:-1] + bin_edges[1:]) / 2
bin_width = (bin_edges[1:] - bin_edges[:-1])
counts = counts / bin_width / np.sum(counts)

## generating arbitrary x value
x1 = np.linspace(bin_centres.min(), bin_centres.max(), len(ydata2))

def MLE(params):
    """ find the max likelihood """
    k1, k2 = params
    yPred = 0.6*k1*np.exp(-k1*ydata2) + 0.4*k2*np.exp(-k2*ydata2)
    negLL = -np.sum(np.log(yPred))
    return negLL

guess = np.array([1/30, 1/200])
bnds = ((1/100, 1/2), (1/1000, 1/100))
## best function used for global fitting

results = optimize.minimize(MLE, guess, bounds=bnds)

print(results)
K1, K2 = results.x
y_fitted = 0.6*K1*np.exp(-K1*x1) + 0.4*K2*np.exp(-K2*x1)

## plot actual data
plt.plot(bin_centres, counts, 'ko', label=" actual data")
plt.xlabel("Dwell Times (s)")
plt.ylabel("Probability")

## plot fitted data on original data
plt.plot(x1, y_fitted, c='r', linestyle='dashed', label="fit")
plt.legend()
plt.xscale('log')
plt.yscale('log')

plt.show()
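As a quick sanity check (my own sketch, not part of the original answer), the fitted rates can be compared against the simulation's true scales of 20 and 500, and a Kolmogorov-Smirnov test against the mixture CDF gives a rough goodness-of-fit indication (note the p-value is optimistic here, since the parameters were fitted on the same data):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(20)
n = 50_000
data = np.where(rng.random(n) < 0.6,
                rng.exponential(20, n),
                rng.exponential(500, n))

def neg_log_lik(params):
    k1, k2 = params
    pdf = 0.6 * k1 * np.exp(-k1 * data) + 0.4 * k2 * np.exp(-k2 * data)
    return -np.sum(np.log(pdf))

res = optimize.minimize(neg_log_lik, [1/30, 1/200],
                        bounds=((1/100, 1/2), (1/1000, 1/100)),
                        method='L-BFGS-B')
k1, k2 = res.x

# mixture CDF for a KS goodness-of-fit check
cdf = lambda t: 0.6 * (1 - np.exp(-k1 * t)) + 0.4 * (1 - np.exp(-k2 * t))
ks = stats.ks_1samp(data, cdf)
print(1 / k1, 1 / k2, ks.statistic)
```

With a correctly specified model the recovered scales should be close to 20 and 500 and the KS statistic small; a systematic mismatch here would suggest the weights or the number of components is wrong.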
