
Python - Multipeak fit - histogram

I would like to fit my data with a sum of Gaussian functions, but the fit does not converge. I do not know whether the problem is in my code or in the data.

# Imports needed to run this snippet
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# My function: a sum of two Gaussians
def gauss2(x, *p):
    A, mu, sigma, A1, mu1, sigma1 = p
    return (A / (math.sqrt(2 * math.pi) * sigma)) * np.exp(- (x - mu) ** 2 / (2. * sigma ** 2)) \
         + (A1 / (math.sqrt(2 * math.pi) * sigma1)) * np.exp(- (x - mu1) ** 2 / (2. * sigma1 ** 2))

# Histogram (data is the 1-D array of raw samples)
hist, bin_edges = np.histogram(data, density=True)
# Use the center of each histogram bin as the x value for the fit
bin_centres = (bin_edges[:-1] + bin_edges[1:]) / 2
# Initial guess
p0 = [2., 50., 0.05, 2., 52., 1.]
# Fit using curve_fit
coeff, var_matrix = curve_fit(gauss2, bin_centres, hist, p0=p0)

# For the plot: evaluate the fitted curve on a fine grid
# (use a NumPy array, not a Python list, so gauss2 can do vectorized arithmetic)
xx = np.linspace(-13.99, 86.0, 10000)
hist_fit = gauss2(xx, *coeff)
plt.plot(xx, hist_fit, 'b')

The result of the fit is:

[ 1.45724361e+05  3.14206364e+03 -2.95328767e+02  8.89521631e-01
  5.20036421e+01  5.79493687e-01]

My data should peak around 50.5 and 52.

Is there a procedure other than curve_fit for fitting a function?

Here is a sketch (pseudocode, not real code) of the EM algorithm. You don't need the histogram at all.

function M_step (x, responsibility, i)
  total[i] = sum (responsibility[i, j], j, 1, n)
    where n = length(x)
  bump_mean[i] = sum (x[j]*responsibility[i, j], j, 1, n) / total[i]
  bump_mean_x2[i] = sum (x[j]**2 * responsibility[i, j], j, 1, n) / total[i]
  bump_variance[i] = bump_mean_x2[i] - bump_mean[i]**2
  mixing_proportion[i] = total[i] / n

function E_step (x, means, variances, mixing weights)
  responsibility[i, j] = p(bump i | x[j])
    for each bump i and datum x[j]

function EM (x)
  for many times:
    call E_step for data x and current parameter estimates
      to obtain responsibility values
    call M_step with responsibility values for each bump
      to update parameters

I've omitted a lot of details, and I'm working from memory so there could be mistakes. But the summary is this: E-step = estimate responsibility of each bump for each datum, then M-step = estimate bump parameters and mixing weights given responsibility. The M-step is just the same as computing the mean and variance using weighted data.
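To make the recipe concrete, here is a minimal runnable sketch of EM for two 1-D Gaussian bumps. It is my own translation of the pseudocode above into Python, not code from the answer; the function name em_two_gaussians, the iteration count, and the starting values are all assumptions.

import numpy as np

def em_two_gaussians(x, mu, sigma, weight, n_iter=200):
    # Hypothetical helper: EM for a two-component 1-D Gaussian mixture.
    # mu, sigma, weight are length-2 sequences of starting guesses.
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    weight = np.asarray(weight, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibility[i, j] = p(bump i | x[j])
        dens = weight[:, None] * np.exp(-(x[None, :] - mu[:, None]) ** 2
                                        / (2 * sigma[:, None] ** 2)) \
               / (np.sqrt(2 * np.pi) * sigma[:, None])
        resp = dens / dens.sum(axis=0)
        # M-step: weighted mean and variance per bump, then mixing weights
        total = resp.sum(axis=1)                  # effective count per bump
        mu = (resp * x).sum(axis=1) / total
        var = (resp * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / total
        sigma = np.sqrt(var)
        weight = total / x.size
    return mu, sigma, weight

# Toy check with bumps near 50.5 and 52, as in the question
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(50.5, 0.2, 500), rng.normal(52.0, 0.5, 500)])
print(em_two_gaussians(sample, mu=[50., 53.], sigma=[1., 1.], weight=[0.5, 0.5]))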

I solved my problem by minimizing the negative log-likelihood of the raw data, as in the following code:

# Normalized Gaussian density
def mygauss(x, *p):
    mu, sigma = p
    return (1 / (math.sqrt(2 * math.pi) * sigma)) * np.exp(- (x - mu) ** 2 / (2. * sigma ** 2))

# Mixture model used to compute the likelihood:
# a weighted sum of two Gaussians (pi1 is the mixing weight of the first)
def pdf_model(x, p):
    mu1, sig1, mu2, sig2, pi1 = p
    return pi1 * mygauss(x, mu1, sig1) + (1 - pi1) * mygauss(x, mu2, sig2)

# Negative log-likelihood function of the sample under the mixture model
def log_likelihood_two_1d_gauss(p, sample):
    return -np.sum(np.log(pdf_model(sample, p)))


from scipy.optimize import minimize
# Initial guess: the peaks are expected near 50.5 and 52
p0 = np.array([50.5, 0.5, 52., 0.5, 0.5])
# mydata is the 1-D array of raw samples (no histogram needed)

res = minimize(log_likelihood_two_1d_gauss, x0=p0, args=(mydata,), method='nelder-mead')
print(res.success)
print(res.message)
print(res.x)
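
As a quick visual check (my own addition, not part of the original answer), the fitted mixture can be overlaid on a normalized histogram of the data; the grid resolution and bin count below are arbitrary choices:

# Hypothetical check: overlay the fitted density on the data histogram
xx = np.linspace(mydata.min(), mydata.max(), 1000)
plt.hist(mydata, bins=50, density=True, alpha=0.3, label='data')
plt.plot(xx, pdf_model(xx, res.x), 'b', label='two-Gaussian fit')
plt.legend()
plt.show()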
