I would like to fit my data with a sum of Gaussian functions, but the fit does not converge. I do not know whether the problem lies in the code or in the data.
import math
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# My function: sum of two Gaussians
def gauss2(x, *p):
    A, mu, sigma, A1, mu1, sigma1 = p
    return (A / (math.sqrt(2 * math.pi) * sigma)) * np.exp(-(x - mu) ** 2 / (2. * sigma ** 2)) + (A1 / (math.sqrt(2 * math.pi) * sigma1)) * np.exp(-(x - mu1) ** 2 / (2. * sigma1 ** 2))
# Histogram
hist, bin_edges = np.histogram(data, density=True)
# I use the centre of each histogram bin for the fit
bin_centres = (bin_edges[:-1] + bin_edges[1:]) / 2
# Initial guess
p0 = [2., 50., 0.05, 2., 52., 1.]
# Fit using curve_fit
coeff, var_matrix = curve_fit(gauss2, bin_centres, hist, p0=p0)
# For the plot: evaluate the fitted curve on a fine grid (step 0.01).
# xx must be a NumPy array, because a plain list does not support x - mu.
xx = np.linspace(-13.99, 86.0, 10000)
hist_fit = gauss2(xx, *coeff)
plt.plot(xx, hist_fit, 'b')
The result of the fit is:
1:[ 1.45724361e+05 3.14206364e+03 -2.95328767e+02 8.89521631e-01
5.20036421e+01 5.79493687e-01]!
My data should peak around 50.5 and 52.
Is there a procedure other than 'curve_fit' for fitting a function?
Here is a sketch (pseudocode, not real code) of the EM algorithm. You don't need the histogram at all.
function M_step (x, responsibility, i)
    n = length(x)
    total[i] = sum (responsibility[i, j], j, 1, n)
    bump_mean[i] = sum (x[j] * responsibility[i, j], j, 1, n) / total[i]
    bump_mean_x2[i] = sum (x[j]**2 * responsibility[i, j], j, 1, n) / total[i]
    bump_variance[i] = bump_mean_x2[i] - bump_mean[i]**2
    mixing_proportion[i] = total[i] / n
function E_step (x, means, variances, mixing weights)
    for each bump i and datum x[j]:
        responsibility[i, j] = p(bump i | x[j])
            = mixing_proportion[i] * gaussian(x[j]; bump_mean[i], bump_variance[i])
              / sum over all bumps k of (mixing_proportion[k] * gaussian(x[j]; bump_mean[k], bump_variance[k]))
function EM (x)
    for many times:
        call E_step for data x and current parameter estimates
            to obtain responsibility values
        call M_step with responsibility values for each bump i
            to update parameters
I've omitted a lot of details, and I'm working from memory so there could be mistakes. But the summary is this: E-step = estimate responsibility of each bump for each datum, then M-step = estimate bump parameters and mixing weights given responsibility. The M-step is just the same as computing the mean and variance using weighted data.
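To make the summary above concrete, here is a minimal NumPy sketch of EM for a one-dimensional mixture of two Gaussians. It is an illustration under assumptions, not a reference implementation: the function names (normal_pdf, e_step, m_step, em), the starting values, and the synthetic data (peaks near 50.5 and 52 as in the question, with widths and sample sizes made up for the example) are all hypothetical. The variance is computed in centred form, which is algebraically the same as mean of x squared minus squared mean but less sensitive to rounding.

```python
import numpy as np

def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return np.exp(-(x - mean) ** 2 / (2.0 * variance)) / np.sqrt(2.0 * np.pi * variance)

def e_step(x, means, variances, weights):
    """Return responsibility[i, j] = p(bump i | x[j])."""
    # joint[i, j] = weight of bump i times its density at datum j
    joint = weights[:, None] * normal_pdf(x[None, :], means[:, None], variances[:, None])
    # Normalise over bumps so each column sums to 1
    return joint / joint.sum(axis=0, keepdims=True)

def m_step(x, responsibility):
    """Weighted mean, weighted variance and mixing weight of each bump."""
    total = responsibility.sum(axis=1)
    means = (responsibility * x).sum(axis=1) / total
    centred = x[None, :] - means[:, None]
    variances = (responsibility * centred ** 2).sum(axis=1) / total
    weights = total / x.size
    return means, variances, weights

def em(x, means, variances, weights, n_iter=200):
    """Alternate E and M steps for a fixed number of iterations."""
    for _ in range(n_iter):
        responsibility = e_step(x, means, variances, weights)
        means, variances, weights = m_step(x, responsibility)
    return means, variances, weights

# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(50.5, 0.3, 2000), rng.normal(52.0, 0.5, 1000)])

means, variances, weights = em(
    x,
    means=np.array([50.0, 53.0]),      # rough starting guesses
    variances=np.array([1.0, 1.0]),
    weights=np.array([0.5, 0.5]),
)
print("means:", means)
print("standard deviations:", np.sqrt(variances))
print("mixing weights:", weights)
```

Note that the raw samples go in directly, with no histogram and no bin centres. A fixed iteration count keeps the sketch short; a real run would more likely stop when the log-likelihood stops improving.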
I solved my problem by minimizing the negative log-likelihood function, as in the following pseudocode:
import math
import numpy as np
from scipy.optimize import minimize

# Gaussian function
def mygauss(x, *p):
    mu, sigma = p
    return (1 / (math.sqrt(2 * math.pi) * sigma)) * np.exp(-(x - mu) ** 2 / (2. * sigma ** 2))

# Model used to calculate the likelihood
def pdf_model(x, p):
    mu1, sig1 = p
    return mygauss(x, mu1, sig1)

# Negative log-likelihood function
def log_likelihood_two_1d_gauss(p, sample):
    h = 0
    for x in sample:
        h = h + math.log(pdf_model(x, p))
    return -h

# Initial guess for (mu, sigma); a and b are placeholders
p0 = np.array([a, b])
# mydata is the 1-D array of measured samples (not shown here)
res = minimize(log_likelihood_two_1d_gauss, x0=p0, args=(mydata,), method='nelder-mead')
print(res.success)
print(res.message)
print(res.x)
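The pseudocode above shows a single Gaussian. As a sketch of how the same idea could extend to the two-peak case from the question, the version below minimizes the negative log-likelihood of a two-component mixture. The mixing weight w, the parameter ordering, the guard against invalid parameters, the function names, and the synthetic sample are assumptions made for this illustration and are not taken from the original post.

```python
import numpy as np
from scipy.optimize import minimize

def gauss(x, mu, sigma):
    """Normalised Gaussian density."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def mixture_pdf(x, p):
    """Two-component mixture: weight w on the first peak, 1 - w on the second."""
    mu1, sig1, mu2, sig2, w = p
    return w * gauss(x, mu1, sig1) + (1.0 - w) * gauss(x, mu2, sig2)

def neg_log_likelihood(p, sample):
    """Negative log-likelihood of the sample under the mixture."""
    mu1, sig1, mu2, sig2, w = p
    # Reject parameters outside the valid region (assumed guard, since
    # Nelder-Mead itself is unconstrained)
    if sig1 <= 0 or sig2 <= 0 or not 0 < w < 1:
        return np.inf
    return -np.sum(np.log(mixture_pdf(sample, p)))

# Synthetic sample (assumed for illustration only)
rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(50.5, 0.3, 2000), rng.normal(52.0, 0.5, 1000)])

# Initial guess: mu1, sigma1, mu2, sigma2, w
p0 = np.array([50.0, 0.5, 52.5, 0.5, 0.5])
res = minimize(neg_log_likelihood, x0=p0, args=(sample,), method='Nelder-Mead',
               options={'maxiter': 4000, 'maxfev': 4000})
print(res.success)
print(res.message)
print(res.x)
```

Summing np.log over the whole array replaces the explicit Python loop and gives the same value. As with any local optimizer, the result depends on the initial guess, so starting the two means near the visible peaks is advisable.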