How can I find the right gaussian curve given some data?

Question

I have code that draws from a gaussian in 1D:

import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import gauss

# Beginning in one dimension:
mean = 0; Var = 1; N = 1000
scatter = np.random.normal(mean,np.sqrt(Var),N)
scatter = np.sort(scatter)
mu,sigma = norm.fit(scatter)

I obtain mu and sigma using norm.fit()

Now I'd like to obtain my parameters using

xdata = np.linspace(-5,5,N)
pop, pcov = curve_fit(gauss.gauss_1d,xdata,scatter)

The problem is I don't know how to map my scattered points (drawn from a 1D gaussian) to the x-line in order to use curve_fit.

Also, suppose I simply use and mu and sigma as earlier.

I plot using:

n, bins, patches = plt.hist(scatter,50,facecolor='green')
y = 2*max(n)*mlab.normpdf(bins,mu,sigma)
l = plt.plot(bins,y,'r--')

plt.xlabel('x-coord')
plt.ylabel('Occurrences')
plt.grid(True)
plt.show()

But I have to guess the amplitude as 2*max(n). It works but it's not robust. How can I find the amplitude without guessing?

Answer 1

To avoid guessing the amplitude, call hist() with normed=True , then the amplitude corresponds to normpdf() .

For doing a curve fit, I suggest to use not the density but the cumulative distribution: Each sample has a height of 1/N , which successively sum up to 1. This has the advantage that you don't need to group samples in bins.

import numpy as np
from scipy.stats import norm
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

# Beginning in one dimension:
mean = 0; Var = 1; N = 100
scatter = np.random.normal(mean,np.sqrt(Var),N)
scatter = np.sort(scatter)
mu1,sigma1 = norm.fit(scatter) # classical fit

scat_sum = np.cumsum(np.ones(scatter.shape))/N # cumulative samples
[mu2,sigma2],Cx = curve_fit(norm.cdf, scatter, scat_sum, p0=[0,1]) # curve fit
print(u"norm.fit():  µ1= {:+.4f}, σ1={:.4f}".format(mu1, sigma1))
print(u"curve_fit(): µ2= {:+.4f}, σ2={:.4f}".format(mu2, sigma2))

fg = plt.figure(1); fg.clf()
ax = fg.add_subplot(1, 1, 1)
t = np.linspace(-4,4, 1000)
ax.plot(t, norm.cdf(t, mu1, sigma1), alpha=.5, label="norm.fit()")
ax.plot(t, norm.cdf(t, mu2, sigma2), alpha=.5, label="curve_fit()")
ax.step(scatter, scat_sum, 'x-', where='post', alpha=.5, label="Samples")
ax.legend(loc="best")
ax.grid(True)
ax.set_xlabel("$x$")
ax.set_ylabel("Cumulative Probability Density")
ax.set_title("Fit to Normal Distribution")

fg.canvas.draw()
plt.show()

prints

norm.fit():  µ1= +0.1534, σ1=1.0203
curve_fit(): µ2= +0.1135, σ2=1.0444

and plots

在此处输入图片说明

How can I find the right gaussian curve given some data?

Question

1 answers

solution1
1 ACCPTED 2014-06-30 20:21:35

How can I find the right gaussian curve given some data?

Question

1 answers

solution1 1 ACCPTED 2014-06-30 20:21:35

solution1
1 ACCPTED 2014-06-30 20:21:35