用 scipy curve_fit() 拟合带有背景的信号

Question

下面是我当前问题的玩具 model。 我有一个宽高斯形状的背景和一个与背景平均值略微偏离中心的尖锐信号峰。 我想提取尖峰的属性（即宽度、峰位置等）。 以下是 plot 的链接，其与峰值拟合失败：

背景+偏移信号图 .

对于参数的初始猜测，奇怪的是，当使用比分布的实际标准差大得多的值时，拟合效果更好......有些地方是错误的，但无法弄清楚。 我将不胜感激有关将峰与背景拟合的任何提示。

以下是我尝试过的。

#Fake Data
data  = np.random.normal(loc=3.25, scale=0.01, size=15000)
data2 = np.random.normal(loc=3.0, scale=0.3, size=25000)

#Bins
bins  = np.arange(0, 6.1, 0.1)

#Hitogram with its defined bins
data_entries_1, bins = np.histogram(data, bins=bins)
data_entries_2, bins = np.histogram(data2, bins=bins)

#Add two generated histograms - Final y data
data_entries = data_entries_1 + data_entries_2

#Cetner of each bins - Final x data
bin_centers = np.array([0.5*(bins[i] + bins[i+1]) for i in range(len(bins)-1)])

#fit func 1
def fit_func1(x, A, mu, sigma):
    #Define functions here
    first_func  = A*np.exp(-1.0*(x - mu)**2 / (2*sigma**2))
    return first_func

#fit func 2
def fit_func2(x, B, mu2, sigma2):
    #Define functions here
    second_func = B*np.exp(-1.0*(x - mu2)**2 / (2*sigma2**2))
    return second_func

#total fit function
def fit_func(x, A, mu, sigma, B, mu2, sigma2):
    #Define functions here
    first_func  = A*np.exp(-1.0*(x - mu)**2 / (2*sigma**2))
    second_func = B*np.exp(-1.0*(x - mu2)**2 / (2*sigma2**2))
    final_func  = first_func + second_func
    return final_func

#Fit it
popt1, pconv1 = curve_fit(fit_func1, xdata=bin_centers, ydata=data_entries_1, p0=[20000, 3.25, 1.])
popt2, pconv2 = curve_fit(fit_func2, xdata=bin_centers, ydata=data_entries_2, p0=[2000, 3.0, 0.3])
popt, pconv   = curve_fit(fit_func, xdata=bin_centers, ydata=data_entries, p0=[20000, 3.25, 1.,\
                                                                           2000, 3.0, 0.3])

# Generate enough x values to make the curves look smooth.
xspace = np.linspace(0, 6, 100)

# Plot the histogram and the fitted function.
plt.step(bin_centers, data_entries, label=r'Histogram entries')
plt.plot(xspace, fit_func1(xspace, *popt1), label='Fitted function1')
plt.plot(xspace, fit_func2(xspace, *popt2), label='Fitted function2')
plt.plot(xspace, fit_func(xspace, *popt), label='Fitted function', linestyle="--")
plt.xlim(1, 5)
plt.legend()
plt.show()

更新根据所选答案的建议，将 bin 宽度减小到小于要拟合的实际峰的 sigma。 此外，为了减少要拟合的自由参数的数量，拟合高斯 function 的 mu 固定为常数（分布的实际平均值）。

#Generate Fake Data
data  = np.random.normal(loc=3.25, scale=0.04, size=15000)
data2 = np.random.normal(loc=3.0, scale=0.3, size=25000)

#Bins
bins  = np.arange(0, 6.1, 0.02)

#Compute mean to use as contraints when fitting
mids = np.array([0.5*(bins[i] + bins[i+1]) for i in range(len(bins)-1)])
mean_1 = np.average(mids, weights=data_entries_1)
mean_2 = np.average(mids, weights=data_entries_2)

#fit func 1
def fit_func1(x, A, sigma):
    #Define functions here
    first_func  = A*np.exp(-1.0*(x - mean_1)**2 / (2*sigma**2))
    return first_func

#fit func 2
def fit_func2(x, B, sigma2):
    #Define functions here
    second_func = B*np.exp(-1.0*(x - mean_2)**2 / (2*sigma2**2))
    return second_func

#total fit function
def fit_func(x, A, sigma, B, sigma2):
    #Define functions here
    first_func  = A*np.exp(-1.0*(x - mean_1)**2 / (2*sigma**2))
    second_func = B*np.exp(-1.0*(x - mean_2)**2 / (2*sigma2**2))
    final_func  = first_func + second_func
    return final_func

Answer 1

有几个问题。

plt.step function 没有按照您的想法执行。 它采用垃圾箱的边缘，而不是垃圾箱的中心。
rarrow 峰是一个正态分布，其 sigma 远小于 bin 宽度。 本质上，您尝试将三参数 function (A, mu, sigma) 拟合到单个 (x, y) 值上； 这注定会失败。 这种拟合的结果在我的系统上无法重现； 根据随机生成器 output，重新运行相同的代码有时甚至会产生错误。

有关这些点的说明，请参见下文。

def normal(x, sigma):
    a = 0.5/sigma**2
    return np.sqrt(a/np.pi) * np.exp(-a*x**2)
    
def truefunc(x):
    return 1500*normal(x-3.25, 0.01) + 2500*normal(x-3.0, 0.3)

plt.close('all')
xspace = np.linspace(3, 3.5, 200)
plt.plot(bin_centers, data_entries, 'ko', label=r'Histogram entries')
plt.plot(xspace, truefunc(xspace), label='True distribution')
plt.plot(xspace, fit_func1(xspace, *popt1), label='Fitted function1')
plt.plot(xspace, fit_func2(xspace, *popt2), label='Fitted function2')
plt.plot(xspace, fit_func(xspace, *popt), label='Fitted function', linestyle="--")
plt.xlim(3, 3.5)
plt.legend()
plt.show()

因此，您需要减小 bin 大小以便实际解析窄峰的宽度，或者重新定义fit_func1以采用两个参数：峰高和峰 position (mu)。 将 sigma 修复为bin_width/6 。 您必须对拟合结果进行后处理，以使峰值下的面积与直方图一致。

如果您减小 bin 大小，直方图会很嘈杂。 使用curve_fit的sigma参数可能是个好主意； 将其设置为

np.sqrt(data_entries + 0.25)

这在统计上并不完全正确，但比假设所有直方图条目都存在固定错误要好得多。 （您可以使用 0.25 值；它应该 > 0 且 < 1）。

Answer 2

如果您的目标是分析峰特性，您可以在没有拟合的情况下使用 go。 要获得峰值 position，只需执行以下操作：

peak_pos = bin_centers[data_entries.argmax()]

如果您有多个这样的峰值，您也可以使用scipy.signal.find_peaks 。

要获得峰宽，您可以使用scipy.signal.peak_widths 。 或者，您可以在分析峰值之前拟合背景宽高斯并减去。

用 scipy curve_fit() 拟合带有背景的信号

问题描述

2 个解决方案

解决方案1
2 已采纳 2020-06-30 07:34:09

解决方案2
1 2020-06-30 07:50:03

用 scipy curve_fit() 拟合带有背景的信号

问题描述

2 个解决方案

解决方案1 2 已采纳 2020-06-30 07:34:09

解决方案2 1 2020-06-30 07:50:03

解决方案1
2 已采纳 2020-06-30 07:34:09

解决方案2
1 2020-06-30 07:50:03