
How to fit a normal distribution for scatter plot data

I have a dataframe with x values (column x) and y values (column 1). In the code below I am getting the mean and stdev by fitting a normal distribution to them.

Next, I plot the data and the fitted curve together on one chart, but the result looks very wrong. It is not just that the fitted curve is shifted; I am not sure what is wrong with it.

import matplotlib.pyplot as plt
from scipy import stats
from scipy import optimize
import numpy as np

data_sample = {'x': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
               '1': [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]}

def test_func(x, a, b):
    # normal pdf with mean a and standard deviation b
    return stats.norm.pdf(x, a, b)

params, cov_params = optimize.curve_fit(test_func, data_sample['x'], data_sample['1'])

print(params)

plt.scatter(data_sample['x'], data_sample['1'], label='Data')
plt.plot(data_sample['x'], test_func(data_sample['x'], params[0], params[1]), label='Fitted function')

plt.legend(loc='best')

plt.show()


The data needs to be normalized so that the area under the curve is 1. When all x-values are 1 apart, the area is the sum of the y-values. If the spacing between the x-values is larger or smaller than 1, the sum also needs to be multiplied by that spacing. Another way to calculate the area is np.trapz().

The normalization factor needs to be applied when doing the fit: divide the y-values by it. The reverse needs to happen when drawing the curve together with the original data: multiply the fitted pdf by it.

When you try to fit the Gaussian pdf to non-normalized points, the "best" fit is a very narrow, very high peak. This peak tries to approach the y=5 value in the center.

The example code below converts the lists to numpy arrays, so the functions can be written more easily. Also, to draw a smooth curve, a denser set of x-values is used.
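As a minimal sketch of the two area calculations described above, the snippet below computes the normalization factor once as "sum of y times spacing" and once with the trapezoid rule written out by hand (the same rule np.trapz() applies). The variable names are illustrative and not taken from the original code.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], dtype=float)

# Option 1: sum of the y-values times the (constant) spacing between x-values.
spacing = (x.max() - x.min()) / (len(x) - 1)
area_from_sum = y.sum() * spacing            # 25.0 for this data

# Option 2: trapezoid rule, written out explicitly.
area_trapezoid = np.sum((y[1:] + y[:-1]) / 2 * np.diff(x))   # also 25.0 here

# Divide by the area before fitting; multiply the fitted pdf by it when plotting.
y_normalized = y / area_trapezoid
print(area_from_sum, area_trapezoid)
```

The two results agree here because the first and last y-values are 0; for data that does not start and end at 0, the plain sum and the trapezoid rule differ by half of the two end values times the spacing.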
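A rough way to see why the peak must be narrow: a normal pdf with standard deviation sigma has a maximum of 1 / (sigma * sqrt(2 * pi)). The sketch below (the helper name is hypothetical) inverts that relation to show which sigma a given peak height implies. It only illustrates the tendency; the actual least-squares result does not have to land on exactly these values.

```python
import math

def sigma_for_peak(peak_height):
    """Standard deviation of a normal pdf whose maximum equals peak_height."""
    return 1.0 / (peak_height * math.sqrt(2.0 * math.pi))

# Raw data: the pdf would have to reach y = 5 at the center.
sigma_raw = sigma_for_peak(5.0)          # roughly 0.08, far below the x spacing of 1

# Normalized data (area 25 divided out): the peak is 5 / 25 = 0.2.
sigma_norm = sigma_for_peak(5.0 / 25.0)  # roughly 2.0

print(sigma_raw, sigma_norm)
```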

import matplotlib.pyplot as plt
from scipy import stats
from scipy import optimize
import numpy as np

def test_func(x, a, b):
    return stats.norm.pdf(x, a, b)

data_sample = {'x': np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]),
               '1': np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0])}

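# Alternative for evenly spaced x-values:
# x_dist = (data_sample['x'].max() - data_sample['x'].min()) / (len(data_sample['x']) - 1)
# normalization_factor = sum(data_sample['1']) * x_dist
# Note: np.trapz was renamed np.trapezoid in NumPy 2.0; use that name on newer versions.
normalization_factor = np.trapz(data_sample['1'], data_sample['x'])  # area under the curve
# fit the pdf to the y-values scaled to unit area
params, pcov = optimize.curve_fit(test_func, data_sample['x'], data_sample['1'] / normalization_factor)

plt.scatter(data_sample['x'], data_sample['1'], clip_on=False, label='Data')
# denser x-values for a smooth curve, extended 3 units past the data on both sides
x_detailed = np.linspace(data_sample['x'].min() - 3, data_sample['x'].max() + 3, 200)
# scale the fitted pdf back up so it can be drawn over the original data
plt.plot(x_detailed, test_func(x_detailed, params[0], params[1]) * normalization_factor,
         color='crimson', label='Fitted function')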

plt.legend(loc='best')
plt.margins(x=0)
plt.ylim(ymin=0)
plt.tight_layout()
plt.show()

Figure: normal curve fitted to the data points
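One possible sanity check on the fitted mean and stdev (an addition here, not part of the method above) is to treat the y-values as weights at each x and compute the weighted mean and standard deviation directly. This assumes the y-values behave like frequencies; a least-squares fit will not necessarily return exactly the same numbers.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0], dtype=float)

# Treat y as weights that sum to 1.
weights = y / y.sum()

# Weighted mean and weighted standard deviation of x.
mean_estimate = np.sum(weights * x)
std_estimate = np.sqrt(np.sum(weights * (x - mean_estimate) ** 2))

print(mean_estimate, std_estimate)  # close to 5.0 and 2.0 for this data
```

If the parameters returned by the fit are far from these estimates, that is a hint that something such as the normalization is off.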

PS: Using the original code (without the normalization), but with more detailed x-values, the narrow curve would be more apparent:

x_detailed = np.linspace(min(data_sample['x']) - 1, max(data_sample['x']) + 1, 500)
plt.plot(x_detailed, test_func(x_detailed, params[0], params[1]), color='m', label='Fitted function')

Figure: narrow Gaussian curve fitted to the non-normalized data

