Fitting a Gaussian to a 1D masked data array
I have a masked 1D data array that contains nan values. I have masked them, and they now print as --. I wish to fit this array to a Gaussian and create a histogram using the mean and standard deviation of the fit.
I've tried scipy.stats.norm.fit, but that didn't work (the mean and std both came back as nan).
I then looked at scipy.stats.mstats, but it doesn't seem to have a fit function.
Is there a module that can fit a Gaussian to a masked array and output a mean and std?
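A minimal sketch of what is likely going on, assuming the fitting routine converts its input with np.asarray (an assumption about SciPy's internals): np.asarray returns a plain ndarray built from the masked array's underlying data, so the mask is discarded and the masked nan values reappear. Depending on the SciPy version, norm.fit then either returns nan or raises a ValueError about non-finite data. The values below are made up.

```python
import numpy as np

# Small stand-in for the RMS data: two nan entries, masked as in the question
values = np.array([1.0, 4.5, np.nan, 3.3, 10.0, np.nan])
masked = np.ma.array(values, mask=np.isnan(values))

# MaskedArray methods skip the masked entries
print(masked.mean())

# Converting to a plain ndarray discards the mask and exposes the nan values,
# so any statistic computed on the result propagates nan
plain = np.asarray(masked)
print(type(plain).__name__, np.isnan(plain).sum())
print(np.mean(plain))
```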
EDIT: Here is my code so far:
def createRmsMatrix( self ):
    '''
    Creates an array of RMS values for each profile in one file.
    '''
    # Initialize RMS table of zeroes
    rmsMatrix = np.zeros( ( self.nSub, self.nChan ), dtype = float )
    # Loop over the time and frequency indices
    for time in np.arange( self.nSub ):
        for frequency in np.arange( self.nChan ):
            # Create a mask along the bin space
            mask = utils.binMask( self.data[time][frequency], 0.55 )
            #print(mask)
            rmsMatrix[time][frequency] = mu.rootMeanSquare( self.data[time][frequency][mask == 0] )
    # Mask the nan values in the array
    rmsMatrix = np.ma.array( rmsMatrix, mask = np.isnan( rmsMatrix ) )
    print( "Root Mean Square matrix created..." )
    return rmsMatrix
And the part of my main function that calls this is:
# Return the array of RMS values for each profile
self.rmsArray = self.createRmsMatrix()
# Reshape RMS array to be linear and store in a new RMS array
self.linearRmsArray = np.reshape( self.rmsArray, ( self.nChan * self.nSub ) )
# Best fit of data using a Gaussian fit
mu, sigma = norm.fit( self.linearRmsArray )
# Creates the histogram
n, bins, patches = self.histogramPlot( self.linearRmsArray, mu, sigma, 'Root Mean Squared', 'Frequency Density', True )
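One possible fix for the snippet above, sketched under the assumption that self.rmsArray is the masked array returned by createRmsMatrix: MaskedArray.compressed() returns a 1-D plain array holding only the unmasked values, so it replaces the np.reshape call and norm.fit never sees the nan entries. The 2x3 matrix here is made up for illustration.

```python
import numpy as np
from scipy.stats import norm

# Made-up RMS matrix (nSub=2, nChan=3) with one nan, masked as in createRmsMatrix
rms = np.array([[1.0, 2.0, np.nan],
                [3.0, 4.0, 5.0]])
rmsArray = np.ma.array(rms, mask=np.isnan(rms))

# compressed() flattens and keeps only the unmasked entries
linearRmsArray = rmsArray.compressed()

# The fit now operates on finite values only
mu, sigma = norm.fit(linearRmsArray)
print(mu, sigma)
```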
histogramPlot is just a convenience wrapper around matplotlib, which I will also post:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

def histogramPlot( self, data, mean, stdDev, xAxis='x-axis', yAxis='y-axis', showPlot = False ):
    '''
    Plots and returns a histogram of some linear data using matplotlib
    and fits a Gaussian centered around the mean with a spread of stdDev.
    Use this function to set the x and y axis names.
    Can also toggle showing of the histogram in this function.
    '''
    # Plot the histogram of the unmasked values of the data passed in
    # (density=True replaces normed=True, which newer matplotlib removed)
    n, bins, patches = plt.hist( np.ma.compressed( data ), bins=self.nChan, density=True )
    # Add a 'best fit' normal distribution line
    # (scipy.stats.norm.pdf replaces matplotlib.mlab.normpdf, which no longer exists)
    xPlot = np.linspace( ( mean - (4*stdDev) ), ( mean + (4*stdDev) ), 1000 )
    yPlot = norm.pdf( xPlot, mean, stdDev )
    plt.plot(xPlot, yPlot, 'r--', linewidth=2)
    # Format axes
    plt.ylabel( yAxis )
    plt.xlabel( xAxis )
    #plt.title(r'$\mathrm{Histogram\ of\ data:}\ \mu=%.3f,\ \sigma=%.3f$' %(mu, sigma))
    plt.title(r'$\mu=%.3f,\ \sigma=%.3f$' %(mean, stdDev))
    plt.grid(True)
    if showPlot:
        plt.show()
    return n, bins, patches
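For comparison, a standalone sketch of the same plot that does the fit and the drawing in one place, assuming the data arrive as a masked array. The function name plot_histogram_with_fit, its parameters, and the synthetic data are illustrative only and not part of the original code.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from scipy.stats import norm


def plot_histogram_with_fit(data, n_bins=20, x_label="x-axis", y_label="y-axis"):
    """Histogram the unmasked values of `data` and overlay the fitted Gaussian."""
    # Drop masked entries and flatten; also works for plain ndarrays
    values = np.ma.compressed(data)
    mean, std_dev = norm.fit(values)

    fig, ax = plt.subplots()
    n, bins, patches = ax.hist(values, bins=n_bins, density=True)

    # Gaussian curve spanning +/- 4 standard deviations around the mean
    x = np.linspace(mean - 4 * std_dev, mean + 4 * std_dev, 1000)
    ax.plot(x, norm.pdf(x, mean, std_dev), "r--", linewidth=2)

    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    ax.set_title(r"$\mu=%.3f,\ \sigma=%.3f$" % (mean, std_dev))
    ax.grid(True)
    return fig, n, bins, mean, std_dev


# Usage with synthetic data; two entries are nan and get masked
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=500)
sample[[3, 17]] = np.nan
fig, n, bins, mean, std_dev = plot_histogram_with_fit(np.ma.masked_invalid(sample))
# fig.savefig("histogram.png") or plt.show() to view the result
```

Passing the fitted values back out keeps the fit and the plotted curve guaranteed to agree, rather than relying on the caller to pass a matching mean and stdDev.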
You were attempting to use scipy.stats.norm.fit to fit a normal distribution to your data, which implies that your input is a collection of values that is supposed to be a random sample from a normal distribution. In that case, the maximum likelihood estimates of the mean and standard deviation are simply the sample mean and the sample standard deviation (normalized by n, i.e. ddof=0) of the data.
For data that contains nan, you could remove the nans before calling scipy.stats.norm.fit(), or you can compute these directly with numpy.nanmean and numpy.nanstd:
est_mean = np.nanmean(data)
est_stddev = np.nanstd(data)
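For reference, the maximum-likelihood estimates for a normal sample x_1, ..., x_n divide by n rather than n - 1, which is why they coincide with np.nanstd and its default ddof=0:

```latex
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \hat{\mu}\right)^2}
```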
For example:
In [18]: import numpy as np
In [19]: from scipy.stats import norm
In [20]: x = np.array([1, 4.5, np.nan, 3.3, 10.0, 4.1, 8.5, 17.1, np.nan])
In [21]: np.nanmean(x), np.nanstd(x)
Out[21]: (6.9285714285714288, 5.0366412520687653)
In [22]: norm.fit(x[np.isfinite(x)])
Out[22]: (6.9285714285714288, 5.0366412520687653)
Note that x[np.isfinite(x)] is the array of values in x that are not nan or inf.
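That distinction matters if the data can also contain inf: np.nanmean and np.nanstd skip only nan, so an inf still dominates their result, whereas filtering with np.isfinite drops both before the fit. A small check with made-up values:

```python
import numpy as np
from scipy.stats import norm

x = np.array([1.0, 2.0, np.nan, 3.0, np.inf])

# nan-aware reductions ignore nan but not inf
print(np.nanmean(x))

# Boolean indexing with isfinite removes both nan and inf before fitting
finite = x[np.isfinite(x)]
print(finite)
print(norm.fit(finite))
```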
If you have a masked array, you can use its mean and std methods:
In [36]: mx = np.ma.masked_array(x, np.isnan(x))
In [37]: mx
Out[37]:
masked_array(data = [1.0 4.5 -- 3.3 10.0 4.1 8.5 17.1 --],
mask = [False False True False False False False False True],
fill_value = 1e+20)
In [38]: mx.mean(), mx.std()
Out[38]: (6.9285714285714288, 5.0366412520687653)
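If the array starts out with nan values, np.ma.masked_invalid builds the same kind of mask in one call (it masks inf as well), and .compressed() hands only the unmasked values to norm.fit, which then agrees with the masked-array methods. A minimal sketch reusing the values above:

```python
import numpy as np
from scipy.stats import norm

x = np.array([1, 4.5, np.nan, 3.3, 10.0, 4.1, 8.5, 17.1, np.nan])

# Mask nan (and inf) entries in one step
mx = np.ma.masked_invalid(x)

# MaskedArray.mean/std skip masked entries; std uses ddof=0 by default
print(mx.mean(), mx.std())

# compressed() returns a plain 1-D array of the unmasked values for SciPy
print(norm.fit(mx.compressed()))
```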
Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original URL. For questions, contact yoyou2525@163.com.