
Fitting a Gaussian to a 1D masked data array

I have a 1D data array that contains nan values, which I have masked so that they now print as --. I want to fit a Gaussian to this array and create a histogram using the mean and standard deviation of the fit. I tried scipy.stats.norm.fit, but that didn't work (the mean and std both came back as nan). I then looked at scipy.stats.mstats, but it doesn't appear to have a fit function.

Is there a module that can fit a Gaussian to a masked array and output a mean and std?

EDIT: Here is my code thus far:

def createRmsMatrix( self ):

    '''
    Creates an array of RMS values for each profile in one file.
    '''

    # Initialize RMS table of zeroes
    rmsMatrix = np.zeros( ( self.nSub, self.nChan ), dtype = float )

    # Loop over the time and frequency indices
    for time in np.arange( self.nSub ):
        for frequency in np.arange( self.nChan ):

            # Create a mask along the bin space
            mask = utils.binMask( self.data[time][frequency], 0.55 )

            #print(mask)

            rmsMatrix[time][frequency] = mu.rootMeanSquare( self.data[time][frequency][mask == 0] )

    # Mask the nan values in the array
    rmsMatrix = np.ma.array( rmsMatrix, mask = np.isnan( rmsMatrix ) )

    print( "Root Mean Square matrix created..." )

    return rmsMatrix

And the part of my main function that calls this is:

    # Return the array of RMS values for each profile
    self.rmsArray = self.createRmsMatrix()

    # Reshape RMS array to be linear and store in a new RMS array
    self.linearRmsArray = np.reshape( self.rmsArray, ( self.nChan * self.nSub ) )

    # Best fit of data using a Gaussian fit
    mu, sigma = norm.fit( self.linearRmsArray )

    # Creates the histogram
    n, bins, patches = self.histogramPlot( self.linearRmsArray, mu, sigma, 'Root Mean Squared', 'Frequency Density', True )

The histogramPlot method is just a convenience wrapper around matplotlib, which I will also post:

def histogramPlot( self, data, mean, stdDev, xAxis='x-axis', yAxis='y-axis', showPlot = False ):

    '''
    Plots and returns a histogram of some linear data using matplotlib
    and fits a Gaussian centered around the mean with a spread of stdDev.
    Use this function to set the x and y axis names.
    Can also toggle showing of the histogram in this function.
    '''

    # Plot the histogram of the data passed in (not self.linearRmsArray)
    n, bins, patches = plt.hist( data, bins=self.nChan, normed=True )

    # Add a 'best fit' normal distribution line
    xPlot = np.linspace( ( mean - (4*stdDev) ), ( mean + (4*stdDev) ), 1000 )
    yPlot = mlab.normpdf( xPlot, mean, stdDev )
    l = plt.plot(xPlot, yPlot, 'r--', linewidth=2)

    # Format axes
    plt.ylabel( yAxis )
    plt.xlabel( xAxis )
    #plt.title(r'$\mathrm{Histogram\ of\ data:}\ \mu=%.3f,\ \sigma=%.3f$' %(mu, sigma))
    plt.title(r'$\mu=%.3f,\ \sigma=%.3f$' %(mean, stdDev))
    plt.grid(True)

    if showPlot == True:
        plt.show()

    return n, bins, patches

You were attempting to use scipy.stats.norm.fit to fit a normal distribution to your data, which implies that your input is a collection of values that is supposed to be a random sample from a normal distribution. In that case, the maximum likelihood estimates of the mean and standard deviation are simply the sample mean and the sample standard deviation of the data (with the 1/N normalization, which is NumPy's default). For data that contains nan, you can remove the nan values before calling scipy.stats.norm.fit(), or you can compute the estimates directly with numpy.nanmean and numpy.nanstd:

est_mean = np.nanmean(data)
est_stddev = np.nanstd(data)

For example:

In [18]: import numpy as np

In [19]: from scipy.stats import norm

In [20]: x = np.array([1, 4.5, np.nan, 3.3, 10.0, 4.1, 8.5, 17.1, np.nan])

In [21]: np.nanmean(x), np.nanstd(x)
Out[21]: (6.9285714285714288, 5.0366412520687653)

In [22]: norm.fit(x[np.isfinite(x)])
Out[22]: (6.9285714285714288, 5.0366412520687653)
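The two results in the session above agree because norm.fit returns the maximum likelihood estimate, which divides by N, and np.nanstd uses the same normalization by default (ddof=0). The following is a minimal sketch of that check, reusing the same x; the variable names are only illustrative. If you want the unbiased estimate instead, np.nanstd accepts ddof=1, and that value will be slightly larger than what norm.fit reports.

```python
import numpy as np
from scipy.stats import norm

x = np.array([1, 4.5, np.nan, 3.3, 10.0, 4.1, 8.5, 17.1, np.nan])
clean = x[np.isfinite(x)]          # the 7 finite values

# Maximum likelihood fit: mean and 1/N standard deviation
mle_mean, mle_std = norm.fit(clean)

# np.nanstd defaults to ddof=0, i.e. the same 1/N normalization
population_std = np.nanstd(x)

# ddof=1 gives the 1/(N-1) estimate, which norm.fit does not return
unbiased_std = np.nanstd(x, ddof=1)

print(mle_mean, mle_std, population_std, unbiased_std)
```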

Note that x[np.isfinite(x)] is the array of values in x that are not nan or inf.
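As a small illustration of the difference, here is a sketch comparing np.isfinite with np.isnan on a made-up array y (not from the question). Which filter is appropriate depends on whether infinities can occur in your data and whether they should count as valid samples.

```python
import numpy as np

# Hypothetical array containing nan as well as both infinities
y = np.array([1.0, np.nan, np.inf, -np.inf, 2.5])

finite_only = y[np.isfinite(y)]   # drops nan, inf and -inf
not_nan = y[~np.isnan(y)]         # drops only nan, keeps the infinities

print(finite_only)
print(not_nan)
```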

If you have a masked array, you can use its mean and std methods:

In [36]: mx = np.ma.masked_array(x, np.isnan(x))

In [37]: mx
Out[37]: 
masked_array(data = [1.0 4.5 -- 3.3 10.0 4.1 8.5 17.1 --],
             mask = [False False  True False False False False False  True],
       fill_value = 1e+20)

In [38]: mx.mean(), mx.std()
Out[38]: (6.9285714285714288, 5.0366412520687653)
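If you would still like to go through norm.fit, one option is to hand it only the unmasked values, which the masked array's compressed() method returns as a plain 1D array. The sketch below assumes a small made-up matrix in place of the RMS matrix from the question; fit_masked and plot_fit are hypothetical helper names, not part of NumPy, SciPy, or Matplotlib. The plotting helper uses density=True and scipy.stats.norm.pdf, since normed and mlab.normpdf are not available in recent Matplotlib releases.

```python
import numpy as np
from scipy.stats import norm


def fit_masked(marr):
    """Fit a normal distribution to the unmasked values of a masked array."""
    # compressed() returns the unmasked entries as a plain 1D ndarray,
    # so a 2D (nSub, nChan) matrix is flattened at the same time.
    values = marr.compressed()
    mean, std = norm.fit(values)
    return values, mean, std


def plot_fit(values, mean, std, bins=20):
    """Draw a density histogram of values with the fitted normal PDF on top."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.hist(values, bins=bins, density=True)
    xs = np.linspace(mean - 4 * std, mean + 4 * std, 1000)
    ax.plot(xs, norm.pdf(xs, mean, std), "r--", linewidth=2)
    ax.set_title(r"$\mu=%.3f,\ \sigma=%.3f$" % (mean, std))
    return fig, ax


# Made-up stand-in for the RMS matrix in the question
rms = np.array([[1.0, 4.5, np.nan],
                [3.3, 10.0, 4.1],
                [8.5, 17.1, np.nan]])
rms_masked = np.ma.masked_invalid(rms)   # masks nan (and inf) entries

values, mean, std = fit_masked(rms_masked)
print(mean, std)
```

Calling plot_fit(values, mean, std) followed by plt.show() would then display the histogram. Passing the compressed values to the histogram, rather than the masked array itself, avoids relying on how the plotting code treats masked entries.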
