
Python: Fitting weighted data with a Gaussian mixture model (GMM) with a minimum on the covariance

I want to fit a Gaussian mixture model to a set of weighted data points using Python.

I tried sklearn.mixture.GMM(), which works fine except that it weights all data points equally. Does anyone know a way to assign weights to the data points in this method? I tried repeating data points several times to "increase their weight", but this seems inefficient for large datasets.
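To make the duplication idea concrete: for integer weights, repeating point i w[i] times gives exactly the same mean and covariance as weighting it directly. That is why the trick works in principle but scales badly, since the array grows with the sum of the weights and non-integer weights have to be rounded first. A minimal NumPy sketch, with made-up data and weights:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))          # 5 illustrative points in 2-D
w = np.array([1, 3, 2, 1, 4])        # illustrative integer weights

# Duplication: point i appears w[i] times, so the array has w.sum() rows.
X_rep = np.repeat(X, w, axis=0)

# Statistics of the duplicated data ...
mean_rep = X_rep.mean(axis=0)
cov_rep = np.cov(X_rep.T, bias=True)

# ... match the weighted statistics of the original data.
mean_w = np.average(X, axis=0, weights=w)
cov_w = np.cov(X.T, fweights=w, bias=True)

print(X_rep.shape, np.allclose(mean_rep, mean_w), np.allclose(cov_rep, cov_w))
```

The same equivalence carries over to the M step of EM, so a fitter that accepts weights directly avoids the memory blow-up.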

I also thought about implementing the EM algorithm myself, but this seems to be much slower than, e.g., the GMM method above and would greatly increase the computation time for large datasets.
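As a rough illustration of what a hand-written version could look like, here is a NumPy sketch of EM for a full-covariance GMM in which each point's responsibilities are multiplied by its weight in the M step. The function name weighted_gmm_em, the weighted-random initialisation, and the min_covar floor (a constant added to the diagonal of every covariance) are assumptions made for this sketch, not any library's API, and the code is not tuned for speed or robustness.

```python
import numpy as np

def weighted_gmm_em(X, w, k, n_iter=100, min_covar=1e-3, seed=0):
    """Sketch: fit a k-component GMM to the rows of X with per-point weights w."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Initial means: k distinct points drawn in proportion to their weights.
    means = X[rng.choice(n, size=k, replace=False, p=w / w.sum())]
    # Initial covariances: the weighted covariance of all data, plus the floor.
    covs = np.stack([np.cov(X.T, aweights=w) + min_covar * np.eye(d)] * k)
    pis = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E step: log(pi_j * N(x_i | mean_j, cov_j)) for every point and component.
        log_r = np.empty((n, k))
        for j in range(k):
            diff = X - means[j]
            L = np.linalg.cholesky(covs[j])
            z = np.linalg.solve(L, diff.T)               # shape (d, n)
            log_det = 2.0 * np.log(np.diag(L)).sum()
            maha = (z ** 2).sum(axis=0)                  # squared Mahalanobis distance
            log_r[:, j] = np.log(pis[j]) - 0.5 * (d * np.log(2 * np.pi) + log_det + maha)
        log_r -= log_r.max(axis=1, keepdims=True)        # for numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M step: identical to the unweighted case, except that every
        # responsibility is scaled by the weight of its data point.
        rw = r * w[:, None]
        nk = rw.sum(axis=0) + 1e-12                      # guard against empty components
        pis = nk / nk.sum()
        means = (rw.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (rw[:, j, None] * diff).T @ diff / nk[j] + min_covar * np.eye(d)
    return pis, means, covs

# Usage on made-up data: two blobs, the second weighted ten times higher.
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(5.0, 0.5, (100, 2))])
w_demo = np.concatenate([np.ones(100), np.full(100, 10.0)])
pis, means, covs = weighted_gmm_em(X_demo, w_demo, k=2, n_iter=30)
print(pis)
print(means)
```

The only Python-level loops run over components, not over data points, so each iteration is a few matrix operations per component. Whether that is fast enough for a given dataset would need to be measured.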

I just discovered the OpenCV method for the EM algorithm, cv2.EM(). This again works fine but has the same problem as sklearn.mixture.GMM, and in addition there seems to be no way to change the minimum value allowed for the covariance. Or is there a way to change the covariance minimum to, e.g., 0.001? I hoped it would be possible to use the probs parameter to assign the weights to the data, but this seems to be just an output parameter with no influence on the fitting process, is that right? Using probs0 and starting the algorithm with the M step via trainM didn't help either. For probs0 I used a (number of data points) x (number of GMM components) matrix whose columns are identical, with the weight of each data point written to the row corresponding to that point. This didn't solve the problem either. It just resulted in a mixture model where all means were 0.

Does anyone have an idea how to adapt the methods above, or know of another method, so that a GMM can be fitted to weighted data?

Thanks, Jane

If you're still looking for a solution, pomegranate now supports training GMMs on weighted data. All you need to do is pass in a vector of weights at training time and it will handle the rest. There is also a short tutorial on GMMs in pomegranate.

The parent GitHub repository is here:

https://github.com/jmschrei/pomegranate

The specific tutorial is here:

https://github.com/jmschrei/pomegranate/blob/master/tutorials/B_Model_Tutorial_2_General_Mixture_Models.ipynb

Taking Jacob's suggestion, I coded up an example implementation using pomegranate:

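#Note: written against the pre-1.0 pomegranate API (from_samples,
#   MultivariateGaussianDistribution); newer releases may differ.
import pomegranate
import numpy
import sklearn
import sklearn.datasets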

#-------------------------------------------------------------------------------
#Get data from somewhere (moons data is nice for examples)
Xmoon, ymoon = sklearn.datasets.make_moons(200, shuffle = False, noise=.05, random_state=0)
Moon1 = Xmoon[:100] 
Moon2 = Xmoon[100:] 
MoonsDataSet = Xmoon

#Weight the data from moon2 much higher than moon1:
MoonWeights = numpy.concatenate([numpy.ones(100), numpy.full(100, 10.0)])

#Make the GMM model using pomegranate
model = pomegranate.gmm.GeneralMixtureModel.from_samples(
    pomegranate.MultivariateGaussianDistribution,   #Either single function, or list of functions
    n_components=6,     #Required if single function passed as first arg
    X=MoonsDataSet,     #data format: each row is a point-coordinate, each column is a dimension
    )

#Force the model to train again, using additional fitting parameters
model.fit(
    X=MoonsDataSet,         #data format: each row is a coordinate, each column is a dimension
    weights = MoonWeights,  #Array of weights, one for each point-coordinate
    stop_threshold = .001,  #Lower this value to get a better fit at the cost of a longer run time.
                            #   (sklearn likes better/slower fits than pomegranate by default)
    )

#Wrap the model object into a probability density python function 
#   f(x_vector)
def GaussianMixtureModelFunction(Point):
    return model.probability(numpy.atleast_2d( numpy.array(Point) ))

#Plug in a single point to the mixture model and get back a value:
ExampleProbability = GaussianMixtureModelFunction( numpy.array([ 0,0 ]) )
print('ExampleProbability', ExampleProbability)
