Python: fitting a Gaussian mixture model (GMM) to weighted data with a minimum covariance

Question

I want to fit a Gaussian mixture model to a set of weighted data points using Python.

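I tried sklearn.mixture.GMM(), which works fine except that it weights all data points equally. Does anyone know a way to assign weights to the data points in this method? I tried including data points several times to "increase their weight", but this seems inefficient for large data sets.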
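To make the replication idea concrete, here is a small NumPy sketch (not part of the original question; the data and weights are made up for illustration). For integer weights, repeating each point gives the same mean and covariance as weighting the statistics directly, which is why a fitter that accepts weights can avoid the blow-up in data size.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))        # five 2-D points (illustrative data)
w = np.array([1, 3, 2, 1, 4])      # integer weights, one per point

# Approach 1: physically repeat point i exactly w[i] times.
X_rep = np.repeat(X, w, axis=0)
mean_rep = X_rep.mean(axis=0)
cov_rep = np.cov(X_rep, rowvar=False, ddof=0)

# Approach 2: keep the data as is and weight the statistics instead.
mean_w = np.average(X, axis=0, weights=w)
cov_w = np.cov(X, rowvar=False, aweights=w, ddof=0)

print(np.allclose(mean_rep, mean_w), np.allclose(cov_rep, cov_w))
```

The weighted form also works for non-integer weights, where replication has no exact equivalent.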

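I also thought about implementing the EM algorithm myself, but that seems to be much slower than, for example, the GMM method above and would greatly increase the computation time for large data sets.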
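For reference, a hand-written weighted EM does not have to be slow if the per-point work is vectorised. The following is a minimal sketch under stated assumptions, not a tuned implementation: the function name `fit_weighted_gmm`, the farthest-point initialisation, and the fixed iteration count are all choices made here for illustration. The only change from ordinary EM is that the responsibilities are multiplied by the point weights in the M-step. The `min_covar` term is added to the diagonal of every covariance, so it acts as a regulariser rather than a hard floor.

```python
import numpy as np

def fit_weighted_gmm(X, w, k, n_iter=100, min_covar=1e-3):
    """Weighted EM for a full-covariance GMM (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    n, d = X.shape

    # Deterministic farthest-point initialisation of the means (an assumption
    # made for this sketch; it is sensitive to outliers).
    idx = [int(np.argmax(w))]
    d2 = ((X - X[idx[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        idx.append(int(np.argmax(d2)))
        d2 = np.minimum(d2, ((X - X[idx[-1]]) ** 2).sum(axis=1))
    means = X[idx].copy()

    # Start every component from the weighted global covariance.
    global_cov = np.cov(X, rowvar=False, aweights=w, ddof=0)
    covs = np.tile(global_cov + min_covar * np.eye(d), (k, 1, 1))
    pis = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: log of (mixing weight * Gaussian density) per point and component.
        log_r = np.empty((n, k))
        for j in range(k):
            L = np.linalg.cholesky(covs[j])
            z = np.linalg.solve(L, (X - means[j]).T)      # shape (d, n)
            log_det = 2.0 * np.log(np.diag(L)).sum()
            maha = (z ** 2).sum(axis=0)
            log_r[:, j] = np.log(pis[j]) - 0.5 * (d * np.log(2 * np.pi) + log_det + maha)
        log_r -= log_r.max(axis=1, keepdims=True)         # for numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: identical to unweighted EM except for the factor w.
        rw = r * w[:, None]
        nk = rw.sum(axis=0) + 1e-12                       # guard against empty components
        pis = nk / nk.sum()
        means = (rw.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (rw[:, j, None] * diff).T @ diff / nk[j] + min_covar * np.eye(d)
    return pis, means, covs

# Usage: two synthetic clusters, the second weighted ten times higher.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 0.5, size=(200, 2)),
               rng.normal(3, 0.5, size=(200, 2))])
w = np.concatenate([np.ones(200), np.full(200, 10.0)])
pis, means, covs = fit_weighted_gmm(X, w, k=2)
print(pis)
print(means)
```

The loops run over components, not over data points, so the cost per iteration is dominated by a few matrix operations of size (number of points) x (dimension).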

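I just discovered the OpenCV implementation of the EM algorithm, cv2.EM(). This again works fine but has the same problem as sklearn.mixture.GMM, and in addition there seems to be no way to change the minimum value allowed for the covariance. Or is there a way to change the covariance minimum to, e.g., 0.001? I hoped it would be possible to use the probs parameter to assign weights to the data, but this seems to be just an output parameter with no influence on the fitting process, doesn't it? Using probs0 and starting the algorithm with the M step via trainM didn't help either. For probs0 I used a (number of data points) x (number of GMM components) matrix whose columns are identical, with the weight of each data point written to the row corresponding to that data point. This didn't solve the problem either; it just resulted in a mixture model in which all means were 0.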
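As an aside on the "minimum covariance" requirement: if you control the M-step yourself, one way to enforce a hard minimum such as 0.001 is to clip the eigenvalues of each covariance matrix. This is a sketch of that idea only; the helper name `floor_covariance` is hypothetical and it is not an option of cv2.EM() or sklearn.mixture.GMM.

```python
import numpy as np

def floor_covariance(cov, min_var=1e-3):
    """Return a copy of `cov` whose eigenvalues are all >= min_var."""
    cov = 0.5 * (cov + cov.T)                  # symmetrise against round-off
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, min_var)     # clip small or negative eigenvalues
    return (eigvecs * eigvals) @ eigvecs.T     # V diag(lambda) V^T

# A singular covariance: all mass lies on the line y = x.
degenerate = np.array([[1.0, 1.0],
                       [1.0, 1.0]])
fixed = floor_covariance(degenerate, min_var=1e-3)
print(np.linalg.eigvalsh(fixed))
```

Unlike adding a constant to the diagonal, this leaves well-conditioned directions untouched and only lifts the variances that fall below the chosen minimum.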

Does anyone know how to adapt the methods above, or know of another method, so that a GMM can be fitted to weighted data?

Thanks, Jane

Answer 1

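If you're still looking for a solution, pomegranate now supports training GMMs on weighted data. All you need to do is pass in a vector of weights at training time and it will handle the rest. Here is a short tutorial on GMMs in pomegranate.

The parent GitHub is here: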

https://github.com/jmschrei/pomegranate

The specific tutorial is here:

https://github.com/jmschrei/pomegranate/blob/master/tutorials/B_Model_Tutorial_2_General_Mixture_Models.ipynb

Answer 2

Following Jacob's suggestion, I wrote an example pomegranate implementation:

import numpy
import pomegranate   # Note: this example appears to target the pre-1.0 pomegranate API (from_samples)
import sklearn.datasets

#-------------------------------------------------------------------------------
#Get data from somewhere (moons data is nice for examples)
Xmoon, ymoon = sklearn.datasets.make_moons(200, shuffle = False, noise=.05, random_state=0)
Moon1 = Xmoon[:100] 
Moon2 = Xmoon[100:] 
MoonsDataSet = Xmoon

#Weight the data from moon2 ten times higher than moon1:
MoonWeights = numpy.concatenate([numpy.ones(100), numpy.full(100, 10.0)])

#Make the GMM model using pomegranate (this first fit is unweighted)
model = pomegranate.gmm.GeneralMixtureModel.from_samples(
    pomegranate.MultivariateGaussianDistribution,   #Either a single distribution class, or a list of them
    n_components=6,     #Required if a single distribution class is passed as first arg
    X=MoonsDataSet,     #data format: each row is a point, each column is a dimension
    )

#Force the model to train again, this time passing the weights and extra fitting parameters
model.fit(
    X=MoonsDataSet,         #data format: each row is a point, each column is a dimension
    weights = MoonWeights,  #One weight per data point
    stop_threshold = .001,  #Lower this value to get a better fit at the cost of a longer run.
                            #   (sklearn likes better/slower fits than pomegranate by default)
    )

#Wrap the model object into a probability density python function 
#   f(x_vector)
def GaussianMixtureModelFunction(Point):
    return model.probability(numpy.atleast_2d( numpy.array(Point) ))

#Plug a single point into the mixture model and get back a value:
ExampleProbability = GaussianMixtureModelFunction(numpy.array([0, 0]))
print('ExampleProbability', ExampleProbability)
