
Mixture of Gaussian and Gamma distribution

I'm looking for a script or package in R (Python would do too) to estimate the component distribution parameters from a mixture of Gaussian and Gamma distributions. So far I've used the R package "mixtools" to model the data as a mixture of Gaussians, but I think it could be modeled better by a Gamma plus a Gaussian.

Thanks

Here's one possibility:

Define utility functions:

rnormgammamix <- function(n,shape,rate,mean,sd,prob) {
    ifelse(runif(n)<prob,
           rgamma(n,shape,rate),
           rnorm(n,mean,sd))
}

(This could be made a little bit more efficient ...)
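As a rough sketch of the efficiency point: `ifelse` in the sampler above evaluates both `rgamma(n, ...)` and `rnorm(n, ...)` in full and then discards the draws it doesn't use. The variant below draws the component labels first and samples only what each component needs. The name `rnormgammamix2` is hypothetical and not part of the original answer.

```r
# Hypothetical alternative sampler: same mixture, fewer random draws.
rnormgammamix2 <- function(n, shape, rate, mean, sd, prob) {
    # TRUE where an observation comes from the Gamma component
    is_gamma <- runif(n) < prob
    n_gamma <- sum(is_gamma)
    x <- numeric(n)
    # draw only as many values from each component as are needed
    x[is_gamma] <- rgamma(n_gamma, shape, rate)
    x[!is_gamma] <- rnorm(n - n_gamma, mean, sd)
    x
}

set.seed(101)
x <- rnormgammamix2(1000, 1.5, 2, 3, 2, 0.5)
length(x)
```

Because it consumes random numbers in a different order, this version will not reproduce the exact values of `rnormgammamix` for the same seed, only the same distribution.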

dnormgammamix <- function(x,shape,rate,mean,sd,prob,log=FALSE) {
    r <- prob*dgamma(x,shape,rate)+(1-prob)*dnorm(x,mean,sd)
    if (log) log(r) else r
}
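One quick sanity check on a hand-written density like this is that it integrates to 1. The sketch below repeats the definition so that it runs on its own, and uses illustrative parameter values (shape 1.5, rate 2, mean 3, sd 2, prob 0.5). It splits the integral at 0, where the Gamma component starts.

```r
# Same density as above, repeated so that this snippet is self-contained
dnormgammamix <- function(x, shape, rate, mean, sd, prob, log = FALSE) {
    r <- prob * dgamma(x, shape, rate) + (1 - prob) * dnorm(x, mean, sd)
    if (log) log(r) else r
}

# Density at fixed, illustrative parameter values
f <- function(x) dnormgammamix(x, shape = 1.5, rate = 2, mean = 3, sd = 2, prob = 0.5)

# Integrate separately on each side of 0, where the Gamma component begins
total <- integrate(f, -Inf, 0)$value + integrate(f, 0, Inf)$value
total  # should be close to 1, up to numerical integration error
```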

Generate fake data:

set.seed(101)
r <- rnormgammamix(1000,1.5,2,3,2,0.5)
d <- data.frame(r)

Approach #1: bbmle package. Fit shape, rate, and standard deviation on the log scale, and prob on the logit scale.

library("bbmle")
m1 <- mle2(r~dnormgammamix(exp(logshape),exp(lograte),mean,exp(logsd),
                     plogis(logitprob)),
     data=d,
     start=list(logshape=0,lograte=0,mean=0,logsd=0,logitprob=0))
cc <- coef(m1)
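To make the transformed scale concrete: the start values of 0 in the `mle2` call above correspond to shape = 1, rate = 1, sd = 1 and prob = 0.5 on the natural scale, since `exp(0)` is 1 and `plogis(0)` is 0.5. A minimal sketch of the mapping in both directions, using base R only (the variable names are illustrative):

```r
# Start values on the unconstrained (transformed) scale, as passed to mle2
start <- c(logshape = 0, lograte = 0, mean = 0, logsd = 0, logitprob = 0)

# Map back to the natural scale: exp() keeps shape, rate, sd positive,
# plogis() keeps prob strictly between 0 and 1
natural <- c(shape = exp(start[["logshape"]]),
             rate  = exp(start[["lograte"]]),
             mean  = start[["mean"]],
             sd    = exp(start[["logsd"]]),
             prob  = plogis(start[["logitprob"]]))
natural

# Going the other way: natural-scale values converted to the transformed scale
true_start <- c(logshape = log(1.5), lograte = log(2), mean = 3,
                logsd = log(2), logitprob = qlogis(0.5))
true_start
```

Fitting on this scale means the optimizer can move freely without proposing a negative standard deviation or a probability outside (0, 1).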

png("normgam.png")
par(bty="l",las=1)
hist(r,breaks=100,col="gray",freq=FALSE)
rvec <- seq(-2,8,length=101)
pred <- with(as.list(cc),
             dnormgammamix(rvec,exp(logshape),exp(lograte),mean,
                           exp(logsd),plogis(logitprob)))
lines(rvec,pred,col=2,lwd=2)
true <- dnormgammamix(rvec,1.5,2,3,2,0.5)
lines(rvec,true,col=4,lwd=2)
dev.off()

(Figure normgam.png: histogram of r with the fitted density in red and the true density in blue.)

tcc <- with(as.list(cc),
            c(shape=exp(logshape),
              rate=exp(lograte),
              mean=mean,
              sd=exp(logsd),
              prob=plogis(logitprob)))
cbind(tcc,c(1.5,2,3,2,0.5))

The fit is reasonable, but the parameter estimates are fairly far from the true values. I think this model isn't very strongly identifiable in this parameter regime (i.e., the Gamma and Gaussian components can be swapped).

Approach #2: fitdistr from the MASS package, with the parameters on their original scale.

library("MASS")
ff <- fitdistr(r,dnormgammamix,
     start=list(shape=1,rate=1,mean=0,sd=1,prob=0.5))

cbind(tcc,ff$estimate,c(1.5,2,3,2,0.5))

fitdistr gets the same result as mle2, which suggests we're in a local minimum of the negative log-likelihood. If we start from the true parameters, we get a reasonable fit with estimates close to the true parameters, and a lower negative log-likelihood.

ff2 <- fitdistr(r,dnormgammamix,
     start=list(shape=1.5,rate=2,mean=3,sd=2,prob=0.5))
-logLik(ff2)  ## 1725.994
-logLik(ff)   ## 1755.458
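Since the result depends on the starting values, one common workaround is to fit from several random starts and keep the fit with the lowest negative log-likelihood. The sketch below is an assumption-laden illustration rather than part of the original answer: it swaps in base R's `optim` for `mle2`/`fitdistr` so that it runs without extra packages, and the number of starts and the ranges they are drawn from are arbitrary choices.

```r
# Mixture density, repeated so that this snippet runs on its own
dnormgammamix <- function(x, shape, rate, mean, sd, prob, log = FALSE) {
    r <- prob * dgamma(x, shape, rate) + (1 - prob) * dnorm(x, mean, sd)
    if (log) log(r) else r
}

# Negative log-likelihood on the transformed (log / logit) scale
nll <- function(p, x) {
    val <- -sum(dnormgammamix(x, exp(p[1]), exp(p[2]), p[3], exp(p[4]),
                              plogis(p[5]), log = TRUE))
    if (is.finite(val)) val else 1e10  # large penalty if the density underflows
}

# Simulated data with the same parameter values as in the text
set.seed(101)
n <- 1000
x <- ifelse(runif(n) < 0.5, rgamma(n, 1.5, 2), rnorm(n, 3, 2))

# Arbitrary choice: 10 random starting points on the transformed scale
n_starts <- 10
starts <- cbind(logshape  = runif(n_starts, -1, 1),
                lograte   = runif(n_starts, -1, 1),
                mean      = runif(n_starts, min(x), max(x)),
                logsd     = runif(n_starts, -1, 1),
                logitprob = runif(n_starts, -2, 2))

# One Nelder-Mead fit per starting point
fits <- lapply(seq_len(n_starts), function(i) {
    optim(starts[i, ], nll, x = x, method = "Nelder-Mead",
          control = list(maxit = 5000))
})
vals <- vapply(fits, function(f) f$value, numeric(1))
best <- fits[[which.min(vals)]]

sort(vals)  # a wide spread here points to several local optima

# Best fit, back-transformed to the natural scale
est <- with(as.list(best$par),
            c(shape = exp(logshape), rate = exp(lograte), mean = mean,
              sd = exp(logsd), prob = plogis(logitprob)))
est
```

Random restarts do not guarantee that the global optimum is found. They only make it less likely that a single poor starting point determines the answer.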
