
Fitting multimodal distributions in R; generating new values from fitted distribution

I am working with a small sample:

>dput(dat.demand2050.unique)  
c(79, 56, 69, 61, 53, 73, 72, 86, 75, 68, 74.2, 80, 65.6, 60, 54)    

for which the density looks like this:
[Figure: density plot of the data]

I know that the values come from two regimes, low and high. Assuming that the underlying process in each regime is normal, I used the mixtools package to fit a two-component normal mixture:

library(mixtools)
set.seed(99)
dat.demand2050.mixmdl <- normalmixEM(dat.demand2050.unique, lambda=c(0.3,0.7), mu=c(60,70), k=2)

which gives me the following result:
[Figure: fitted component densities overlaid on the original density]
(The solid lines are the fitted curves and the dashed line is the original density.)

# get the parameters of the mixture
dat.demand2050.mixmdl.prop <- dat.demand2050.mixmdl$lambda    #mix proportions
dat.demand2050.mixmdl.means <- dat.demand2050.mixmdl$mu    #modal means
dat.demand2050.mixmdl.dev <- dat.demand2050.mixmdl$sigma   #modal std dev  

The mixture parameters are:

>dat.demand2050.mixmdl.prop  #mix proportions  
[1] 0.2783939 0.7216061  
>dat.demand2050.mixmdl.means  #modal means  
[1] 56.21150 73.08389  
>dat.demand2050.mixmdl.dev  #modal std dev  
[1] 3.098292 6.413906 
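
For reference, these parameters define a two-component normal mixture density: each component's normal density weighted by its mixing proportion. The following is a minimal base-R sketch of that density, assuming the parameter values printed above; `dmixnorm` is a hypothetical helper name and is not part of mixtools.

```r
# Fitted parameters copied from the output above
prop  <- c(0.2783939, 0.7216061)
means <- c(56.21150, 73.08389)
dev   <- c(3.098292, 6.413906)

# Hypothetical helper: density of a normal mixture evaluated at the points x
dmixnorm <- function(x, prop, means, dev) {
  # one column per component, each weighted by its mixing proportion
  comp <- sapply(seq_along(prop),
                 function(k) prop[k] * dnorm(x, means[k], dev[k]))
  rowSums(matrix(comp, nrow = length(x)))
}

# Overlay the mixture density (solid) on a kernel density estimate (dashed)
dat <- c(79, 56, 69, 61, 53, 73, 72, 86, 75, 68, 74.2, 80, 65.6, 60, 54)
plot(density(dat), lty = 2, main = "Fitted mixture vs. kernel density")
curve(dmixnorm(x, prop, means, dev), add = TRUE)
```

Integrating `dmixnorm` over a wide enough range should give a value close to 1, which is a quick way to check that the proportions and standard deviations were copied correctly.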

I have the following questions:

  1. To generate a new set of values that approximates the underlying distribution, is my approach correct or is there a better workflow?
  2. If my approach is correct, how can I use this result to generate a set of random values from this mixture distribution?

Fifteen observations is very few for fitting a mixture, but leaving that aside, you can sample from the fitted mixture as follows:

probs <- dat.demand2050.mixmdl$lambda
m <- dat.demand2050.mixmdl$mu
s <- dat.demand2050.mixmdl$sigma

N <- 1e5
grp <- sample(length(probs), N, replace=TRUE, prob=probs)
x <- rnorm(N, m[grp], s[grp])
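
A quick way to check draws like these is to compare them with what the mixture parameters imply. The sketch below is self-contained and therefore assumes the fitted values reported in the question rather than reading them from the mixtools object; the seed is arbitrary.

```r
# Parameters copied from the fitted values reported in the question
probs <- c(0.2783939, 0.7216061)
m     <- c(56.21150, 73.08389)
s     <- c(3.098292, 6.413906)

set.seed(1)  # arbitrary seed, only for reproducibility
N <- 1e5
grp <- sample(length(probs), N, replace = TRUE, prob = probs)
x <- rnorm(N, m[grp], s[grp])

# Share of draws assigned to each component; should be close to probs
prop.table(table(grp))

# Theoretical mean and variance of the mixture
mix.mean <- sum(probs * m)
mix.var  <- sum(probs * (s^2 + m^2)) - mix.mean^2

# Compare theory with the simulated sample
c(theory = mix.mean, simulated = mean(x))
c(theory = mix.var,  simulated = var(x))
```

With N this large, the simulated mean and variance should agree with the theoretical values to within sampling error.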

Your approach is correct.

For each sample from your mixture distribution, you just need to choose which of the two component Gaussian distributions the sample should come from, and then draw the sample from that distribution.

You can choose between the two distributions using the mixture proportions you have found: draw a uniform random number between 0 and 1, and sample from the first distribution if that number is less than the first proportion; otherwise sample from the second distribution.

Finally, sample from the chosen Gaussian distribution using the rnorm function.

dat.demand2050.mixmdl.prop=c(0.2783939,0.7216061)
dat.demand2050.mixmdl.means=c(56.21150,73.08389)
dat.demand2050.mixmdl.dev=c(3.098292,6.413906)

sampleMixture=function(prop,means,dev){
    # Generate a uniformly distributed random number between 0 and 1
    # in order to choose between the two component distributions
    # (this function handles exactly two components)
    distTest=runif(1)
    if(distTest<prop[1]){
        # Then sample from the first component of the mixture
        draw=rnorm(1,mean=means[1],sd=dev[1])
    }else{
        # Sample from the second component of the mixture
        draw=rnorm(1,mean=means[2],sd=dev[2])
    }
    # The result is named draw so that it does not mask base::sample
    return(draw)
}

# Generate a single sample
sampleMixture(dat.demand2050.mixmdl.prop,dat.demand2050.mixmdl.means,dat.demand2050.mixmdl.dev)

# Generate 100 samples and plot resulting distribution
samples=replicate(100,sampleMixture(dat.demand2050.mixmdl.prop,dat.demand2050.mixmdl.means,dat.demand2050.mixmdl.dev))
plot(density(samples))
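
The same idea can be written without a loop and for any number of components. This is only a sketch: `rmixnorm` is a hypothetical helper name, and the parameter values are the ones assumed above.

```r
# Hypothetical helper: draw n values from a normal mixture
# with any number of components
rmixnorm <- function(n, prop, means, dev) {
  stopifnot(length(prop) == length(means), length(means) == length(dev))
  # pick a component index for every draw, weighted by the mixing proportions
  comp <- sample(seq_along(prop), n, replace = TRUE, prob = prop)
  # rnorm is vectorised over mean and sd, so one call draws all values
  rnorm(n, mean = means[comp], sd = dev[comp])
}

prop  <- c(0.2783939, 0.7216061)
means <- c(56.21150, 73.08389)
dev   <- c(3.098292, 6.413906)

set.seed(99)
samples <- rmixnorm(100, prop, means, dev)
plot(density(samples))
```

Drawing all component indices at once with `sample` replaces the per-draw `runif` comparison, so the function is not restricted to two components and avoids calling `replicate`.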
