简体   繁体   中英

predicting from flexmix object (R)

I fit some data to a mixture distribution of two gaussian in flexmix :

data("NPreg", package = "flexmix")
mod <- flexmix(yn ~ x, data = NPreg, k = 2,
           model = list(FLXMRglm(yn ~ x, family= "gaussian"),
                        FLXMRglm(yn ~ x, family = "gaussian")))

the model fit is as follows:

> mod

Call:
flexmix(formula = yn ~ x, data = NPreg, k = 2, model =    list(FLXMRglm(yn ~ x, family = "gaussian"), 
    FLXMRglm(yn ~ x, family = "gaussian")))

Cluster sizes:
  1   2 
 74 126 

convergence after 31 iterations

But how do I actually predict from this model?

when I do

pred <- predict(mod, NPreg)

I get a list with the predictions from each of the two components

To get a single prediction, do I have to add in the cluster sizes like this?

single <- (74/200)* pred$Comp.1[,1] + (126/200)*pred$Comp.2[,2]

I use flexmix for prediction in the following way:

pred = predict(mod, NPreg)
clust = clusters(mod,NPreg)
result = cbind(NPreg,data.frame(pred),data.frame(clust))
plot(result$yn,col = c("red","blue")[result$clust],pch = 16,ylab = "yn")

NPreg中的簇

And the confusion matrix:

table(result$class,result$clust)

NPreg的混淆矩阵

For getting the predicted values of yn , I select the component value of the cluster to which a data point belongs.

for(i in 1:nrow(result)){
  result$pred_model1[i] = result[,paste0("Comp.",result$clust[i],".1")][i]
  result$pred_model2[i] = result[,paste0("Comp.",result$clust[i],".2")][i]
}

The actual vs predicted results show the fit (adding only one of them here as both of your models are same, you would use pred_model2 for the second model).

qplot(result$yn, result$pred_model1,xlab="Actual",ylab="Predicted") + geom_abline()

实际与预测

RMSE = sqrt(mean((result$yn-result$pred_model1)^2))

gives a root mean square error of 5.54 .

This answer is based on many SO answers I read through while working with flexmix . It worked well for my problem.

You may also be interested in visualizing the two distributions. My model was the following, which shows some overlap as the ratio of components are not close to 1 .

Call:
flexmix(formula = yn ~ x, data = NPreg, k = 2, 
model = list(FLXMRglm(yn ~ x, family = "gaussian"), 
             FLXMRglm(yn ~ x, family = "gaussian")))

       prior size post>0 ratio
Comp.1 0.481  102    129 0.791
Comp.2 0.519   98    171 0.573

'log Lik.' -1312.127 (df=13)
AIC: 2650.255   BIC: 2693.133 

I also generate a density distribution with histograms to visulaize both components. This was inspired by a SO answer from the maintainer of betareg .

a = subset(result, clust == 1)
b = subset(result, clust == 2)
hist(a$yn, col = hcl(0, 50, 80), main = "",xlab = "", freq = FALSE, ylim = c(0,0.06))
hist(b$yn, col = hcl(240, 50, 80), add = TRUE,main = "", xlab = "", freq = FALSE, ylim = c(0,0.06))
ys = seq(0, 50, by = 0.1)
lines(ys, dnorm(ys, mean = mean(a$yn), sd = sd(a$yn)), col = hcl(0, 80, 50), lwd = 2)
lines(ys, dnorm(ys, mean = mean(b$yn), sd = sd(b$yn)), col = hcl(240, 80, 50), lwd = 2)

组件密度

# Joint Histogram
p <- prior(mod)
hist(result$yn, freq = FALSE,main = "", xlab = "",ylim = c(0,0.06))
lines(ys, p[1] * dnorm(ys, mean = mean(a$yn), sd = sd(a$yn)) +
        p[2] * dnorm(ys, mean = mean(b$yn), sd = sd(b$yn)))

在此处输入图片说明

您可以将其他参数传递给您的预测调用。

pred <- predict(mod, NPreg, aggregate = TRUE)[[1]][,1]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM