Predicting from a flexmix object (R)
I fit some data to a mixture of two Gaussians using flexmix:
library(flexmix)
data("NPreg", package = "flexmix")
mod <- flexmix(yn ~ x, data = NPreg, k = 2,
               model = list(FLXMRglm(yn ~ x, family = "gaussian"),
                            FLXMRglm(yn ~ x, family = "gaussian")))
The model fit is as follows:
> mod
Call:
flexmix(formula = yn ~ x, data = NPreg, k = 2, model = list(FLXMRglm(yn ~ x, family = "gaussian"),
FLXMRglm(yn ~ x, family = "gaussian")))
Cluster sizes:
1 2
74 126
convergence after 31 iterations
But how do I actually predict from this model? When I run
pred <- predict(mod, NPreg)
I get a list with the predictions from each of the two components. To get a single prediction, do I have to weight by the cluster sizes like this?
single <- (74/200)* pred$Comp.1[,1] + (126/200)*pred$Comp.2[,2]
I use flexmix for prediction in the following way:
pred = predict(mod, NPreg)
clust = clusters(mod,NPreg)
result = cbind(NPreg,data.frame(pred),data.frame(clust))
plot(result$yn,col = c("red","blue")[result$clust],pch = 16,ylab = "yn")
And the confusion matrix:
table(result$class,result$clust)
To get the predicted values of yn, I select, for each data point, the prediction from the component (cluster) to which that point was assigned.
for(i in 1:nrow(result)){
result$pred_model1[i] = result[,paste0("Comp.",result$clust[i],".1")][i]
result$pred_model2[i] = result[,paste0("Comp.",result$clust[i],".2")][i]
}
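The row-by-row loop above can also be written with matrix indexing. Below is a minimal sketch on toy data, where comp_pred and clust are hypothetical stand-ins for the per-component predictions of one model and for the clusters() output:

```r
# Toy stand-in for the per-component predictions: one column per component.
comp_pred <- cbind(Comp.1 = c(10, 11, 12, 13),
                   Comp.2 = c(20, 21, 22, 23))

# Toy stand-in for clusters(mod, NPreg): the hard component label of each row.
clust <- c(1L, 2L, 2L, 1L)

# Indexing a matrix with a two-column (row, column) matrix picks one cell per row,
# i.e. the prediction of the component each row was assigned to.
picked <- comp_pred[cbind(seq_len(nrow(comp_pred)), clust)]
picked
# 10 21 22 13
```

This avoids growing the data frame inside a loop and gives the same selection for each row.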
The actual vs. predicted plot shows the fit. I show only pred_model1 here because both of your models are identical; you would use pred_model2 for the second model.
library(ggplot2)  # provides qplot() and geom_abline()
qplot(result$yn, result$pred_model1, xlab = "Actual", ylab = "Predicted") + geom_abline()
RMSE = sqrt(mean((result$yn-result$pred_model1)^2))
This gives a root mean square error of 5.54.
This answer is based on many SO answers I read through while working with flexmix. It worked well for my problem.
You may also be interested in visualizing the two distributions. My model was the following; it shows some overlap between the components, as their ratio values are not close to 1.
Call:
flexmix(formula = yn ~ x, data = NPreg, k = 2,
model = list(FLXMRglm(yn ~ x, family = "gaussian"),
FLXMRglm(yn ~ x, family = "gaussian")))
prior size post>0 ratio
Comp.1 0.481 102 129 0.791
Comp.2 0.519 98 171 0.573
'log Lik.' -1312.127 (df=13)
AIC: 2650.255 BIC: 2693.133
I also plot histograms with density curves to visualize both components. This was inspired by an SO answer from the maintainer of betareg.
a = subset(result, clust == 1)
b = subset(result, clust == 2)
hist(a$yn, col = hcl(0, 50, 80), main = "",xlab = "", freq = FALSE, ylim = c(0,0.06))
hist(b$yn, col = hcl(240, 50, 80), add = TRUE,main = "", xlab = "", freq = FALSE, ylim = c(0,0.06))
ys = seq(0, 50, by = 0.1)
lines(ys, dnorm(ys, mean = mean(a$yn), sd = sd(a$yn)), col = hcl(0, 80, 50), lwd = 2)
lines(ys, dnorm(ys, mean = mean(b$yn), sd = sd(b$yn)), col = hcl(240, 80, 50), lwd = 2)
# Joint Histogram
p <- prior(mod)
hist(result$yn, freq = FALSE,main = "", xlab = "",ylim = c(0,0.06))
lines(ys, p[1] * dnorm(ys, mean = mean(a$yn), sd = sd(a$yn)) +
p[2] * dnorm(ys, mean = mean(b$yn), sd = sd(b$yn)))
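The joint curve above is a weighted sum of normal densities. As a sanity check on that construction, here is a small sketch with a hypothetical helper, mix_density(). The weights are the priors from the summary above; the means and standard deviations are made-up illustrative values, not estimates from the fitted model:

```r
# Hypothetical helper: density of a finite mixture of normals.
# w = mixing weights (should sum to 1), mu / sigma = component means and SDs.
mix_density <- function(y, w, mu, sigma) {
  stopifnot(length(w) == length(mu), length(mu) == length(sigma))
  total <- numeric(length(y))
  for (k in seq_along(w)) {
    total <- total + w[k] * dnorm(y, mean = mu[k], sd = sigma[k])
  }
  total
}

w     <- c(0.481, 0.519)  # priors taken from the summary output
mu    <- c(15, 30)        # illustrative values only
sigma <- c(4, 6)          # illustrative values only

# A valid mixture density should integrate to (approximately) 1.
area <- integrate(mix_density, lower = -50, upper = 100,
                  w = w, mu = mu, sigma = sigma)$value
area
```

If the weights do not sum to 1, the area will deviate from 1 accordingly, which is a quick way to spot a mis-specified joint curve.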
You can pass an additional argument to your predict call:
pred <- predict(mod, NPreg, aggregate = TRUE)[[1]][,1]
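For reference, the weighting asked about in the question can be written out. As far as I understand flexmix's predict(), aggregate = TRUE combines the component predictions using the fitted prior (mixing) probabilities rather than the hard cluster counts. Treat that as an assumption to check against the package documentation. The symbols below are introduced here for illustration: \hat{\pi}_k is the prior of component k and \hat{\mu}_k(x) is its fitted regression prediction at x.

```latex
\hat{y}(x) = \sum_{k=1}^{K} \hat{\pi}_k \, \hat{\mu}_k(x),
\qquad \sum_{k=1}^{K} \hat{\pi}_k = 1
```

In the summary output shown in the answer above, the priors are 0.481 and 0.519, while the size-based fractions are 102/200 = 0.51 and 98/200 = 0.49. The two weightings are close but not identical.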