I am now using the package fitdistrplus
to construct a Gamma distribution and my question is how can I extract the fitted values in order to calculate the root mean squared error? Thanks for any help.
library(fitdistrplus)
Sev = c(1.42,5.15,2.5,2.29,12.36,2.82,1.4,3.53,1.17,1.0,4.03,5.26,1.65,1.41,3.75,1.09,
3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,
4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,
1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)
fg <- fitdist(data = Sev, distr = "gamma", method = "mle")
This is not a regression context, there are no clear fitted values here. What you may have in mind is estimated density values f(Sev; theta), where theta are the estimates given by fg
. That would be
fit <- dgamma(Sev, fg$estimate[1], fg$estimate[2])
and it is a meaningful and well-defined object. However, you get into trouble when trying to compute RMSE: what is going to be the thing that you will compare fit
with? What is your sample density value at 1.42? Since you are dealing with a continuous distribution, you are going to have to use some kernel estimator, which again has a parameter - bandwidth! A very crude thing would be
den <- density(Sev)
sqrt(mean((den$y - dgamma(den$x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.0146867
That's RMSE between the MLE estimate given by fg
and the kernel density estimate den
. Using the np
package you could estimate the density better than with density.
There is something more sensible that you can do: comparing the empirical CDF of your data and the CDF given by fg
. The former is given by empCDF <- ecdf(Sev)
and the latter by pgamma
with the corresponding parameter values. Then, for instance, the Kolmogorov-Smirnov statistic would be approximately
x <- seq(min(Sev), max(Sev), length = 10000)
max(abs(empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2])))
# [1] 0.1725476
and a kind of RMSE would be
sqrt(mean((empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.04585509
(Both statistics can be made more precise with optim
and integrate
, respectively).
To sum up, since it's not a regression context, things are different and, depending on how rigorous you want to be, there are many alternatives to explore.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.