简体   繁体   中英

How to find fitted values from `fitdistrplus` package in R?

I am now using the package fitdistrplus to construct a Gamma distribution and my question is how can I extract the fitted values in order to calculate the root mean squared error? Thanks for any help.

library(fitdistrplus)
Sev = c(1.42,5.15,2.5,2.29,12.36,2.82,1.4,3.53,1.17,1.0,4.03,5.26,1.65,1.41,3.75,1.09,
    3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,
    4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,
    1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)
fg <- fitdist(data = Sev, distr = "gamma", method = "mle") 

This is not a regression context, there are no clear fitted values here. What you may have in mind is estimated density values f(Sev; theta), where theta are the estimates given by fg . That would be

fit <- dgamma(Sev, fg$estimate[1], fg$estimate[2])

and it is a meaningful and well-defined object. However, you get into trouble when trying to compute RMSE: what is going to be the thing that you will compare fit with? What is your sample density value at 1.42? Since you are dealing with a continuous distribution, you are going to have to use some kernel estimator, which again has a parameter - bandwidth! A very crude thing would be

den <- density(Sev)
sqrt(mean((den$y - dgamma(den$x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.0146867

That's RMSE between the MLE estimate given by fg and the kernel density estimate den . Using the np package you could estimate the density better than with density.

There is something more sensible that you can do: comparing the empirical CDF of your data and the CDF given by fg . The former is given by empCDF <- ecdf(Sev) and the latter by pgamma with the corresponding parameter values. Then, for instance, the Kolmogorov-Smirnov statistic would be approximately

x <- seq(min(Sev), max(Sev), length = 10000)
max(abs(empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2])))
# [1] 0.1725476

and a kind of RMSE would be

sqrt(mean((empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.04585509

(Both statistics can be made more precise with optim and integrate , respectively).

To sum up, since it's not a regression context, things are different and, depending on how rigorous you want to be, there are many alternatives to explore.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM