How to find fitted values from `fitdistrplus` package in R?

Question

I am now using the package fitdistrplus to construct a Gamma distribution and my question is how can I extract the fitted values in order to calculate the root mean squared error? Thanks for any help.

library(fitdistrplus)
Sev = c(1.42,5.15,2.5,2.29,12.36,2.82,1.4,3.53,1.17,1.0,4.03,5.26,1.65,1.41,3.75,1.09,
    3.44,1.36,1.19,4.76,5.58,1.23,2.29,7.71,1.12,1.26,2.78,1.13,3.87,15.43,1.19,
    4.95,7.69,1.17,3.27,1.44,1.05,3.94,1.58,2.29,2.73,3.75,6.80,1.16,1.01,1.00,
    1.02,2.32,2.86,22.90,1.42,1.10,2.78,1.23,1.61,1.33,3.53,10.44)
fg <- fitdist(data = Sev, distr = "gamma", method = "mle")

Answer 1

This is not a regression context, there are no clear fitted values here. What you may have in mind is estimated density values f(Sev; theta), where theta are the estimates given by fg . That would be

fit <- dgamma(Sev, fg$estimate[1], fg$estimate[2])

and it is a meaningful and well-defined object. However, you get into trouble when trying to compute RMSE: what is going to be the thing that you will compare fit with? What is your sample density value at 1.42? Since you are dealing with a continuous distribution, you are going to have to use some kernel estimator, which again has a parameter - bandwidth! A very crude thing would be

den <- density(Sev)
sqrt(mean((den$y - dgamma(den$x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.0146867

That's RMSE between the MLE estimate given by fg and the kernel density estimate den . Using the np package you could estimate the density better than with density.

There is something more sensible that you can do: comparing the empirical CDF of your data and the CDF given by fg . The former is given by empCDF <- ecdf(Sev) and the latter by pgamma with the corresponding parameter values. Then, for instance, the Kolmogorov-Smirnov statistic would be approximately

x <- seq(min(Sev), max(Sev), length = 10000)
max(abs(empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2])))
# [1] 0.1725476

and a kind of RMSE would be

sqrt(mean((empCDF(x) - pgamma(x, fg$estimate[1], fg$estimate[2]))^2))
# [1] 0.04585509

(Both statistics can be made more precise with optim and integrate , respectively).

To sum up, since it's not a regression context, things are different and, depending on how rigorous you want to be, there are many alternatives to explore.

How to find fitted values from `fitdistrplus` package in R?

Question

1 answers

solution1
1 ACCPTED 2018-12-22 22:30:56

How to find fitted values from `fitdistrplus` package in R?

Question

1 answers

solution1 1 ACCPTED 2018-12-22 22:30:56

solution1
1 ACCPTED 2018-12-22 22:30:56