Is there a function that could be used to fit a frequency distribution in R? I'm aware of fitdistr, but as far as I can tell it only works for data vectors (random samples). Also, I know that converting between the two formats is trivial, but the frequencies are so large that memory is a concern.
For example, fitdistr may be used in the following way:
library(MASS)  # provides fitdistr
x <- rpois(100, lambda=10)
fitdistr(x, "poisson")
Is there a function that would do the same fitting on a frequency table? Something along these lines:
freqt <- as.data.frame(table(x))
fitfreqtable(freqt$x, weights=freqt$Freq, "poisson")
Thanks!
There's no built-in function that I know of for fitting a distribution to a frequency table. Note that, in theory, a continuous distribution is inappropriate for a table, since the data is discrete. Of course, for large enough N and a fine enough grid, this can be ignored.
You can build your own model-fitting function using optim or any other optimizer, if you know the density that you're interested in. I did this here for a gamma distribution (which was a bad assumption for that particular dataset, but never mind that).
Code reproduced below.
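# x: bin positions, y: observed count in each bin
# (g is the dataset from the linked answer and is not defined here)
negll <- function(par, x, y)
{
    shape <- par[1]
    rate <- par[2]
    # expected count per bin: gamma density scaled by the total count
    mu <- dgamma(x, shape, rate) * sum(y)
    # treat each observed count as Poisson around mu; this returns
    # -2 * log-likelihood, which has the same minimizer as the negative log-likelihood
    -2 * sum(dpois(y, mu, log=TRUE))
}
optim(c(1, 1), negll, x=seq_along(g$count), y=g$count, method="L-BFGS-B", lower=c(.001, .001))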
$par
[1] 0.73034879 0.00698288
$value
[1] 62983.18
$counts
function gradient
32 32
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
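The same approach carries over to the Poisson frequency table from the question. Here is a minimal sketch, assuming the table holds the distinct values and their frequencies: each value's log-density is weighted by its frequency, so the sample never has to be expanded. The names negll_freq, vals and freq are illustrative and do not come from any package.

```r
set.seed(1)
x <- rpois(100, lambda = 10)
freqt <- as.data.frame(table(x))

# table() stores the values as factor levels, so convert them back to numbers
vals <- as.integer(as.character(freqt$x))
# doubles avoid integer overflow when the frequencies are very large
freq <- as.numeric(freqt$Freq)

# frequency-weighted negative log-likelihood of a Poisson(lambda)
negll_freq <- function(lambda, vals, freq) {
  -sum(freq * dpois(vals, lambda, log = TRUE))
}

# one parameter, so a 1-D optimizer is enough; the interval is an assumption
fit <- optimize(negll_freq, interval = c(0.001, 100), vals = vals, freq = freq)
fit$minimum
```

Up to the optimizer's tolerance, fit$minimum should agree with mean(x), since the Poisson maximum-likelihood estimate is the sample mean.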
To fit a Poisson distribution, you only need the mean of your sample: the maximum-likelihood estimate of lambda, the only parameter of the Poisson distribution, is the sample mean. Example:
set.seed(1111)
sample <- rpois(n=10000, lambda=10)
mean(sample)
[1] 10.0191
which is almost equal to the lambda value of 10 used to create the sample. The small difference (0.0191) is due to random sampling variation. As you increase n, the difference will tend to get smaller. Alternatively, you can fit the distribution by numerical maximum likelihood:
library(fitdistrplus)
fitdist(sample,"pois")
Fitting of the distribution ' pois ' by maximum likelihood
Parameters:
       estimate Std. Error
lambda  10.0191 0.03165296
but this is unnecessary for a Poisson distribution, since the estimate is just the sample mean. For theoretical information on fitting frequency data, you can see my answer here.
The function fitmixturegrouped from the package ForestFit does the job for other distribution models using frequency-by-group data.
It can fit simple or mixture distribution models based on "gamma", "log-normal", "skew-normal", and "weibull".
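For readers who want to see what fitting to frequency-by-group data involves, the idea can be sketched in base R. This is not ForestFit's implementation, only an illustration under stated assumptions: the data are simulated, the bin edges are arbitrary, and negll_grouped is a hypothetical helper. Each bin's probability is a difference of the gamma CDF at its edges, and the bin counts enter the log-likelihood as weights.

```r
set.seed(42)
y <- rgamma(5000, shape = 2, rate = 0.5)   # simulated data, for illustration only

# group the data: unit-width bins, with an open-ended last bin
breaks <- c(0:15, Inf)
freq <- as.numeric(table(cut(y, breaks)))  # count per bin, zeros included

# grouped-data negative log-likelihood; parameters are on the log scale
# so that shape and rate stay positive without box constraints
negll_grouped <- function(logpar, breaks, freq) {
  par <- exp(logpar)
  p <- diff(pgamma(breaks, shape = par[1], rate = par[2]))  # bin probabilities
  -sum(freq * log(pmax(p, 1e-300)))                         # guard against log(0)
}

fit <- optim(c(0, 0), negll_grouped, breaks = breaks, freq = freq)
est <- exp(fit$par)
names(est) <- c("shape", "rate")
est
```

Only the bin edges and counts are used in the fit, so memory use depends on the number of bins rather than on the number of observations.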
For a Poisson distribution, the mean is the only parameter that is needed. Applying a simple summary function to your data would suffice (as suggested by ntzortzis).
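On a frequency table, that summary is the frequency-weighted mean, so the table never needs to be expanded into a sample. A small sketch with made-up counts (vals and freq are illustrative names); the standard error uses the usual sqrt(lambda / n) formula for the Poisson maximum-likelihood estimate:

```r
vals <- c(0, 1, 2, 3, 4)        # distinct observed values (made up)
freq <- c(10, 25, 30, 20, 15)   # how often each value occurs (made up)

n <- sum(freq)
lambda_hat <- sum(vals * freq) / n   # same as weighted.mean(vals, freq)
se <- sqrt(lambda_hat / n)           # standard error of the estimate

lambda_hat
se
```

Storing freq as doubles rather than integers avoids overflow in sum(freq) when the frequencies are very large.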