简体   繁体   中英

How to fit frequency distributions in R?

Is there a function that could be used to fit a frequency distribution in R? I'm aware of fitdistr but as far as I can tell it only works for data vectors (random samples). Also, I know that converting between the two formats is trivial but frequencies are so large that memory is a concern.

For example, fitdistr may be used the following way:

x<-rpois(100, lambda=10)
fitdistr(x,"poisson")

Is there a function that would do the same fitting on a frequency table? Something along the lines:

freqt <- as.data.frame(table(x))
fitfreqtable(freqt$x, weights=freqt$Freq, "poisson")

Thanks!

There's no built-in function that I know of for fitting a distribution to a frequency table. Note that, in theory, a continuous distribution is inappropriate for a table, since the data is discrete. Of course, for large enough N and a fine enough grid, this can be ignored.

You can build your own model-fitting function using optim or any other optimizer, if you know the density that you're interested in. I did this here for a gamma distribution (which was a bad assumption for that particular dataset, but never mind that).

Code reproduced below.

negll <- function(par, x, y)
{
    shape <- par[1]
    rate <- par[2]
    mu <- dgamma(x, shape, rate) * sum(y)
    -2 * sum(dpois(y, mu, log=TRUE))
}


optim(c(1, 1), negll, x=seq_along(g$count), y=g$count, method="L-BFGS-B", lower=c(.001, .001))
$par
[1] 0.73034879 0.00698288

$value
[1] 62983.18

$counts
function gradient 
      32       32 

$convergence
[1] 0

$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

For fitting a Poisson distribution, you only need the mean of your sample. Then the mean equals the lambda, which is the only parameter of the Poisson distribution. Example:

set.seed(1111)
sample<-rpois(n=10000,l=10)
mean(sample)
[1] 10.0191

which is almost equal to the lambda value put for creating the sample (l=10). The small difference (0.0191) is due to the randomness of the Poisson distribution random value generator. As you increase n the difference will get smaller. Alternatively, you can fit the distribution using an optimization method:

library(fitdistrplus)
fitdist(sample,"pois")
set.seed(1111)

Fitting of the distribution ' pois ' by maximum likelihood 
Parameters:
       estimate Std. Error
lambda  10.0191 0.03165296

but it's only a waste of time. For theoritical information on fitting frequency data, you can see my answer here .

The function fixtmixturegrouped from the package ForestFit does the job for other distribution models using frequency-by-group data.

It can fit simple or mixture distribution models based on "gamma", "log-normal", "skew-normal", and "weibull".

For a Poisson distribution, the population mean is the only parameter that is needed. Applying a simple summary function on your data would suffice (as suggested by ntzortzis)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM