
Difficulty fitting gamma distribution with R

I am attempting to estimate parameters for a gamma distribution fit to ecological density (i.e. biomass per area) data. I have been using the fitdistr() command from the MASS package in R (version 3.0.0: x86_64-w64-mingw32/x64 (64-bit)). This is a maximum likelihood estimation command for distributional parameters.

The vectors of data are quite large, but summary statistics are as follows:

Min. = 0; 1st Qu. = 87.67; Median = 199.5; Mean = 1255; Variance = 2.79E+07; 3rd Qu. = 385.6; Max. = 33880

The code I am using to run the MLE procedure is:

gdist <- fitdistr(data, dgamma, 
                  start=list(shape=1, scale=1/(mean(data))),lower=c(1,0.1))

R is giving me the following error:

Error in optim(x = c(6.46791148085828, 4060.54750836902, 99.6201565968665, : non-finite finite-difference value [1]

Others who have experienced this type of issue and have turned to Stack Overflow for help seem to have found the solution in adding the "lower=" argument to their code, and/or removing zeros. I find that R will provide parameters for a fit if I remove the zero observations, but I was under the impression that gamma distributions covered the range 0 <= x < inf (Forbes et al. 2011. Statistical Distributions)?

Have I gotten the wrong impression regarding the range of the gamma distribution? Or is there some other issue I am missing regarding MLE (in which I am a novice)?

Getting a rough estimate by the method of moments (matching up mean=shape*scale and variance=shape*scale^2), we have:

mean <- 1255
var <- 2.79e7
shape = mean^2/var   ## 0.056
scale = var/mean     ## 22231
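For reference, the two lines above come from solving the moment equations for the shape and scale. The symbols k (shape), θ (scale), x̄ (sample mean) and s² (sample variance) are notation introduced here for illustration, not taken from the original post:

```latex
\begin{aligned}
\bar{x} &= k\,\theta, \qquad s^2 = k\,\theta^2 \\
\Rightarrow\quad
\hat{k} &= \frac{\bar{x}^2}{s^2} = \frac{1255^2}{2.79\times 10^{7}} \approx 0.056,
\qquad
\hat{\theta} = \frac{s^2}{\bar{x}} = \frac{2.79\times 10^{7}}{1255} \approx 22231
\end{aligned}
```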

Now generate some data from this distribution:

set.seed(101)
x = rgamma(1e4,shape,scale=scale)
summary(x)
##     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     0.00      0.00      0.06   1258.00     98.26 110600.00 

MASS::fitdistr(x,"gamma")  ## error

I strongly suspect that the problem is that the underlying optim call assumes the parameters (shape and scale, or shape and rate) are of approximately the same magnitude, which they're not. You can get around this by scaling your data:

(m <- MASS::fitdistr(x/2e4,"gamma"))  ## works fine
##      shape           rate    
##  0.0570282411   0.9067274280 
## (0.0005855527) (0.0390939393)

fitdistr gives a rate parameter rather than a scale parameter: to get back to the scale parameter you want, invert and re-scale ...

1/coef(m)["rate"]*2e4  ## 22057
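The rescale, fit and back-transform steps can be bundled into one small function. This is only a sketch: `fit_gamma_rescaled` and its argument `s` are hypothetical names made up for illustration, and the data are re-simulated with the same seed and parameters as above so that the block is self-contained. It relies on the fact that dividing gamma data by a constant `s` leaves the shape unchanged and divides the scale by `s`.

```r
library(MASS)  # provides fitdistr()

# Hypothetical helper (not part of MASS): fit a gamma by maximum likelihood
# to x / s, then report shape and scale in the original units of x.
fit_gamma_rescaled <- function(x, s = mean(x)) {
  fit <- MASS::fitdistr(x / s, "gamma")   # returns shape and rate for x / s
  est <- coef(fit)
  c(shape = unname(est["shape"]),         # shape is unaffected by rescaling
    scale = unname(s / est["rate"]))      # invert the rate, then undo the scaling
}

# Same simulation settings as in the answer above
set.seed(101)
x <- rgamma(1e4, shape = 0.056, scale = 22231)

est <- fit_gamma_rescaled(x, s = 2e4)
print(est)
```

The default `s = mean(x)` is an arbitrary choice that puts the rescaled data on a scale of order one; any constant of similar magnitude, such as the 2e4 used above, should serve the same purpose.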

By the way, the fact that the quantiles of the simulated data don't match your data very well (e.g. a median of 0.06 for x vs. a median of 199 in your data) suggests that your data might not fit a Gamma all that well. For example, there might be some outliers affecting the mean and variance more than the quantiles?
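The same comparison can be made without simulation by computing the quartiles of the method-of-moments gamma directly with qgamma. The snippet below is a rough check that uses only the summary statistics quoted in the question; the variable names are illustrative.

```r
# Method-of-moments gamma implied by the reported mean and variance
m <- 1255
v <- 2.79e7
shape <- m^2 / v
scale <- v / m

# Quartiles of that gamma next to the quartiles reported in the question
probs <- c(0.25, 0.50, 0.75)
theoretical <- qgamma(probs, shape = shape, scale = scale)
reported <- c(87.67, 199.5, 385.6)

print(data.frame(prob = probs, theoretical = theoretical, reported = reported))
```

If the theoretical quartiles fall far below the reported ones, as the simulated summary above suggests they will, that points the same way: a gamma matched to the mean and variance does not reproduce the bulk of the data.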

PS: above I used the built-in 'gamma' estimator in fitdistr rather than passing dgamma. Passing dgamma also works, provided the starting values are based on the method of moments and the data are scaled by 2e4 (although it gives a warning about NaNs produced unless we specify lower):

 m2 <- MASS::fitdistr(x/2e4,dgamma,
        start=list(shape=0.05,scale=1), lower=c(0.01,0.01))
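On the zeros mentioned in the question: whatever convention a reference uses for the support, the gamma density at exactly x = 0 is only finite and positive when shape equals 1. The short check below shows what dgamma returns at zero for three illustrative shape values (rate fixed at 1). One plausible reading, offered as an assumption rather than a diagnosis, is that exact zeros in the data make the log-likelihood non-finite as soon as the optimizer moves shape away from 1, which would be consistent with the "non-finite finite-difference value" error.

```r
# Gamma density and log-density at exactly zero, for shape < 1, = 1 and > 1
shapes <- c(0.5, 1, 2)
dens0    <- dgamma(0, shape = shapes, rate = 1)
logdens0 <- dgamma(0, shape = shapes, rate = 1, log = TRUE)

print(data.frame(shape = shapes, density = dens0, log_density = logdens0))
```

The density is infinite for shape < 1 and zero for shape > 1, so a single zero observation contributes an infinite term to the log-likelihood in either case.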
