Fit negative binomial to zero-truncated (positive only) data
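I just learned that MASS::fitdistr is sensitive to the number of zeros when fitting a negative binomial. That is a problem for me, because I want to fit this distribution to species-count data in which the number of zeros is unknown and, I'd argue, unknowable. My goal is to fit a negative binomial given only the positive (non-zero) part of the distribution, and to be confident that, on simulated data, it returns (approximately) the simulated parameter values. I'm not committed to using MASS::fitdistr.
Thanks for any suggestions.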
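Stated formally (a standard result, added here for reference, using the size/mu parameterization of R's `dnbinom` with $r$ = `size` and $\mu$ = `mu`): fitting only the positive counts means maximizing the likelihood of the zero-truncated negative binomial, which renormalizes the ordinary pmf by $P(X > 0)$.

$$
f(k; r, \mu) = \frac{\Gamma(k + r)}{\Gamma(r)\,k!}\left(\frac{r}{r+\mu}\right)^{r}\left(\frac{\mu}{r+\mu}\right)^{k}, \qquad k = 0, 1, 2, \dots
$$

$$
P(X = k \mid X > 0) = \frac{f(k; r, \mu)}{1 - f(0; r, \mu)}, \qquad f(0; r, \mu) = \left(\frac{r}{r+\mu}\right)^{r}, \qquad k \ge 1
$$

$$
\ell(r, \mu) = \sum_{i=1}^{n} \Bigl[\log f(x_i; r, \mu) - \log\bigl(1 - f(0; r, \mu)\bigr)\Bigr]
$$

Fitting an ordinary negative binomial to the positive counts maximizes only the first term of $\ell$, which is why its estimates move away from the generating parameters.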
```r
# function to fit a negative binomial to abundances of species at the per-site level
nbpar <- function(ab) {
  MASS::fitdistr(ab, densfun = "Negative Binomial",
                 lower = c(1e-9, 1e-9))
}
# simulate an abundance vector
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)
# fit the distribution and get the simulation parameters back out
nbpar(site_abundance)  # returns something very close to the simulated parameters
# fit again with zeros omitted
nbpar(site_abundance[site_abundance > 0])  # Oh snap, gives nonsense... at least in the sense that the estimated parameters are pretty far off from the inputs!
```
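Update: for me, a fitting algorithm works if, given only the truncated (strictly positive) data, it returns the parameters of the full data (including the zeros) reasonably accurately and with low bias.
Here is an extended example of why MASS::fitdistr is not what I want: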
```r
# first few lines are the example as above
##########
library(ggplot2)
# function to fit a negative binomial to abundances of species at the per-site level
nbpar <- function(ab) {
  MASS::fitdistr(ab, densfun = "Negative Binomial",
                 lower = c(1e-9, 1e-9))
}
# keep only the strictly positive counts (named so it does not mask base::trunc)
drop_zeros <- function(x) x[x > 0]
# simulate an abundance vector
set.seed(100)
# slightly more abstract than the first example
trials <- 667
size <- 0.4
mu <- 30
site_abundance <- rnbinom(n = trials, size = size, mu = mu)
# fit the distribution and get the simulation parameters back out
nbpar(site_abundance)  # returns something very close to the simulated parameters
# fit again with zeros omitted
x <- nbpar(drop_zeros(site_abundance))  # different parameters
##########
# new stuff
# I suspected the parameters drift upwards if I do this iteratively,
# and a quick test showed I was right
drift <- data.frame(driftSteps = 1:40, size = NA_real_, mu = NA_real_)
for (step in drift$driftSteps) {
  mypar <- nbpar(drop_zeros(site_abundance))
  size <- mypar$estimate[["size"]]
  mu <- mypar$estimate[["mu"]]
  # simulate a new data set from the current fit, then refit next pass
  site_abundance <- rnbinom(n = trials, size = size, mu = mu)
  drift$size[step] <- size
  drift$mu[step] <- mu
}
# no pipe needed, so magrittr/dplyr does not have to be loaded
ggplot(drift, aes(driftSteps, mu)) +
  geom_point() +
  theme_classic()
```
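A likely explanation for the upward drift, offered as a sketch rather than something stated in the thread: for a negative binomial in the (size, mu) parameterization, the maximum-likelihood estimate of mu equals the sample mean. Refitting an ordinary negative binomial to the positive counts therefore returns mu close to the mean of the positives, which estimates E[X | X > 0] = mu / (1 - P(X = 0)), always larger than mu. Each pass of the loop feeds that inflated mu into the next simulation, so mu keeps rising. The helper `ztnb_mean` below is a hypothetical name; it computes that conditional mean and checks it against a direct sum over the support.

```r
# Mean of X ~ NB(size, mu) conditional on X > 0:
#   E[X | X > 0] = E[X] / P(X > 0) = mu / (1 - P(X = 0))
# (valid because the outcome X = 0 contributes nothing to E[X])
# ztnb_mean() is a hypothetical helper for illustration only.
ztnb_mean <- function(size, mu) {
  mu / (1 - dnbinom(0, size = size, mu = mu))
}

# Cross-check with a direct sum over k >= 1; the tail beyond 1e5 is negligible
k <- 1:100000
p <- dnbinom(k, size = 0.4, mu = 30)
direct <- sum(k * p) / sum(p)

ztnb_mean(0.4, 30)  # larger than the simulated mu = 30
direct
```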
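This is posted as an answer because it is too long to fit in a comment.
I don't think there is a problem with the parameter estimates: they fit the (positive) data very well, as the plot produced by the code below shows.
Also note that zeros make up 19% of the data. Without them, the estimates of an ordinary (untruncated) negative binomial must differ from the parameters used in the data-generating process.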
```r
# function to fit a negative binomial to abundances of
# species at the per-site level
nbpar <- function(ab) {
  MASS::fitdistr(ab, densfun = "Negative Binomial", lower = c(1e-9, 1e-9))
}
# simulate an abundance vector
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)
# fit with zeros omitted
pars <- nbpar(site_abundance[site_abundance > 0])
mean(site_abundance == 0)
#> [1] 0.1904048
empiric_dens <- proportions(table(site_abundance[site_abundance > 0]))
# plot() on a 1-D table with numeric names draws spikes at the count values
# themselves (barplot() would place bars at index positions), so the fitted
# density curve lines up with the data
plot(empiric_dens)
curve(dnbinom(x, size = pars$estimate[1], mu = pars$estimate[2]),
      from = 0, to = 300, col = "red", lwd = 2, add = TRUE)
```
Created on 2022-06-17 by the reprex package (v2.0.1)
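If the goal from the update is to recover the parameters of the full distribution from the positive counts alone, the model has to be the zero-truncated negative binomial rather than an ordinary one. Below is a minimal sketch, assuming the counts are NB(size, mu) draws with the zeros removed: it maximizes the zero-truncated log-likelihood, log f(x) - log(1 - f(0)), with base R's `optim()` (default Nelder-Mead), working on the log scale so both parameters stay positive. `fit_ztnb` is a hypothetical helper, not part of MASS; the VGAM package's `posnegbinomial()` family function is a packaged alternative aimed at the same model.

```r
# Minimal sketch: maximum-likelihood fit of a zero-truncated negative binomial.
# Assumes x holds NB(size, mu) draws from which all zeros were removed.
# fit_ztnb() is a hypothetical helper, not a MASS or VGAM function.
fit_ztnb <- function(x) {
  stopifnot(all(x >= 1), all(x == round(x)))
  # Negative log-likelihood; theta = c(log(size), log(mu)) keeps both > 0
  nll <- function(theta) {
    size <- exp(theta[1])
    mu <- exp(theta[2])
    p0 <- dnbinom(0, size = size, mu = mu)
    # log P(X = x | X > 0) = log f(x) - log(1 - f(0))
    -sum(dnbinom(x, size = size, mu = mu, log = TRUE) - log1p(-p0))
  }
  # Start from size = 1 and the mean of the positive counts
  opt <- optim(c(0, log(mean(x))), nll, control = list(maxit = 2000))
  c(size = exp(opt$par[[1]]), mu = exp(opt$par[[2]]))
}

set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)
fit_ztnb(site_abundance[site_abundance > 0])
```

Nelder-Mead is used because `optim()` tolerates non-finite objective values with it, which can occur at extreme parameter values; with only 540 or so positive counts the estimates will still carry sampling error, so compare several seeds before judging bias.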