Fit negative binomial to zero-truncated (positive only) data

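I have just learned that MASS::fitdistr, when fitting a negative binomial, is sensitive to the number of zeros. This matters because I want to fit this distribution to count data of species for which the number of zeros is unknown and, I would argue, unknowable. My goal is to fit a negative binomial given only the positive (non-zero) part of the distribution, and to be able to trust that on simulated data it returns (approximately) the simulating parameter values. I am not committed to using MASS::fitdistr. Thanks for any suggestions.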

# function to fit neg binomial to abundances of species at the per-site level

nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial"
               , lower=c(1e-9, 1e-9))}

# simulate an abundance vector
set.seed(100)
site_abundance<-rnbinom(667, size = 0.4, mu = 30)

# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters

# fit again with zeros omitted
nbpar(site_abundance[site_abundance>0]) # Oh Snap, gives nonsense... at least in the sense that the estimated parameters are pretty far off from the inputs!

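Update: For my purposes, a fitting algorithm would work if, given only the truncated (strictly positive) data, it returned reasonably accurate, low-bias estimates of the parameters of the complete distribution (zeros included). An extended example of why MASS::fitdistr is not what I am after: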

# first few lines are example as above
##########
##########
library(ggplot2)
# function to fit neg binomial to abundances of species at the per-site level

nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial"
               , lower=c(1e-9, 1e-9))}

trunc<-function(x){x[x>0]}

# simulate an abundance vector
set.seed(100)

# slightly more abstract than first example

trials<-667
size = 0.4
mu = 30

site_abundance <-rnbinom(n = trials, size = size, mu = mu)



# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters

# fit again with zeros omitted
x<-nbpar(site_abundance[site_abundance>0]) # different parameters

##############
##############
# new stuff
# I suspected the parameters drift upwards 
# if I do this iteratively, and a quick test 
# showed I was right

drift <- data.frame()
for(driftSteps in c(1:40)){
  mypar <- nbpar(trunc(site_abundance))
  size <- mypar$estimate[[1]]
  mu <- mypar$estimate[[2]]
  site_abundance <- rnbinom(n = trials, size = size, mu = mu) 
  drift[driftSteps,"driftSteps"]<- driftSteps
  drift[driftSteps,"size"]<- size
  drift[driftSteps,"mu"]<- mu
  
}


# ggplot2 does not export %>%, so pass the data frame directly
ggplot(drift, aes(driftSteps, mu)) +
  geom_point() +
  theme_classic()
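
The upward drift is what one would expect when an untruncated negative binomial is fitted to zero-truncated data: dropping the zeros raises the mean from mu to mu / (1 - P(X = 0)), so a fit that ignores the truncation will tend to land near that inflated mean, and re-simulating from it repeats the inflation. A minimal sketch of the arithmetic, using the simulation values from the example (the names p0 and mu_truncated are illustrative only):

```r
# Simulation parameters from the example above
size <- 0.4
mu <- 30

# Probability of observing a zero under the full negative binomial
p0 <- dnbinom(0, size = size, mu = mu)

# Mean of the distribution conditional on X > 0 (zeros removed)
mu_truncated <- mu / (1 - p0)

p0            # roughly 0.18
mu_truncated  # noticeably larger than the simulated mu of 30
```
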

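This is posted as an answer because it is too long to fit in a comment.
I see no problem with the parameter estimates: they fit the data very well, as the plot produced by the code below shows.
Also note that zeros make up 19% of the data; without them, the parameter estimates must differ from the values used in the data-generating process.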

# function to fit neg binomial to abundances of 
# species at the per-site level
nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial", lower=c(1e-9, 1e-9))
}

# simulate an abundance vector
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)

# fit with zeros omitted
pars <- nbpar(site_abundance[site_abundance > 0])

mean(site_abundance == 0)
#> [1] 0.1904048

empiric_dens <- proportions(table(site_abundance[site_abundance > 0]))
barplot(empiric_dens)
curve(dnbinom(x, size = pars$estimate[1], mu = pars$estimate[2]), 
      from = 0, to = 300, col = "red", lwd = 2, add = TRUE)

Created on 2022-06-17 by the reprex package (v2.0.1)
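
One way to target the parameters of the full (untruncated) distribution from positive-only data is to maximize a zero-truncated likelihood, in which each probability is divided by 1 - P(X = 0). The following is a minimal sketch using base R's optim; ztnb_nll and fit_ztnb are hypothetical helper names, not part of MASS or of the posts above, and the starting values are an arbitrary choice.

```r
# Negative log-likelihood of a zero-truncated negative binomial.
# Parameters are optimized on the log scale so size and mu stay positive.
ztnb_nll <- function(log_par, x) {
  size <- exp(log_par[1])
  mu <- exp(log_par[2])
  # log P(X = x | X > 0) = log f(x) - log(1 - f(0))
  log_f <- dnbinom(x, size = size, mu = mu, log = TRUE)
  log_p_pos <- pnbinom(0, size = size, mu = mu,
                       lower.tail = FALSE, log.p = TRUE)
  -sum(log_f - log_p_pos)
}

# Fit by numerical optimization (optim's default Nelder-Mead method).
fit_ztnb <- function(x) {
  stopifnot(all(x > 0))
  start <- log(c(size = 1, mu = mean(x)))  # assumed starting values
  opt <- optim(start, ztnb_nll, x = x)
  exp(opt$par)  # back-transform to the original scale
}

# Same simulated data as in the question
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)
positive_only <- site_abundance[site_abundance > 0]

est <- fit_ztnb(positive_only)
est
```

How close est comes to the simulating values (size = 0.4, mu = 30) at this sample size is best checked by repeating the simulation; with a small size parameter the truncated likelihood can be fairly flat, so the estimates may vary considerably between runs.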

