Fitting a negative binomial to zero-truncated (positive-only) data

Question

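I just learned that MASS::fitdistr, when fitting a negative binomial, is sensitive to the number of zeros. That is a problem for me, because I want to fit this distribution to data for species where the number of zeros is unknown and, I'd argue, unknowable. My goal is to fit the negative binomial given only the positive (nonzero) part of the distribution, and to be confident that on simulated data it returns (approximately) the simulating parameter values. I'm not committed to using MASS::fitdistr. Thanks for any suggestions.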

# function to fit neg binomial to abundances of species at the per-site level

nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial"
               , lower=c(1e-9, 1e-9))}

# simulate an abundance vector
set.seed(100)
site_abundance<-rnbinom(667, size = 0.4, mu = 30)

# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters

# fit again with zeros omitted
nbpar(site_abundance[site_abundance>0]) # Oh Snap, gives nonsense... at least in the sense that the estimated parameters are pretty far off from the inputs!

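Update: for my purposes, a fitting algorithm works if, given only the truncated (strictly positive) data, it returns (with reasonable accuracy and low bias) the parameters of the distribution that generated the complete data (zeros included). Here is an extended example of why MASS::fitdistr is not what I want: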

# first few lines are example as above
##########
##########
library(ggplot2)
# function to fit neg binomial to abundances of species at the per-site level

nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial"
               , lower=c(1e-9, 1e-9))}

trunc<-function(x){x[x>0]} # drop zeros; note that this masks base::trunc()

# simulate an abundance vector
set.seed(100)

# slightly more abstract than first example

trials<-667
size = 0.4
mu = 30

site_abundance <-rnbinom(n = trials, size = size, mu = mu)



# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters

# fit again with zeros omitted
x<-nbpar(site_abundance[site_abundance>0]) # different parameters

##############
##############
# new stuff
# I suspected the parameters drift upwards 
# if I do this iteratively, and a quick test 
# showed I was right

drift <- data.frame()
for(driftSteps in c(1:40)){
  mypar <- nbpar(trunc(site_abundance))
  size <- mypar$estimate[[1]]
  mu <- mypar$estimate[[2]]
  site_abundance <- rnbinom(n = trials, size = size, mu = mu) 
  drift[driftSteps,"driftSteps"]<- driftSteps
  drift[driftSteps,"size"]<- size
  drift[driftSteps,"mu"]<- mu
  
}


# pass the data frame directly: %>% is not available with only ggplot2 loaded
ggplot(drift, aes(driftSteps, mu)) +
  geom_point() +
  theme_classic()

Answer 1

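This is posted as an answer because it is too long to fit in a comment.
I don't think there is anything wrong with the parameter estimates: they fit the data they were given very well, as the plot below shows.
Also note that zeros make up 19% of the data, so without them the parameter estimates must differ from the values used in the data-generating process.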

# function to fit neg binomial to abundances of 
# species at the per-site level
nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial", lower=c(1e-9, 1e-9))
}

# simulate an abundance vector
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)

# fit with zeros omitted
pars <- nbpar(site_abundance[site_abundance > 0])

mean(site_abundance == 0)
#> [1] 0.1904048

empiric_dens <- proportions(table(site_abundance[site_abundance > 0]))
barplot(empiric_dens)
curve(dnbinom(x, size = pars$estimate[1], mu = pars$estimate[2]), 
      from = 0, to = 300, col = "red", lwd = 2, add = TRUE)

Created on 2022-06-17 by the reprex package (v2.0.1)
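
If the goal is instead to recover the parameters of the full distribution from the positive counts alone, one option is to maximize the zero-truncated likelihood, in which each observation contributes P(X = x) / (1 - P(X = 0)). Below is a minimal sketch under that assumption. `ztnb_nll` and `fit_ztnb` are hypothetical helper names (not part of MASS), and the log-scale parameterization and starting values are illustrative choices, not something taken from the question.

```r
# Sketch: maximum likelihood for a zero-truncated negative binomial.
# ztnb_nll() and fit_ztnb() are hypothetical helpers, not MASS functions.

# Negative log-likelihood of strictly positive counts x.
# par holds log(size) and log(mu), so the optimizer is unconstrained.
ztnb_nll <- function(par, x) {
  size <- exp(par[1])
  mu   <- exp(par[2])
  # log P(X > 0) = log(1 - P(X = 0)), computed on the log scale
  log_p_pos <- pnbinom(0, size = size, mu = mu,
                       lower.tail = FALSE, log.p = TRUE)
  # log P(X = x | X > 0) = log P(X = x) - log P(X > 0)
  -sum(dnbinom(x, size = size, mu = mu, log = TRUE) - log_p_pos)
}

fit_ztnb <- function(x) {
  stopifnot(all(x > 0))
  # starting values are an assumption: size = 1, mu = mean of the positives
  start <- c(log(1), log(mean(x)))
  opt <- optim(start, ztnb_nll, x = x)  # default Nelder-Mead
  c(size = exp(opt$par[1]), mu = exp(opt$par[2]))
}

# same simulated data as in the question
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)

# fit using only the positive counts
est <- fit_ztnb(site_abundance[site_abundance > 0])
est
```

This only makes sense if the zeros are missing purely through truncation, i.e. the positive counts really are the positive part of a negative binomial. The estimate of `size` from truncated data can be noisy, so it is worth checking the fit over repeated simulations before relying on it.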
