Fitting a negative binomial to zero-truncated (positive-only) data

Question

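I have just learned that MASS::fitdistr, when fitting a negative binomial, is sensitive to the number of zeros. That is a problem because I want to fit this distribution to species data where the number of zeros is unknown and, I'd argue, unknowable. My goal is to fit a negative binomial given only the positive (non-zero) part of the distribution, and to trust that on simulated data it returns (approximately) the simulating parameter values. I am not committed to using MASS::fitdistr. Thanks for any suggestions.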

# function to fit neg binomial to abundances of species at the per-site level

nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial"
               , lower=c(1e-9, 1e-9))}

# simulate an abundance vector
set.seed(100)
site_abundance<-rnbinom(667, size = 0.4, mu = 30)

# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters

# fit again with zeros omitted
nbpar(site_abundance[site_abundance>0]) # Oh Snap, gives nonsense... at least in the sense that the estimated parameters are pretty far off from the inputs!

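Update: for me, a fitting algorithm works if, given only the truncated (strictly positive) data, it returns the parameters of the full data (zeros included) with reasonable accuracy and low bias. An extended example of why MASS::fitdistr is not what I want: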

# first few lines are example as above
##########
##########
library(ggplot2)
# function to fit neg binomial to abundances of species at the per-site level

nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial"
               , lower=c(1e-9, 1e-9))}

trunc<-function(x){x[x>0]}

# simulate an abundance vector
set.seed(100)

# slightly more abstract than first example

trials<-667
size = 0.4
mu = 30

site_abundance <-rnbinom(n = trials, size = size, mu = mu)



# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters

# fit again with zeros omitted
x<-nbpar(site_abundance[site_abundance>0]) # different parameters

##############
##############
# new stuff
# I suspected the parameters drift upwards 
# if I do this iteratively, and a quick test 
# showed I was right

drift <- data.frame()
for(driftSteps in c(1:40)){
  mypar <- nbpar(trunc(site_abundance))
  size <- mypar$estimate[[1]]
  mu <- mypar$estimate[[2]]
  site_abundance <- rnbinom(n = trials, size = size, mu = mu) 
  drift[driftSteps,"driftSteps"]<- driftSteps
  drift[driftSteps,"size"]<- size
  drift[driftSteps,"mu"]<- mu
  
}


# ggplot2 does not export %>%, so pass the data frame directly
ggplot(drift, aes(driftSteps, mu)) +
  geom_point() +
  theme_classic()

Answer 1

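This is posted as an answer because it is too long to fit in a comment.
I don't think there is a problem with the parameter estimates; they fit the data very well, as the plot below shows.
Also note that zeros make up 19% of the data. Without them, the parameter estimates must differ from the values used in the data-generating process.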
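As a quick check on that 19% figure, the expected share of zeros can be computed directly from the simulating parameters. This is only an illustrative sketch: `p0` and `p0_closed` are names introduced here, and the closed form assumes the `size`/`mu` parameterisation used by `rnbinom` above.

```r
# Probability of observing a zero under the simulating distribution
p0 <- dnbinom(0, size = 0.4, mu = 30)

# Same quantity from the closed form (size / (size + mu))^size
p0_closed <- (0.4 / (0.4 + 30))^0.4

c(p0 = p0, p0_closed = p0_closed)

# Dropping the zeros inflates the mean of what remains by 1 / (1 - p0)
mean_of_positives <- 30 / (1 - p0)
mean_of_positives
```

The theoretical share of zeros is roughly 0.18, in line with the observed proportion, so a fit that ignores the truncation is describing a noticeably different sample.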

# function to fit neg binomial to abundances of 
# species at the per-site level
nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial", lower=c(1e-9, 1e-9))
}

# simulate an abundance vector
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)

# fit with zeros omitted
pars <- nbpar(site_abundance[site_abundance > 0])

mean(site_abundance == 0)
#> [1] 0.1904048

empiric_dens <- proportions(table(site_abundance[site_abundance > 0]))
barplot(empiric_dens)
curve(dnbinom(x, size = pars$estimate[1], mu = pars$estimate[2]), 
      from = 0, to = 300, col = "red", lwd = 2, add = TRUE)

Created on 2022-06-17 by the reprex package (v2.0.1)
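If the aim is to recover the parameters of the full distribution from the positive counts alone, one option is to maximise the zero-truncated likelihood, in which each probability is divided by P(Y > 0). The following is a minimal sketch of that idea and not something MASS::fitdistr does out of the box. The helper names `ztnb_nll` and `fit_ztnb` are hypothetical, and the log-scale parameters and starting values are assumptions made for illustration.

```r
# Negative log-likelihood of a zero-truncated negative binomial.
# par = c(log(size), log(mu)) so both parameters stay positive.
ztnb_nll <- function(par, y) {
  size <- exp(par[1])
  mu   <- exp(par[2])
  # log P(Y = y) under the untruncated negative binomial
  ll <- dnbinom(y, size = size, mu = mu, log = TRUE)
  # log P(Y > 0), the normalising constant of the truncated distribution
  log_p_pos <- pnbinom(0, size = size, mu = mu,
                       lower.tail = FALSE, log.p = TRUE)
  -sum(ll - log_p_pos)
}

# Hypothetical helper: maximum likelihood fit via optim() (Nelder-Mead)
fit_ztnb <- function(y) {
  stopifnot(all(y > 0))
  start <- c(log(1), log(mean(y)))  # assumed starting values
  opt <- optim(start, ztnb_nll, y = y)
  c(size = exp(opt$par[1]), mu = exp(opt$par[2]))
}

# Same simulation as above, fitted on the strictly positive counts only
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)
est <- fit_ztnb(site_abundance[site_abundance > 0])
est
```

Because the likelihood accounts for the missing zeros, the estimates target the parameters of the untruncated distribution, although they will be noisier than a fit to the full data since the information carried by the zeros is lost. Packaged implementations may be preferable in practice; the VGAM package, for instance, provides a positive negative binomial family.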

Solution 1
0 2022-06-17 18:36:59
