
Fit negative binomial to zero-truncated (positive only) data

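I just learned that MASS::fitdistr is sensitive to the number of zeros when fitting a negative binomial. That is a problem, because I want to fit this distribution to count data for species where the number of zeros is unknown (and, I'd argue, unknowable). My goal is to fit a negative binomial given only the positive (non-zero) part of the distribution, and to be confident that on simulated data it returns (approximately) the simulated parameter values. I'm not committed to using MASS::fitdistr. Thanks for any suggestions.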

# function to fit neg binomial to abundances of species at the per-site level

nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial"
               , lower=c(1e-9, 1e-9))}

# simulate an abundance vector
set.seed(100)
site_abundance<-rnbinom(667, size = 0.4, mu = 30)

# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters

# fit again with zeros omitted
nbpar(site_abundance[site_abundance>0]) # Oh Snap, gives nonsense... at least in the sense that the estimated parameters are pretty far off from the inputs!
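A quick way to see what the zero-free fit is estimating: dropping the zeros changes the distribution being fitted. The positive counts have a higher mean than the full data, because the zeros that pulled the average down are gone. The sketch below (base R only; the variable names are illustrative) compares the expected share of zeros and the expected mean of the positive counts under size = 0.4, mu = 30 with the simulated sample.

```r
# Compare the expected share of zeros and the expected mean of the positive
# counts (under size = 0.4, mu = 30) with the simulated sample.
size <- 0.4
mu   <- 30

p0       <- dnbinom(0, size = size, mu = mu)  # P(X = 0), about 0.18
mean_pos <- mu / (1 - p0)                     # E[X | X > 0], larger than mu

set.seed(100)
site_abundance <- rnbinom(667, size = size, mu = mu)

# Share of zeros: theoretical vs. observed
c(expected = p0, observed = mean(site_abundance == 0))

# Mean of the positive counts: theoretical vs. observed
c(expected = mean_pos, observed = mean(site_abundance[site_abundance > 0]))
```

An ordinary negative binomial fitted to the positive counts is therefore matching a shifted distribution, so its parameters cannot equal the ones used to simulate the full data.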

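Update: To me, a fitting algorithm "works" if, given only the truncated (strictly positive) data, it returns (fairly accurately and with low bias) the parameters that fit the full data, zeros included. An expanded example of why MASS::fitdistr isn't what I'm after: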

# first few lines are example as above
##########
##########
library(ggplot2)
# function to fit neg binomial to abundances of species at the per-site level

nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial"
               , lower=c(1e-9, 1e-9))}

trunc<-function(x){x[x>0]}

# simulate an abundance vector
set.seed(100)

# slightly more abstract than first example

trials<-667
size = 0.4
mu = 30

site_abundance <-rnbinom(n = trials, size = size, mu = mu)



# fit the distribution and get simulation parameters back out
nbpar(site_abundance) # returns something very close to simulated parameters

# fit again with zeros omitted
x<-nbpar(site_abundance[site_abundance>0]) # different parameters

##############
##############
# new stuff
# I suspected the parameters drift upwards 
# if I do this iteratively, and a quick test 
# showed I was right

drift <- data.frame()
for(driftSteps in c(1:40)){
  mypar <- nbpar(trunc(site_abundance))
  size <- mypar$estimate[[1]]
  mu <- mypar$estimate[[2]]
  site_abundance <- rnbinom(n = trials, size = size, mu = mu) 
  drift[driftSteps,"driftSteps"]<- driftSteps
  drift[driftSteps,"size"]<- size
  drift[driftSteps,"mu"]<- mu
  
}


ggplot(drift, aes(driftSteps, mu)) +
  geom_point() +
  theme_classic()
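The upward drift noted in the loop's comments can be explained from the mean of a zero-truncated negative binomial. Writing $k$ for size and $\mu$ for mu (the parameterisation that rnbinom and dnbinom use), the probability mass function, the probability of a zero, and the distribution and mean of the positive counts are:

$$
f(x;k,\mu) = \frac{\Gamma(x+k)}{\Gamma(k)\,x!}\left(\frac{k}{k+\mu}\right)^{k}\left(\frac{\mu}{k+\mu}\right)^{x}, \qquad p_0 = f(0;k,\mu) = \left(\frac{k}{k+\mu}\right)^{k}
$$

$$
P(X = x \mid X > 0) = \frac{f(x;k,\mu)}{1-p_0}, \quad x \ge 1, \qquad E[X \mid X > 0] = \frac{E[X]}{1-p_0} = \frac{\mu}{1-p_0} > \mu
$$

For an ordinary negative binomial, the maximum-likelihood estimate of $\mu$ is the sample mean, so fitting it to the positive counts returns approximately $\mu/(1-p_0)$ rather than $\mu$. Resimulating from that fit and truncating again repeats the inflation, so $\mu$ ratchets upward from one iteration to the next. With $k = 0.4$ and $\mu = 30$, $p_0 \approx 0.177$, so the first step alone inflates $\mu$ by a factor of about 1.2. A fit that accounts for the truncation would instead maximise the zero-truncated log-likelihood of the $n$ positive counts:

$$
\ell(k,\mu) = \sum_{i=1}^{n}\Big[\log f(x_i;k,\mu) - \log\big(1-p_0(k,\mu)\big)\Big]
$$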

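This is posted as an answer because it is too long to fit in a comment.
I don't think there is a problem with the parameter estimates: they fit the data very well, as the plot produced by the code below shows.
Also note that zeros make up 19% of the data; without them, the parameter estimates must differ from those used in the data-generating process.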

# function to fit neg binomial to abundances of 
# species at the per-site level
nbpar <- function(ab){
  MASS::fitdistr(ab, densfun = "Negative Binomial", lower=c(1e-9, 1e-9))
}

# simulate an abundance vector
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)

# fit with zeros omitted
pars <- nbpar(site_abundance[site_abundance > 0])

mean(site_abundance == 0)
#> [1] 0.1904048

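empiric_dens <- proportions(table(site_abundance[site_abundance > 0]))
# barplot() draws bars at consecutive positions, not at the count values,
# so plot the proportions against the observed counts instead
counts <- as.numeric(names(empiric_dens))
plot(counts, as.numeric(empiric_dens), type = "h",
     xlab = "abundance", ylab = "proportion")
lines(0:300, dnbinom(0:300, size = pars$estimate[1], mu = pars$estimate[2]),
      col = "red", lwd = 2)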

Created on 2022-06-17 by the reprex package (v2.0.1)
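If the goal is the parameters of the full distribution (zeros included) estimated from the positive counts alone, one option is to maximise the zero-truncated negative binomial likelihood instead of the ordinary one: each positive count contributes log f(x) minus log P(X > 0). The following is a minimal sketch in base R using dnbinom, pnbinom and optim. The helper name fit_ztnb, the log-scale parameterisation and the starting values are illustrative assumptions, not part of the question or the answer. If the model is correct, it should recover size and mu approximately, up to sampling noise; checking this over several seeds is worthwhile before relying on it.

```r
# Hypothetical helper: maximum-likelihood fit of a zero-truncated negative
# binomial. Each positive count contributes log f(x) - log P(X > 0), so the
# estimates describe the untruncated distribution (zeros included).
fit_ztnb <- function(x) {
  stopifnot(all(x > 0), all(x == round(x)))

  # Negative log-likelihood; parameters are optimised on the log scale so
  # that size and mu stay positive without box constraints.
  nll <- function(par) {
    size <- exp(par[1])
    mu   <- exp(par[2])
    log_f   <- dnbinom(x, size = size, mu = mu, log = TRUE)
    log_pos <- pnbinom(0, size = size, mu = mu, lower.tail = FALSE, log.p = TRUE)
    -sum(log_f - log_pos)
  }

  start <- c(log(1), log(mean(x)))  # crude starting values (assumption)
  opt <- optim(start, nll, method = "BFGS")
  if (opt$convergence != 0) warning("optim did not converge")

  c(size = exp(opt$par[[1]]), mu = exp(opt$par[[2]]))
}

# Same simulation as in the question, then fit using only the positive counts
set.seed(100)
site_abundance <- rnbinom(667, size = 0.4, mu = 30)
fit_ztnb(site_abundance[site_abundance > 0])
```

Because the truncation is built into the likelihood, swapping fit_ztnb in for nbpar in the drift loop should not push mu upward systematically. Dedicated zero-truncated count models also exist in add-on packages; this version depends only on the stats package.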
