
How to find negative binomial probabilities in R

I've spent 2 months wondering if this question is StackOverflow worthy, and I've concluded it is.

I'm volunteering on a team for a year to forecast a number of interesting things. A few months ago the task was forecasting the probability of the number of earthquakes worldwide over magnitude 5 during the month of March. Really interesting problem. I thought I was reasonably good with R, and then I hit this problem like a brick wall. It is a count problem, so I wanted to use the Poisson distribution, but that won't work because the mean and variance aren't equal: the data are overdispersed.

The goal is to estimate the probability of:

<100 earthquakes
100-140 earthquakes
140-170 earthquakes
170-210 earthquakes
>210 earthquakes

Here is the code I wrote:

# (load data and libraries, omitted)
quakes_this_month <- 10                  # earthquakes observed so far this month
days_left <- 31 - 1
days_left
month_left <- days_left / 31             # fraction of the month remaining
month_left
earthq5 <- earthq4
earthq5$mag <- earthq5$mag * month_left  # scale historical values to the remaining fraction
mu <- mean(earthq5$mag)
sigma <- sd(earthq5$mag)
paste("mean is ", mu, " and sigma is ", sigma)
# fewer than 100 (normal approximation with a continuity correction)
pnorm(99.5 - quakes_this_month, mu, sigma, lower.tail = TRUE)
# 100-140
lower.bound <- 99.5 - quakes_this_month
upper.bound <- 140.5 - quakes_this_month
pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma)
# 140-170
lower.bound <- 140.5 - quakes_this_month
upper.bound <- 170.5 - quakes_this_month
pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma)
# 170-210
lower.bound <- 170.5 - quakes_this_month
upper.bound <- 210.5 - quakes_this_month
pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma)
# more than 210
pnorm(210.5 - quakes_this_month, mu, sigma, lower.tail = FALSE)

The idea is that as the month progresses and some earthquakes have already happened, I can estimate the probability of hitting those thresholds. However, this isn't a Gaussian distribution, so I can't use pnorm. I should use pnbinom(q, size, prob, mu, lower.tail = TRUE, log.p = FALSE), but I don't know how to get 'size' and 'prob' out of a count problem. This isn't taking 15 balls out of a jar 4 times. So I'm reaching out on this one, as it's been haunting me for weeks. How can I use pnbinom() in place of pnorm(), given that this is about earthquake counts per month?

I found the answer, so for anyone else, here is how I did it. The data I was using came from USGS earthquake records. I use quite a few other libraries in R, but I think only MASS is needed for this example.

Load library and data

library(MASS)

quakeSim <- earthq4$count  # this was my real data

quakeSim <- rnbinom(n = 12000, mu = 145, size = 18)  # you can use this for the example
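Before fitting anything, it can help to confirm the overdispersion directly by comparing the sample variance with the sample mean. This is only a sketch on simulated stand-in data (the variable names `counts`, `m`, `v`, `dispersion` and `nb_var` are mine, and the seed is arbitrary); with real data you would use the monthly counts instead.

```r
# Simulated stand-in for the monthly counts (same parameters as the example above)
set.seed(42)
counts <- rnbinom(n = 12000, mu = 145, size = 18)

m <- mean(counts)
v <- var(counts)
dispersion <- v / m   # about 1 for Poisson data, well above 1 when overdispersed

# Variance implied by a negative binomial with these parameters: mu + mu^2 / size
nb_var <- 145 + 145^2 / 18

c(mean = m, variance = v, dispersion = dispersion, nb_variance = nb_var)
```

A ratio far above 1 is the reason a Poisson model (where variance equals mean) is a poor fit here.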

Test the distribution fit by checking 3 likely distributions: Gaussian, Poisson, and negative binomial.

quakeDistNB <- MASS::fitdistr(quakeSim, densfun = "negative binomial")
quakeDistPois <- MASS::fitdistr(quakeSim, densfun = "poisson")
quakeDistGaus <- MASS::fitdistr(quakeSim, densfun = "normal")

Compare negative binomial, Poisson, and Gaussian. Lower AIC is better, so pick the distribution with the lowest AIC.

AIC(quakeDistNB)
AIC(quakeDistPois)
AIC(quakeDistGaus)
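If you would rather not eyeball three numbers, the comparison can be done in one go. A minimal sketch, assuming simulated stand-in data; the names `x`, `fits`, `aic` and `best` are hypothetical and not part of my original script. fitdistr may print NaN warnings while optimizing the negative binomial fit, which are usually harmless.

```r
library(MASS)

set.seed(1)
x <- rnbinom(n = 2000, mu = 145, size = 18)  # simulated stand-in data

# Fit the three candidate distributions to the same data
fits <- list(
  negbin  = fitdistr(x, densfun = "negative binomial"),
  poisson = fitdistr(x, densfun = "poisson"),
  normal  = fitdistr(x, densfun = "normal")
)

aic <- sapply(fits, AIC)   # one AIC value per candidate
sort(aic)                  # lowest (best) first
best <- names(which.min(aic))
best
```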

Quick check on normality with the Shapiro-Wilk test (if Gaussian has the lowest AIC).

shapiro.test(earthq4$count) 

Use the 5% significance level. But my data is NB and not Gaussian, so ignore all the CI tests below.

summary(earthq4)
t.test(earthq4$count ) #default 0.95

So my data shows a negative binomial distribution. Now let's look at it as a histogram with enough bins to show the shape of an NB.

Visualize the empirical distribution:

hist(quakeSim, breaks=80)
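One way to judge the fit visually is to draw the fitted negative binomial probabilities on top of the histogram. This is a sketch on simulated stand-in data, not what I originally ran; `x`, `fit`, `k` and `pmf` are names I made up for the example.

```r
library(MASS)

set.seed(7)
x <- rnbinom(n = 12000, mu = 145, size = 18)  # simulated stand-in data
fit <- fitdistr(x, densfun = "negative binomial")

# freq = FALSE puts the histogram on the density scale, comparable to the pmf
hist(x, breaks = 80, freq = FALSE, main = "Counts with fitted negative binomial")

# Fitted probability of each integer count across the observed range
k <- min(x):max(x)
pmf <- dnbinom(k, size = fit$estimate["size"], mu = fit$estimate["mu"])
lines(k, pmf, lwd = 2)
```

If the curve tracks the bars, including the longer right tail, the negative binomial is a reasonable description of the counts.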

Fit a negative binomial model and get the two critical values, sizeHat and muHat, from the output of the model 'quakeDistNB'.

This part really drove me nuts until a friend showed me.

quakeDistNB <- MASS::fitdistr(earthq4$count , densfun = "negative binomial")
quakeDistNB
sizeHat <- quakeDistNB$estimate[1]
sizeHat
muHat <- quakeDistNB$estimate[2]
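For anyone stuck on 'size' and 'prob' like I was: in R's mean parameterization, prob = size / (size + mu) and the variance is mu + mu^2 / size, so you never need a "balls in a jar" story. The sketch below uses illustrative numbers rather than fitted estimates, and the names `size_hat`, `mu_hat`, `prob_hat` and `size_mom` are hypothetical.

```r
size_hat <- 18    # illustrative values, not fitted estimates
mu_hat   <- 145

# R's alternative parameterization: prob = size / (size + mu)
prob_hat <- size_hat / (size_hat + mu_hat)

# Both calls describe the same distribution
p_mu   <- pnbinom(q = 120, size = size_hat, mu = mu_hat)
p_prob <- pnbinom(q = 120, size = size_hat, prob = prob_hat)
all.equal(p_mu, p_prob)

# Method-of-moments starting point from a sample mean m and variance v (needs v > m)
m <- 145
v <- 1313
size_mom <- m^2 / (v - m)
size_mom
```

The method-of-moments value is only a rough estimate; the maximum likelihood fit from fitdistr is what I actually used.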

Now then, my problem was to predict the probability of fewer than 100 earthquakes, and of between 100 and 150 earthquakes, of magnitude greater than or equal to 5.

Then the probability of fewer than 100 (pnbinom returns P(X <= q), so use q = 99):

p100 <- pnbinom(q = 99, size = sizeHat, mu = muHat)
p100

Probability of 150 or fewer:

p150 <- pnbinom(q = 150, size = sizeHat, mu = muHat)
p150

Probability of 100 to 150 (inclusive):

p150 - p100
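The same idea extends to all five bins from the question in one call. This is a sketch with illustrative parameter values; in practice substitute sizeHat and muHat from the fit. Two assumptions of mine: the names `size_hat`, `mu_hat`, `upper`, `cdf` and `probs` are made up, and since the bins in the question share their edges, I put 140, 170 and 210 in the lower bin (matching the 140.5, 170.5 and 210.5 cut points in the question's code). Remember that pnbinom(q, ...) returns P(X <= q), so "fewer than 100" corresponds to q = 99.

```r
size_hat <- 18    # illustrative values; substitute sizeHat and muHat from the fit
mu_hat   <- 145

# Upper edges of the first four bins: <100, 100-140, 141-170, 171-210
upper <- c(99, 140, 170, 210)
cdf   <- pnbinom(upper, size = size_hat, mu = mu_hat)

# Differences of the CDF give the bin probabilities; the last bin is the upper tail
probs <- diff(c(0, cdf, 1))
names(probs) <- c("<100", "100-140", "141-170", "171-210", ">210")
round(probs, 3)
sum(probs)   # the five bins cover every possible count
```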

