How to find negative binomial probabilities in R

Question

I've spent 2 months wondering if this question is StackOverflow worthy, and I've concluded it is.

I'm volunteering on a team for a year to forecast a number of interesting things. A few months ago the task was forecasting the probability of the number of earthquakes worldwide over magnitude 5 during the month of March. Really interesting problem. I thought I was reasonably good with R, and then I hit this problem like a brick wall. It is a count problem, so I wanted to use the Poisson distribution, but that won't work because the mean and variance aren't equal. The data are overdispersed.

The goal is to estimate the probability of:

<100 earthquakes
100-140 earthquakes
140-170 earthquakes
170-210 earthquakes
>210 earthquakes

Here is the code I wrote:

# (load data and libraries)
quakes_this_month <- 10        # earthquakes already observed this month
days_left <- 31 - 1            # days remaining in the month
days_left
month_left <- days_left / 31   # fraction of the month remaining
month_left
# scale the historical values by the fraction of the month remaining
earthq5 <- earthq4
earthq5$mag <- earthq5$mag * month_left
mu <- mean(earthq5$mag)
sigma <- sd(earthq5$mag)
paste("mean is ", mu, " and sigma is ", sigma)
# P(fewer than 100 in total), using a 0.5 continuity correction throughout
pnorm(99.5 - quakes_this_month, mu, sigma, lower.tail = TRUE)
# P(100 to 140)
lower.bound <- 99.5 - quakes_this_month
upper.bound <- 140.5 - quakes_this_month
pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma)
# P(141 to 170)
lower.bound <- 140.5 - quakes_this_month
upper.bound <- 170.5 - quakes_this_month
pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma)
# P(171 to 210)
lower.bound <- 170.5 - quakes_this_month
upper.bound <- 210.5 - quakes_this_month
pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma)
# P(more than 210)
pnorm(210.5 - quakes_this_month, mu, sigma, lower.tail = FALSE)

So the idea is that as the month progresses and some earthquakes have happened, I can estimate the probability of hitting each of those thresholds. However, this isn't a Gaussian distribution, so I can't use pnorm(). I should use pnbinom(q, size, prob, mu, lower.tail = TRUE, log.p = FALSE), but I don't know how to get 'size' and 'prob' out of a count problem. This isn't taking 15 balls out of a jar 4 times. So I'm reaching out on this one, as it's been haunting me for weeks. How can I use pnbinom() in place of pnorm(), given that this is about earthquake counts per month?

Answer 1

So I found the answer, and for anyone else, here is how I did it. The data I was using was earthquake data from USGS. There are quite a few other libraries I use in R, but I think only MASS is needed for this example.

Load library and data

library(MASS)

quakeSim <-  earthq4$count  # this was my real data

quakeSim <-  rnbinom(n = 12000, mu = 145, size =18)  # you can use this for the example
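A quick way to confirm the overdispersion mentioned in the question is to compare the sample variance with the sample mean: for a Poisson variable the two are equal, so a ratio well above 1 points away from Poisson. A minimal sketch on the simulated data (the seed and the name `dispersion` are illustrative additions, not part of the original code):

```r
set.seed(42)  # assumption: fixed seed so the example is reproducible
quakeSim <- rnbinom(n = 12000, mu = 145, size = 18)

quakeMean <- mean(quakeSim)
quakeVar <- var(quakeSim)

# Poisson implies variance == mean; a ratio well above 1 signals overdispersion
dispersion <- quakeVar / quakeMean
dispersion
```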

Test the distribution fit by checking three likely distributions: Gaussian, Poisson, and negative binomial.

quakeDistNB <- MASS::fitdistr(quakeSim, densfun = "negative binomial")
quakeDistPois <- MASS::fitdistr(quakeSim, densfun = "poisson")
quakeDistGaus <- MASS::fitdistr(quakeSim, densfun = "normal")

Compare negative binomial, Poisson, and Gaussian. Lower AIC is better, so pick the distribution with the lowest AIC.

AIC(quakeDistNB)
AIC(quakeDistPois)
AIC(quakeDistGaus)
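The same comparison can be collected into one named vector so the lowest AIC is picked programmatically. This is only a sketch on simulated stand-in data; the names `fits`, `aics`, and `bestFit` are hypothetical, and the sample size is reduced to keep the fit fast:

```r
library(MASS)

set.seed(42)  # assumption: fixed seed for reproducibility
quakeSim <- rnbinom(n = 2000, mu = 145, size = 18)  # simulated stand-in data

# Fit each candidate distribution by maximum likelihood
fits <- list(
  "negative binomial" = fitdistr(quakeSim, densfun = "negative binomial"),
  "poisson"           = fitdistr(quakeSim, densfun = "poisson"),
  "normal"            = fitdistr(quakeSim, densfun = "normal")
)

# AIC for each fit; the smallest value marks the preferred model
aics <- sapply(fits, AIC)
sort(aics)

bestFit <- names(which.min(aics))
bestFit
```

On real data the winner may of course differ; the point is only that the three AIC values end up side by side.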

Quick check on normality with the Shapiro-Wilk test (if Gaussian has the lowest AIC).

shapiro.test(earthq4$count)

Use the usual 5% significance level. But my data is negative binomial and not Gaussian, so ignore the confidence-interval checks below.

summary(earthq4)
t.test(earthq4$count)  # default conf.level = 0.95

So my data shows a negative binomial distribution. Now let's look at it as a histogram with enough bins to show the shape of a negative binomial.

Visualize the empirical distribution

hist(quakeSim, breaks=80)
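To judge the shape by eye, the histogram can be drawn on the density scale and a negative binomial probability mass function laid over it. A rough sketch, assuming the simulation values `size = 18` and `mu = 145` stand in for the fitted estimates (the names `xs` and `pmf` are illustrative):

```r
set.seed(42)  # assumption: fixed seed for reproducibility
quakeSim <- rnbinom(n = 12000, mu = 145, size = 18)

# Density scale makes the bar heights comparable to a probability mass function
hist(quakeSim, breaks = 80, freq = FALSE,
     main = "Simulated monthly counts", xlab = "count")

# Negative binomial pmf over the observed range of counts
xs <- seq(min(quakeSim), max(quakeSim))
pmf <- dnbinom(xs, size = 18, mu = 145)
lines(xs, pmf, col = "red", lwd = 2)
```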

Fit a negative binomial model and get the two parameter estimates, sizeHat and muHat, from the output of the model 'quakeDistNB'.

This part really drove me nuts until a friend showed me.

quakeDistNB <- MASS::fitdistr(earthq4$count , densfun = "negative binomial")
quakeDistNB
sizeHat <- quakeDistNB$estimate[1]
sizeHat
muHat <- quakeDistNB$estimate[2]
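On the question of where 'size' and 'prob' come from: pnbinom() accepts either prob or mu together with size, and the R documentation gives the link as prob = size / (size + mu), with variance mu + mu^2 / size. A short sketch with illustrative values rather than the fitted estimates (`probHat`, `pMu`, `pProb`, and `varHat` are hypothetical names):

```r
sizeHat <- 18   # assumption: illustrative values, not estimates from real data
muHat <- 145

# Convert the (size, mu) parametrization to (size, prob)
probHat <- sizeHat / (sizeHat + muHat)

# Both calls describe the same distribution, so the results agree
pMu <- pnbinom(q = 99, size = sizeHat, mu = muHat)
pProb <- pnbinom(q = 99, size = sizeHat, prob = probHat)
all.equal(pMu, pProb)

# Implied variance; it exceeds the mean, which is what overdispersion requires
varHat <- muHat + muHat^2 / sizeHat
varHat
```

So there is no need to think in terms of balls and jars: size acts as a dispersion parameter, and the mean does the rest.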

Now then, my problem was to predict the probability of fewer than 100 earthquakes, and of between 100 and 150 earthquakes, of magnitude 5 or greater.

The probability of fewer than 100 (that is, 99 or fewer, because pnbinom() returns P(X <= q)):

p100 <- pnbinom(q = 99, size = sizeHat, mu = muHat)
p100

The probability of 150 or fewer:

p150 <- pnbinom(q = 150, size = sizeHat, mu = muHat)
p150

The probability of 100 to 150, inclusive:

p150 - p100
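The same idea extends to the five bins listed in the question. The sketch below is one possible way to do it, not the original author's code: `binProbs` is a hypothetical helper, the bin edges are assumed to be 99, 140, 170, and 210 (inclusive upper edges, so the overlapping labels in the question become 100-140, 141-170, 171-210), and the parameter values are illustrative. The `observed` argument shifts the edges by the earthquakes already counted this month, in which case `size` and `mu` have to describe the count still to come rather than a full month.

```r
# Hypothetical helper: probabilities of the five bins for the month's total count
binProbs <- function(size, mu, observed = 0) {
  upper <- c(99, 140, 170, 210)  # assumed inclusive upper edges of the first four bins
  # P(total <= edge) = P(remaining count <= edge - observed)
  cdf <- pnbinom(upper - observed, size = size, mu = mu)
  probs <- diff(c(0, cdf, 1))    # successive differences give per-bin probabilities
  names(probs) <- c("<100", "100-140", "141-170", "171-210", ">210")
  probs
}

probs <- binProbs(size = 18, mu = 145)  # illustrative parameter values
round(probs, 3)
sum(probs)  # the five bins cover every possible count
```

Taking differences of the cumulative probabilities keeps the bins non-overlapping and guarantees they sum to 1.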

solution1
0 2018-10-07 18:17:41
