I've spent two months wondering whether this question is StackOverflow-worthy, and I've concluded it is.
I'm volunteering on a team for a year to forecast a number of interesting things. A few months ago the task was forecasting the number of earthquakes worldwide above magnitude 5 during the month of March, expressed as probabilities for several count ranges. Really interesting problem. I thought I was reasonably good with R, and then I hit this problem like a brick wall. It is a count problem, so I wanted to use the Poisson distribution, but that won't work: the mean and variance aren't equal. The data are overdispersed.
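A quick way to check for overdispersion is to compare the sample mean and variance of the monthly counts. The following is only a minimal sketch on simulated data: `counts` is a hypothetical stand-in for the real monthly counts, and the parameters mu = 145 and size = 18 are the example values used further down in this post, not real USGS figures.

```r
set.seed(42)
# Hypothetical monthly counts; replace with the real data.
counts <- rnbinom(n = 1200, mu = 145, size = 18)

count_mean <- mean(counts)
count_var  <- var(counts)

# For Poisson data this ratio would be close to 1;
# a ratio well above 1 indicates overdispersion.
dispersion_ratio <- count_var / count_mean
dispersion_ratio
```

A ratio far above 1 is the usual sign that a plain Poisson model is too narrow and that a negative binomial is worth trying.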
The goal is to estimate the probability of each of these outcomes:
fewer than 100 earthquakes
100-140 earthquakes
140-170 earthquakes
170-210 earthquakes
more than 210 earthquakes
Here is the code I have so far:
# (load data and libraries, omitted)
quakes_this_month <- 10        # earthquakes observed so far this month
days_left <- 31 - 1            # days remaining in the month
days_left
month_left <- days_left / 31   # fraction of the month remaining
month_left
# Scale the historical monthly values to the remaining fraction of the month
earthq5 <- earthq4
earthq5$mag <- earthq5$mag * month_left
mu <- mean(earthq5$mag)
sigma <- sd(earthq5$mag)
paste("mean is", mu, "and sigma is", sigma)
# P(total < 100), using the same 0.5 continuity correction as the bins below
pnorm(99.5 - quakes_this_month, mu, sigma, lower.tail = TRUE)
# P(100 <= total <= 140)
lower.bound <- 99.5 - quakes_this_month
upper.bound <- 140.5 - quakes_this_month
pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma)
# P(141 <= total <= 170)
lower.bound <- 140.5 - quakes_this_month
upper.bound <- 170.5 - quakes_this_month
pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma)
# P(171 <= total <= 210)
lower.bound <- 170.5 - quakes_this_month
upper.bound <- 210.5 - quakes_this_month
pnorm(upper.bound, mu, sigma) - pnorm(lower.bound, mu, sigma)
# P(total > 210)
pnorm(210.5 - quakes_this_month, mu, sigma, lower.tail = FALSE)
So the idea is that, as the month progresses and some earthquakes have already happened, I can estimate the probability of ending up in each of those ranges. However, the counts aren't Gaussian, so I shouldn't be using pnorm(). I should use pnbinom(q, size, prob, mu, lower.tail = TRUE, log.p = FALSE), but I don't know how to get 'size' and 'prob' out of a count problem. This isn't taking 15 balls out of a jar 4 times. So I'm reaching out on this one, as it's been haunting me for weeks. How can I use pnbinom() in place of pnorm(), given that this is about earthquake counts per month?
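For readers stuck on the same point: pnbinom() also accepts a mean/dispersion form, where `mu` is the mean, the variance is mu + mu^2/size, and prob = size / (size + mu). Under that parameterisation, a rough method-of-moments estimate of `size` can be read off the sample mean and variance. The sketch below assumes simulated data in place of the real counts, and the helper name `nb_moments` is hypothetical.

```r
# Hypothetical helper: method-of-moments estimates for the negative binomial,
# based on var = mu + mu^2 / size. Only meaningful when var > mean.
nb_moments <- function(x) {
  mu <- mean(x)
  v <- var(x)
  stopifnot(v > mu)
  size <- mu^2 / (v - mu)
  prob <- size / (size + mu)
  c(size = size, mu = mu, prob = prob)
}

set.seed(1)
x <- rnbinom(n = 5000, mu = 145, size = 18)  # simulated stand-in data
est <- nb_moments(x)
est

# The (size, mu) and (size, prob) forms of pnbinom() give the same answer.
p_mu   <- pnbinom(99, size = est[["size"]], mu = est[["mu"]])
p_prob <- pnbinom(99, size = est[["size"]], prob = est[["prob"]])
all.equal(p_mu, p_prob)
```

Moment estimates are only a starting point; a maximum-likelihood fit will generally be more efficient.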
So I found the answer, and for anyone else with the same problem, here is how I did it. The data I was using were USGS earthquake data. I use quite a few other libraries in R, but I think only MASS is needed for this example.
library(MASS)
quakeSim <- earthq4$count  # this was my real data
# Simulated stand-in so the example runs without the real data (overwrites the line above)
quakeSim <- rnbinom(n = 12000, mu = 145, size = 18)
# Fit three candidate distributions to the same counts
quakeDistNB <- MASS::fitdistr(quakeSim, densfun = "negative binomial")
quakeDistPois <- MASS::fitdistr(quakeSim, densfun = "poisson")
quakeDistGaus <- MASS::fitdistr(quakeSim, densfun = "normal")
# Compare the fits: the lowest AIC indicates the best-fitting distribution
AIC(quakeDistNB)
AIC(quakeDistPois)
AIC(quakeDistGaus)
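The three AIC values can also be collected into one named vector, which makes the best-fitting family easy to read off. This is a sketch on simulated counts only; the names `sim_counts`, `fits`, `aic_values` and `best_fit` are illustrative and not part of the original code.

```r
library(MASS)

set.seed(123)
sim_counts <- rnbinom(n = 5000, mu = 145, size = 18)  # simulated stand-in data

# Fit the same three candidate distributions as above.
fits <- list(
  negbin  = fitdistr(sim_counts, densfun = "negative binomial"),
  poisson = fitdistr(sim_counts, densfun = "poisson"),
  normal  = fitdistr(sim_counts, densfun = "normal")
)

# One AIC per fit; the smallest value marks the preferred model.
aic_values <- sapply(fits, AIC)
aic_values
best_fit <- names(which.min(aic_values))
best_fit
```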
Quick check on normality with the Shapiro-Wilk test (only relevant if the Gaussian fit has the lowest AIC):
shapiro.test(earthq4$count)
Use the usual 5% significance threshold. My data are negative binomial, not Gaussian, so ignore the confidence-interval checks below.
summary(earthq4)
t.test(earthq4$count ) #default 0.95
So my data follow a negative binomial distribution. Now let's look at it as a histogram with enough bins to show the shape of an NB.
hist(quakeSim, breaks=80)
This part really drove me nuts until a friend showed me.
quakeDistNB <- MASS::fitdistr(earthq4$count , densfun = "negative binomial")
quakeDistNB
sizeHat <- quakeDistNB$estimate[1]
sizeHat
muHat <- quakeDistNB$estimate[2]
Then the probability of fewer than 100 (that is, 99 or fewer, since pnbinom() returns P(X <= q)):
p100 <- pnbinom(q = 99, size = sizeHat, mu = muHat)
p100
Probability of 150 or fewer:
p150 <- pnbinom(q = 150, size = sizeHat, mu = muHat)
p150
Probability of 100 to 150, inclusive:
p150 - p100
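The same idea extends to the five ranges from the question: evaluate the negative binomial CDF at the upper edge of each range and take successive differences. In this sketch the values given to `sizeHat` and `muHat` are placeholders rather than fitted results, the ranges are read as non-overlapping (100-140, 141-170, 171-210), and the earthquakes already observed during the month are not accounted for.

```r
# Placeholder parameters; substitute the estimates returned by fitdistr().
sizeHat <- 18
muHat <- 145

# Inclusive upper edges of the first four ranges; the last range is open-ended.
upper_edges <- c(99, 140, 170, 210)
cdf <- c(pnbinom(upper_edges, size = sizeHat, mu = muHat), 1)

# Successive differences of the CDF give the probability of each range.
bin_probs <- diff(c(0, cdf))
names(bin_probs) <- c("<100", "100-140", "141-170", "171-210", ">210")
bin_probs
sum(bin_probs)  # should add up to 1
```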